CCRGeneticsBranch / Oncogenomics_v2

Oncogenomics portal version 2

Remove CNVkit and Sequenza gene tables #50

Open hsienchao opened 7 months ago

hsienchao commented 7 months ago

The CNVkit and Sequenza data flow is being improved. Previously, we uploaded the CNV segment data and then joined it with the gene table. This is very slow: refreshing the joined tables for these two tools takes around 30 minutes. I now plan to build the join beforehand with the bedtools intersect and groupby commands, and then upload the segment file together with its gene list.
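The pre-join step can be sketched in plain Python to show what the uploaded file would contain: each segment row carries the comma-separated names of the genes it overlaps, roughly what bedtools intersect (with -wa -wb) followed by bedtools groupby with a collapse operation produces. This is a minimal illustration, not the actual pipeline; the record types, the cn field, and the function names are hypothetical, and coordinates are assumed to be 0-based half-open (BED convention).

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open (BED convention)
    end: int
    cn: float   # hypothetical segment-level copy number value


class Gene(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def annotate_segments(segments, genes):
    """Attach the names of all overlapping genes to each segment.

    Unlike a plain intersect, segments with no overlapping gene are kept,
    with an empty gene list.
    """
    genes_by_chrom = defaultdict(list)
    for g in genes:
        genes_by_chrom[g.chrom].append(g)

    rows = []
    for seg in segments:
        names = [g.name for g in genes_by_chrom[seg.chrom]
                 if overlaps(seg.start, seg.end, g.start, g.end)]
        rows.append((seg.chrom, seg.start, seg.end, seg.cn, ",".join(names)))
    return rows
```

The linear scan per chromosome is fine for illustration; for whole-genome gene tables, bedtools (or a sorted sweep) does the same join far more efficiently.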

hsienchao commented 7 months ago

I tried to use Manoj's Sequenza gene-level script to process the old cases and found that it generates inconsistent CN and A/B values: in every row below, A + B does not equal CNt. For example:

#chromosome  start.pos  end.pos    Gene      CNt  A  B
chr1         13424002   13648656   PRAMEF15  2    1  0
chr1         13448049   13671691   PRAMEF13  2    1  0
chr1         13448049   13671691   PRAMEF14  2    1  0
chr1         13474688   13698358   PRAMEF19  2    1  0
chr1         13495283   13718962   PRAMEF17  2    1  0
chr1         13521972   13747733   PRAMEF20  2    1  0
chr11        48346492   48347482   OR4C3     2    2  1
chr14        106518399  106518853  IGHV3-7   2    1  0
chr18        3499182    3880068    DLGAP1    2    2  1
chr5         128796102  129072911  ADAMTS19  2    2  1

So I wrote a script that assigns each gene the CNt, A, and B values of the segment with the largest overlap with that gene. I have re-processed all old cases on the Frederick server.
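The largest-overlap rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used on the Frederick server: the SeqzSegment type and function names are hypothetical, coordinates are assumed half-open, ties are broken by keeping the first segment seen, and "consistent" is assumed to mean that A + B equals CNt, which is the property the rows above violate.

```python
from typing import NamedTuple, Optional


class SeqzSegment(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open (assumed convention)
    end: int
    cnt: int    # total copy number (CNt)
    a: int      # copies of the major allele (A)
    b: int      # copies of the minor allele (B)


def overlap_len(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def best_segment(gene_chrom, gene_start, gene_end, segments) -> Optional[SeqzSegment]:
    """Return the segment with the largest overlap with the gene, or None.

    Because one whole segment is chosen, CNt, A and B always come from the
    same segment instead of being mixed across segments.
    """
    best, best_len = None, 0
    for seg in segments:
        if seg.chrom != gene_chrom:
            continue
        n = overlap_len(gene_start, gene_end, seg.start, seg.end)
        if n > best_len:  # strict '>' keeps the first segment on ties
            best, best_len = seg, n
    return best


def is_consistent(seg: SeqzSegment) -> bool:
    """Assumed sanity check: allele copies should sum to the total copy number."""
    return seg.a + seg.b == seg.cnt
```

For example, a gene spanning 80-150 on chr1, with segments at 0-100 and 100-300, overlaps them by 20 and 50 bases, so it takes its values from the second segment.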