Open hsienchao opened 9 months ago
I tried to use Manoj's Sequenza gene-level script to process the old cases and found that it generates inconsistent CN and A/B values. For example, in the rows below A + B does not add up to CNt.
So I made a script that assigns each gene the segment with the largest overlap with that gene. I've re-processed all old cases on the Frederick server.
#chromosome | start.pos | end.pos | Gene | CNt | A | B |
---|---|---|---|---|---|---|
chr1 | 13424002 | 13648656 | PRAMEF15 | 2 | 1 | 0 |
chr1 | 13448049 | 13671691 | PRAMEF13 | 2 | 1 | 0 |
chr1 | 13448049 | 13671691 | PRAMEF14 | 2 | 1 | 0 |
chr1 | 13474688 | 13698358 | PRAMEF19 | 2 | 1 | 0 |
chr1 | 13495283 | 13718962 | PRAMEF17 | 2 | 1 | 0 |
chr1 | 13521972 | 13747733 | PRAMEF20 | 2 | 1 | 0 |
chr11 | 48346492 | 48347482 | OR4C3 | 2 | 2 | 1 |
chr14 | 106518399 | 106518853 | IGHV3-7 | 2 | 1 | 0 |
chr18 | 3499182 | 3880068 | DLGAP1 | 2 | 2 | 1 |
chr5 | 128796102 | 129072911 | ADAMTS19 | 2 | 2 | 1 |
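
The largest-overlap rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the `Segment` and `Gene` types, the function names, and the segment values in the example are hypothetical, and coordinates are assumed to be half-open intervals on the same reference.

```python
from typing import NamedTuple, Optional, Sequence


class Segment(NamedTuple):
    chrom: str
    start: int
    end: int
    cnt: int  # total copy number (CNt)
    a: int    # copies of the A allele
    b: int    # copies of the B allele


class Gene(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str


def overlap_len(gene: Gene, seg: Segment) -> int:
    """Number of bases shared by the gene and the segment (0 if none)."""
    if gene.chrom != seg.chrom:
        return 0
    return max(0, min(gene.end, seg.end) - max(gene.start, seg.start))


def best_segment(gene: Gene, segments: Sequence[Segment]) -> Optional[Segment]:
    """Return the segment with the largest overlap, or None if nothing overlaps."""
    best, best_len = None, 0
    for seg in segments:
        n = overlap_len(gene, seg)
        if n > best_len:  # on ties, the first segment seen is kept
            best, best_len = seg, n
    return best


# Hypothetical segments; only the gene coordinates come from the table above.
segments = [
    Segment("chr1", 13000000, 13500000, 2, 1, 1),
    Segment("chr1", 13500000, 14000000, 3, 2, 1),
]
gene = Gene("chr1", 13424002, 13648656, "PRAMEF15")
chosen = best_segment(gene, segments)
```

Because each gene takes CNt, A and B from a single segment, the three values reported for a gene always come from the same Sequenza call.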
The CNVkit and Sequenza data flow is being improved. The old way was to upload the CNV segment data and then join it with the gene table. This is very slow: refreshing the joined tables for these two tables takes around 30 minutes. I now plan to improve this by creating the join table first with bedtools' intersect and groupby commands, and then uploading the segment file together with its gene list.
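
To show what the precomputed join is meant to produce, here is a small pure-Python stand-in for the intersect-then-group step. It is only a sketch under assumptions: the real pipeline would use bedtools, the function name `genes_per_segment` and the tuple layouts are hypothetical, the segment coordinates are made up, and intervals are treated as half-open.

```python
from typing import List, Tuple

SegmentRow = Tuple[str, int, int]    # (chrom, start, end)
GeneRow = Tuple[str, int, int, str]  # (chrom, start, end, gene name)


def genes_per_segment(
    segments: List[SegmentRow], genes: List[GeneRow]
) -> List[Tuple[str, int, int, str]]:
    """Attach a comma-separated gene list to every segment.

    Mirrors an interval intersect followed by a group-by on the segment
    coordinates. Segments that overlap no gene get an empty gene list.
    """
    rows = []
    for chrom, s_start, s_end in segments:
        names = {
            name
            for g_chrom, g_start, g_end, name in genes
            if g_chrom == chrom and g_start < s_end and s_start < g_end
        }
        rows.append((chrom, s_start, s_end, ",".join(sorted(names))))
    return rows


# Hypothetical segments; gene coordinates are taken from the table above.
segments = [
    ("chr1", 13000000, 13500000),
    ("chr1", 13500000, 14000000),
    ("chr2", 0, 1000),
]
genes = [
    ("chr1", 13424002, 13648656, "PRAMEF15"),
    ("chr1", 13521972, 13747733, "PRAMEF20"),
]
joined = genes_per_segment(segments, genes)
```

Uploading rows of this shape means the gene list travels with each segment, so no join against the gene table is needed after upload.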