etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org
Other

Gene name missing from annotations in calls file #688

Open micknudsen opened 2 years ago

micknudsen commented 2 years ago

Hi,

I have come across some CNVkit output where a gene name appears to be missing from the final calls file. Here are the two lines containing the EGFR gene:

$ grep EGFR cnvkit.called.tsv
chr7    55032092    55155774    EGFR    1.98424 8   741.456 22  20.465
chr7    55155774    55365525    EGFR,EGFR,EGFR-AS1  4.66905 51  6431.42 28  26.5522

However, when I inspect the previous line in cnvkit.called.tsv,

chr7    54246732    55031592    VSTM2A,VSTM2A,VSTM2A-OT1,VSTM2A-OT1,VSTM2A,SEC61G   4.91824 61  4146.14 19  17.8334

it does not contain EGFR, even though the region overlaps the first exon of the gene. I assume the gene annotation comes from reference.cnn, and when I inspect that file,

$ grep EGFR reference.cnn  | head -n 5
chr7    55018770    55019096    EGFR    -0.74602    94.2884 0.757669        0.289361
chr7    55019096    55019423    EGFR    -0.715226   97.2761 0.755352        0.388429
chr7    55032092    55032193    EGFR    -0.177515   79.4214 0.405941        0.218047
chr7    55088166    55088469    EGFR    0.229904    167.095 0.518152        0.112218
chr7    55088469    55088772    EGFR    0.134641    149.743 0.435644        0.155047

the first two intervals are fully contained within the call. Shouldn't the EGFR name then be carried over to the cnvkit.called.tsv file?
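To make the expectation in the question concrete, here is a minimal Python sketch (not CNVkit's own code) that collects the gene names of reference.cnn bins lying fully inside a called segment. It assumes the column order shown in the grep output above (chromosome, start, end, gene, ...) and treats coordinates as half-open intervals; the helper names `parse_bins` and `genes_in_segment` are hypothetical. Only the five EGFR bins quoted above are used as input, so the sketch says nothing about the VSTM2A or SEC61G bins.

```python
from typing import List, Tuple

# The five reference.cnn lines quoted above; only the first four columns are used.
REFERENCE_LINES = [
    "chr7 55018770 55019096 EGFR -0.74602 94.2884 0.757669 0.289361",
    "chr7 55019096 55019423 EGFR -0.715226 97.2761 0.755352 0.388429",
    "chr7 55032092 55032193 EGFR -0.177515 79.4214 0.405941 0.218047",
    "chr7 55088166 55088469 EGFR 0.229904 167.095 0.518152 0.112218",
    "chr7 55088469 55088772 EGFR 0.134641 149.743 0.435644 0.155047",
]

Bin = Tuple[str, int, int, str]


def parse_bins(lines: List[str]) -> List[Bin]:
    """Parse whitespace-separated lines into (chrom, start, end, gene) tuples."""
    bins: List[Bin] = []
    for line in lines:
        chrom, start, end, gene = line.split()[:4]
        bins.append((chrom, int(start), int(end), gene))
    return bins


def genes_in_segment(bins: List[Bin], chrom: str, seg_start: int, seg_end: int) -> List[str]:
    """Gene names of bins fully contained in [seg_start, seg_end), de-duplicated, in order."""
    names: List[str] = []
    for b_chrom, b_start, b_end, gene in bins:
        if b_chrom == chrom and seg_start <= b_start and b_end <= seg_end:
            # A bin's gene field may itself be a comma-separated list.
            for name in gene.split(","):
                if name not in names:
                    names.append(name)
    return names


bins = parse_bins(REFERENCE_LINES)
# The segment preceding the EGFR calls contains the first two EGFR bins.
print(genes_in_segment(bins, "chr7", 54246732, 55031592))
```

If the printed list includes EGFR while the gene column of that segment in cnvkit.called.tsv does not, that matches the discrepancy described in this report.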

Thanks!

etal commented 2 years ago

Hmm, could be a bug. Thanks for reporting!

DanielAmsel commented 2 years ago

Hi, I think I have a similar issue: when I try to plot some specific genes, the script cannot find them. The genes are also missing from the *.cn{s,r} files, even though they are listed in the refFlat.txt file that I downloaded. When I run cnvkit batch with a custom .bed file, the genes are found.

@micknudsen : Did you also use the refFlat.txt or a custom file for --annotations?

Best, Daniel

micknudsen commented 2 years ago

@micknudsen : Did you also use the refFlat.txt or a custom file for --annotations?

@DanielAmsel Neither of these. I use a target BED file with gene names added as a fourth column. They are then magically carried over to the final calls file. It has been a long time since I set up my workflow, but I vaguely remember having issues with creating a suitable refFlat.txt file.
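The thread does not show how the fourth (gene name) column of such a target BED was produced. As one hypothetical way to build it, here is a minimal Python sketch that labels each target interval with the names of all genes it overlaps, using a naive pairwise overlap test and half-open coordinates. The helpers `annotate_targets` and `to_bed` are illustrative, the gene intervals would come from whatever annotation source you trust, and the "-" placeholder for unannotated targets is an assumption to check against what your CNVkit version expects.

```python
from typing import List, Tuple

Target = Tuple[str, int, int]        # chrom, start, end
Gene = Tuple[str, int, int, str]     # chrom, start, end, name


def annotate_targets(targets: List[Target], genes: List[Gene]) -> List[Tuple[str, int, int, str]]:
    """Attach comma-joined names of overlapping genes to each target ("-" if none)."""
    rows = []
    for chrom, start, end in targets:
        # Half-open intervals [a, b) and [c, d) overlap iff a < d and c < b.
        names = [g_name for g_chrom, g_start, g_end, g_name in genes
                 if g_chrom == chrom and g_start < end and start < g_end]
        # dict.fromkeys drops duplicate names while keeping their order.
        rows.append((chrom, start, end, ",".join(dict.fromkeys(names)) or "-"))
    return rows


def to_bed(rows: List[Tuple[str, int, int, str]]) -> str:
    """Render rows as a 4-column, tab-separated BED string."""
    return "".join(f"{c}\t{s}\t{e}\t{n}\n" for c, s, e, n in rows)


# Toy coordinates for illustration only, not real gene positions.
toy_targets = [("chr7", 100, 200), ("chr7", 500, 600), ("chr7", 700, 800)]
toy_genes = [("chr7", 150, 550, "GENE_A"), ("chr7", 180, 250, "GENE_B")]
print(to_bed(annotate_targets(toy_targets, toy_genes)), end="")
```

For large panels an interval tree or a sorted sweep would scale better than this pairwise loop, but the pairwise version keeps the overlap rule easy to verify.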