Open gtollefson opened 3 years ago
Hi @gtollefson,
Not an author of CNVkit, but thanks for reporting this! I could partly "reproduce" this on my own hybrid-capture panel data, using your batch command. The output files (with and without --annotate) look identical to me (excluding the "gene" column, of course), which is quite comforting.
=> If possible, could you please confirm this on your own data? At least for the ".cnr" and ".cns" files, with something like: diff -qs <(cut -f4 --complement WITH_annot.cnr) <(cut -f4 --complement WITHOUT_annot.cnr)
The scatter -s Sample.cn{s,r} plots produced by both pipelines also look pretty much identical.
==> So IMHO this is purely a diagram graphical artifact
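For anyone without GNU cut or process substitution, the same check can be sketched in Python. This is only an illustration of the comparison suggested above, not part of CNVkit: the helper names rows_without_gene and first_difference are hypothetical, and it assumes the files are tab-separated with a header row containing a "gene" column.

```python
import csv
from itertools import zip_longest


def rows_without_gene(path, drop="gene"):
    """Yield each row of a tab-separated table with the `drop` column removed.

    Assumption: the first line is a header and the column is found by name.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        keep = [i for i, name in enumerate(header) if name != drop]
        yield tuple(header[i] for i in keep)
        for row in reader:
            if not row:  # skip blank lines
                continue
            yield tuple(row[i] for i in keep)


def first_difference(path_a, path_b):
    """Return (row_number, row_a, row_b) for the first mismatch, or None."""
    pairs = zip_longest(rows_without_gene(path_a), rows_without_gene(path_b))
    for number, (row_a, row_b) in enumerate(pairs, start=1):
        if row_a != row_b:
            return number, row_a, row_b
    return None


# Example usage (file names taken from the diff command above):
# print(first_difference("WITH_annot.cnr", "WITHOUT_annot.cnr"))
```

A result of None means the two files agree on every column except "gene", which is what the diff -qs command reports as "identical".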
cnvkit.py diagram WITH_annot.cnr
(compared with: cnvkit.py diagram WITHOUT_annot.cnr)
I think this is due to several things:
params.IGNORE_GENE_NAMES
To sum up, the bin representations in diagram differ depending on --annotate because:
with --annotate, we are visualising a "squashed" cnarr, thanks to the gene names being present.
I would not say it is "unexpected", but I am not sure.
=> Because diagram will squash by gene names almost all the time (controlled by the cnarr_is_seg bool)
=> Plus, I guess looking at data without a single gene annotation is not very common?
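To make the "squashing" idea concrete, here is a toy sketch of collapsing adjacent bins that share a gene name. It is not CNVkit's implementation: the function squash_by_gene, the tuple layout, and the IGNORED_NAMES set are assumptions for illustration (the real list of ignored names lives in params.IGNORE_GENE_NAMES and may differ), and a plain mean stands in for whatever summary CNVkit actually uses.

```python
from itertools import groupby
from statistics import mean

# Assumption: placeholder gene names that mean "no annotation".
# The real values are defined in CNVkit's params.IGNORE_GENE_NAMES.
IGNORED_NAMES = {"-", "."}


def squash_by_gene(bins, ignored=IGNORED_NAMES):
    """Collapse runs of adjacent bins that share a chromosome and gene name.

    `bins` is a position-sorted list of (chromosome, start, end, gene, log2).
    Bins whose gene name is in `ignored` are passed through unchanged.
    """
    squashed = []
    for (chrom, gene), run in groupby(bins, key=lambda b: (b[0], b[3])):
        run = list(run)
        if gene in ignored:
            squashed.extend(run)  # unannotated bins stay one-per-bin
        else:
            avg = mean(b[4] for b in run)  # simplification: unweighted mean
            squashed.append((chrom, run[0][1], run[-1][2], gene, avg))
    return squashed


annotated = [
    ("chr1", 0, 100, "GENEA", 0.5),
    ("chr1", 100, 200, "GENEA", 1.5),
    ("chr1", 200, 300, "GENEB", -1.0),
]
unannotated = [(c, s, e, "-", v) for (c, s, e, _, v) in annotated]

print(len(squash_by_gene(annotated)))    # 2 regions: GENEA merged, GENEB alone
print(len(squash_by_gene(unannotated)))  # 3 regions: nothing is merged
```

Under these assumptions the same underlying bins yield a different number of drawn regions depending only on whether gene names are present, which matches the difference described above.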
However, I cannot explain the marked change in colors, with the "annotated" plot being more "dark red-ish".
=> Especially as this does not look correlated with the unannotated / unsquashed regions
=> It could come from something in the squashing process? Maybe it is expected too?
Hope this helps. Have a nice day. Felix.
When I run the batch command on one tumor/normal low-pass whole genome sequencing sample pair without the --annotate option, I receive very different diagram plot output from that produced by running the batch command on the same sample using the --annotate option and the refFlat.txt file which corresponds with my reference genome version. I've pasted the output of the two batch command runs below.
Output produced with default batch commands without the --annotate option:
Output produced with the same commands as above but with the --annotate option provided with the refFlat.txt file appropriate to the reference genome:
My full batch command is:
cnvkit.py batch DNA-T2.sorted.bam --normal DNA-N2.sorted.bam \
    --fasta GRCh38_full_analysis_set_plus_decoy_hla.fa \
    --output-reference GRCh38_full_analysis_set_plus_decoy_hla.cnn --output-dir results/ \
    --diagram --scatter --method wgs
with and without --annotate refFlat.txt
Is this expected behavior? Can you explain why the two output plots are different (aside from the gene labels)?
Thank you, George