Closed lmanchon closed 3 years ago
Hi @lmanchon ,
Not an author of CNVkit, but AFAIK, it is not possible to disable them completely => Maybe try to reduce the number of called CNV, as explained here in CNVkit's documentation ?
Hope this helps. Have a nice day. Felix.
Hi @lmanchon!
The suggestion by @tetedange13 indeed makes a lot of sense. If you're having so many gene labels that they're interfering with the diagram, changes are that something went wrong in the process of calling the CNVs. Would you mind sharing the diagram so that we could see what could be the reason?
However, if you're certain that the calling parameters are correct and there is a valid reason for there to be so many CNVs (e.g. it's a highly mixed up tumour genome), I suppose we could add a flag to disable gene names, it shouldn't be too hard.
In the meanwhile, as a workaround, you can simply edit CNR/CNS file and to fill the gene
column with dashes (-
)
see below the plot generated with those commands: cnvkit.py batch SRR6823378_recalibrated.bam -n -m amplicon --segment-method hmm-germline -f Homo_sapiens_hg38.fasta --annotate refFlat.txt --short-names -t exome.bed -d OUTPUT_CNV -p 0 --drop-low-coverage --target-avg-size 300 cnvkit.py scatter -s OUTPUT_CNV/SRR6823378_recalibrated.cn{s,r} --y-max 5 --y-min -5 --title SRR6823378 -o OUTPUT_CNV/SRR6823378_CNV_scatter-log2.pdf cnvkit.py diagram -s OUTPUT_CNV/SRR6823378_recalibrated.cn{s,r} -t 1 -o OUTPUT_CNV/SRR6823378_CNV_diagram.pdf
Hi @lmanchon ,
You have a WES germline dataset, so no particular reason for you to have many CNV I guess? (I mean regarding @tskir remark) => However are you really on an "amplicon" method? If yes, have you try to stop marking duplicate reads, as you were suggested in #628 ?
You also are setting a lot of custom values for parameters. Any particular reason for that? Have you tested different things?
=> For example Germline Analysis part of CNVkit documentation says using --drop-low-coverage
can be risky in your case
=> And "hmm*" segmenting methods usually produce more small segments (to detect more focal CNV), so maybe it comes from there" ?
Hope this helps. Have nice day. Felix.
--Hello @tetedange13,
as you suggested i'm going to try using another parameters, also i use this bed file: https://www.twistbioscience.com/sites/default/files/resources/2019-06/Twist_Exome_Target_hg38.bed
what do you mean by "amplicon" method ? is WES analysis an amplicon method or not ?
thank you so much--
To sum up, except for WGS any NGS method requires an enrichment step, to get regions of interest (eg: panel of genes, whole exome etc) => To do so several methods exist, with their own properties, such as:
CNVkit can handle all 3 methods but was primarily designed for hybrid-capture data (see this part of CNVkit's documentation for detailed behaviour depending on the method) => Also I see you are working on this public dataset right? In metadata I saw it is obtained with "Agilent SureSelect XT Human All Exon Kit V5" and not Twist Exome... => So you should rather use one of these BED files, supposing your BAM were made against hg38 ?
Anyway your biggest problem is setting -m amplicon
when you have sequencing data made with hybrid-capture method (both Twist's and Agilent's exomes are)
=> Simply remove this (default method is -m hybrid(-capture)
), stick with default parameters to start and see if this is better
Best, Felix.
--Hi,
i have tried again with the commands below (with or without mark duplicates), i obtain the attached results(see pictures).
cnvkit.py batch SRR6823378.bam -n -m hybrid -f Homo_sapiens_hg38.fasta -t Exome-Agilent_V5.bed -d OUTPUT_CNV -p 0 cnvkit.py scatter -s OUTPUT_CNV/SRR6823378.cn{s,r} --y-max 5 --y-min -5 --title SRR6823378 -o OUTPUT_CNV/SRR6823378_CNV_scatter-log2.pdf cnvkit.py diagram -s OUTPUT_CNV/SRR6823378.cn{s,r} -o OUTPUT_CNV/SRR6823378_CNV_diagram.pdf
i have less points on the scatter but too much labels on diagram.
Hi @lmanchon ,
Looks way better to me, compared to your previously shared scatter plot ! => But someone else might confirm ?
Regarding your diagram
:
-t/--threshold
parameter to 1
anymore, even though you used to (and default value for this param is 0.5) --> hard to compareAnyway if you are still having a lot of CNV, a quick workaround could be to simply change to diagram -s OUTPUT_CNV/SRR6823378.call.cns
(this one has .call. in its name)
=> Complete new command: cnvkit.py diagram -s OUTPUT_CNV/SRR6823378.call.cns -t 1 -o OUTPUT_CNV/SRR6823378_CNV_diagram.pdf OUTPUT_CNV/SRR6823378.cnr
=> This ".call.cns" file should be present in your output directory if you are running latest CNVkit version (which I advice you to) and has been post-processed to remove some likely false-positive CNV (see here for details)
=> And as said before, you should find details in CNVkit's documentation about this topic
Also a flag to disable gene_name labels should be available anytime soon
Kind regards. Felix.
cnvkit.py version 0.9.9
ok thank you, i'm investigating some change. btw another parameter would be usefull: possibility to plot diagram for only one chromosome (as scatter option) In attachment the result file from SRR6823378 analysis. SRR6823378.zip
Reopening the issue until #634 is merged
--Hi all,
is there an option to plot diagram without gene labels ? i have too much gene name on my chromo.
thank you --