Open wqrao opened 2 years ago
Hi @wqrao ,
In the user guide it is said that to build a reference for CNVkit (your "flatRef.cnn"), you are supposed to use "baited genomic regions for your target capture kit, as provided by your vendor" => Which are not supposed to contain duplicated regions (the reason why you get an error)
Hope this helps ! Kind regards, Felix.
Hello @tetedange13 ,
Many thanks for your help. I regenerated the "flatRef.cnn" and built .cnr file successfully. However, the next step failed again.
I wrote cnvkit.py segment ./res/B1_bqsr.cnr -o ./res/B1_bqsr.cns -m hmm-germline -p 8
for generating .cns
There are errors as follows:
Segmenting with method 'hmm-germline' in 8 processes Smoothing overshot at 6 / 15715 indices: (-27.561884666418475, 2.423114113233022) vs. original (-26.4023, 5.41255) Smoothing overshot at 11 / 15661 indices: (-32.590967831385235, 1.1132613787142873) vs. original (-26.4023, 4.01252) Smoothing overshot at 4 / 109 indices: (-8.495912514212478, 5.0282835362849) vs. original (-26.1403, 3.58033) Building model from observations Traceback (most recent call last): File "/home/yaoxq/miniconda3/bin/cnvkit.py", line 9, in <module> args.func(args) File "/home/yaoxq/miniconda3/lib/python3.7/site-packages/cnvlib/commands.py", line 660, in _cmd_segment smooth_cbs=args.smooth_cbs) File "/home/yaoxq/miniconda3/lib/python3.7/site-packages/cnvlib/segmentation/__init__.py", line 54, in do_segmentation rscript_path) File "/home/yaoxq/miniconda3/lib/python3.7/site-packages/cnvlib/segmentation/__init__.py", line 139, in _do_segmentation segarr = hmm.segment_hmm(filtered_cn, method, threshold, variants) File "/home/yaoxq/miniconda3/lib/python3.7/site-packages/cnvlib/segmentation/hmm.py", line 45, in segment_hmm model = hmm_get_model(cnarr, method, processes) File "/home/yaoxq/miniconda3/lib/python3.7/site-packages/cnvlib/segmentation/hmm.py", line 139, in hmm_get_model verbose=False) File "pomegranate/hmm.pyx", line 2564, in pomegranate.hmm.HiddenMarkovModel.fit File "pomegranate/hmm.pyx", line 2565, in pomegranate.hmm.HiddenMarkovModel.fit TypeError: delayed() got an unexpected keyword argument 'check_pickle'
Could you help me once more? Thanks very much!
This error looks more like an issue related to required Python packages for hmm-germline
specific segment-method
=> How did you install CNVkit exactly ? I see you are running it inside a conda environnement
=> But did you install it inside its own conda env ?
=> If not, you may have a "Python package collision" = one of the package required were already installed in your base
env, but not in the correct version for CNVkit
Try following command: conda create -n cnvkit -c conda-forge -c bioconda cnvkit
=> This should create you a dedicated conda-env named cnvkit
, with everything properly installed inside
=> You can also do it using set up conda channels as explained in CNVkit's README
Also please note, as explained in the documentation, that hmm-germline
method is experimental
=> Default segment-method (called CBS) performs well in most cases
=> Plus you may not know that there is a dedicated section explaining how you can proceed with germline samples
Hope you succeed ! Best wishes, Felix.
Thanks a lot! I have run the process successfully. I export VCF format for call
section. However, I am confused about the value of FOLD_CHANGE
and FOLD_CHANGE_LOG
. Are they the same as the log2 ratio? Could you explain something about it? I appreciate your help.
#CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | SampleID
chr1 | 12080 | . | N | <DEL> | . | . | IMPRECISE;SVTYPE=DEL;END=297507;SVLEN=-285427;FOLD_CHANGE=0.787774;FOLD_CHANGE_LOG=-0.344146;PROBES=40 | GT:GQ | 0/1:40
I think these field names were thought self-explicit enough, but adding their formula in exported VCF header might help ?
=> Looking into export vcf
code, we see following formulas:
https://github.com/etal/cnvkit/blob/e29ec7de7bdd03892820a7a712c284c46b580dda/cnvlib/export.py#L327-L328
Meaning that:
FOLD_CHANGE_LOG
is completely the same as "log2 ratio" (which can be seen as a "log fold change" as it is the log2-transformation of ratio (fold change) tumor_coverage / reference_coverage
)FOLD_CHANGE
= 2 ** (power of or ^) log2_ratio
= You revert log2-transformation, so you just have a fold change (ratio) in copy number left (relative to your reference ploidy so multiply this value by "2" for Human and you get an "absolute copy number")Found another explaination inside call
code :
https://github.com/etal/cnvkit/blob/e29ec7de7bdd03892820a7a712c284c46b580dda/cnvlib/call.py#L269-L271
Hope this helped ! Kind regards, Felix.
Hello, I was doing germline analysis following the user guide using cnvkit 0.9.9. However, I have some questions about outputting cnr. I use the code
cnvkit.py batch B1_bqsr.bam --segment-method hmm-germline -p 8 --output-reference result/ -r FlatRef.cnn
Thencnvkit.py fix B1_bqsr.targetcoverage.cnn B1_bqsr.antitargetcoverage.cnn FlatRef.cnn -o B1.cnr
. In this step, there are errors as follows:How to deal with it? Thank you for your help.