abyzovlab / CNVpytor

a python extension of CNVnator -- a tool for CNV analysis from depth-of-coverage by mapped reads
MIT License

Issues with non-human reference genome #18

Closed: philmcnamara closed this issue 3 years ago

philmcnamara commented 4 years ago

Hi all,

I am trying to run the tool with an in-house reference genome and I'm running into some issues. I have followed the instructions here to create a gc file and a configuration file for my reference genome.

When I run

cnvpytor -conf conf.py -root rootfile.pytor -rd alignments.bam

all the steps look identical to a run on some human data I have, except that it skips the "RD parity distribution gaussian fit" step. When I then try to generate depth histograms with

cnvpytor -conf conf.py -root rootfile.pytor -his 10000 100000

it just prints "Calculating global statistics" without actually calculating anything. The call file after partitioning and calling is empty, and

cnvpytor -root rootfile.pytor -ls

shows no depth histograms are actually present.

Any assistance would be much appreciated.

suvakov commented 4 years ago

Hi,

Can you paste here content of conf.py file?

Thanks

philmcnamara commented 4 years ago

This is my configuration file for my test case with 1 chromosome. Our assembly has ~5800 scaffolds so I have extra code to generate the dictionary for the full case. I get the same results with both.

import_reference_genomes = {
    "chr1": {
        "name": "chr1",
        "species": "A tesselatus",
        "chromosomes": OrderedDict([("Chromosome_1_marmoratus", (288666491, "A"))]),
        "gc_file": "/fsimb/groups/imb-baumanngr/pm/teti/lims_1364/cnvpytor/chr1_test/chr1_gc_file.pytor",
    }
}

Thank you!
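
For an assembly with thousands of scaffolds, the "chromosomes" entry can be generated rather than typed by hand. The following is a minimal sketch, not the code used in this thread: it assumes a samtools-style .fai index (tab-separated, sequence name in the first column and length in the second), and the function name chromosomes_from_fai and the second sample scaffold are hypothetical. Every sequence is flagged "A", as in the configuration above.

```python
from collections import OrderedDict


def chromosomes_from_fai(fai_lines, flag="A"):
    """Build the "chromosomes" entry from the lines of a FASTA index.

    Assumes the samtools faidx layout: tab-separated columns with the
    sequence name first and its length second. Every sequence gets the
    same flag ("A", as in the configuration shown in this thread).
    """
    chromosomes = OrderedDict()
    for line in fai_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        chromosomes[fields[0]] = (int(fields[1]), flag)
    return chromosomes


# Hypothetical index content; in practice, read the lines from the .fai file.
example_fai = [
    "Chromosome_1_marmoratus\t288666491\t25\t60\t61",
    "scaffold_2\t1500\t293477500\t60\t61",
]
chromosomes = chromosomes_from_fai(example_fai)
print(chromosomes)
```

The resulting OrderedDict could then be assigned to the "chromosomes" key of the genome entry in import_reference_genomes.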

suvakov commented 4 years ago

We probably have a bug, but we cannot determine exactly what causes it without more information. Can you send me the log files (run with the -v debug option) by email (suvakov at gmail)? Thank you!

suvakov commented 4 years ago

Please try running the -rd and -his steps again after cloning the GitHub repository. Your GC file is fine. The bug was related to the underscore "_" symbol in the chromosome name. Thank you for reporting this issue.

philmcnamara commented 4 years ago

Great! I am on vacation this week but I will confirm next week when I am back in the office. Thank you again for the help!

Phil

philmcnamara commented 4 years ago

his, partition, and call are now working for me and producing output!

suvakov commented 4 years ago

Great. Thank you.

LorenaDerezanin commented 4 years ago

Hi there, I believe I ran into a similar issue with my data using a non-model reference genome. I successfully created a GC file and configured the reference genome, but the subsequent steps failed. The histogram and partition steps do not finish (the statistics are not actually calculated, the pytor file does not change in size or content, and no RD histograms are present), and calls.tsv ends up empty. I first tried the cnvpytor version available from conda. Afterwards I cloned cnvpytor from your repository, as suggested in this issue, and installed it with pip in a new environment, in case some of the bug fixes were not yet in the conda version.

Here's what I ran:

cnvpytor -root df_HiC_ref_GC_file.pytor -gc $REF/MusPutFur1.0_HiC.fa.gz -make_gc_file

cnvpytor -conf df_ref_genome_conf.py -root $OUT/bff_ref_output/bff_ref.pytor -chrom HiC_scaffold_1 HiC_scaffold_2 HiC_scaffold_3 HiC_scaffold_4 HiC_scaffold_5 \
HiC_scaffold_6 HiC_scaffold_7 HiC_scaffold_8 HiC_scaffold_9 HiC_scaffold_10 HiC_scaffold_11 HiC_scaffold_12 HiC_scaffold_13 HiC_scaffold_14 HiC_scaffold_15 \
HiC_scaffold_16 HiC_scaffold_17 HiC_scaffold_18 HiC_scaffold_19 HiC_scaffold_20 \
-rd $BAM/BFFL00RG.merged_df_rmdup.bam

cnvpytor -conf df_ref_genome_conf.py -root $OUT/bff_ref_output/bff_ref.pytor -his 1000 10000 100000 -v debug -log hist.log
cnvpytor -conf df_ref_genome_conf.py -root $OUT/bff_ref_output/bff_ref.pytor -partition 1000 10000 100000 -v debug -log partition.log
cnvpytor -conf df_ref_genome_conf.py -root $OUT/bff_ref_output/bff_ref.pytor -call 1000 10000 100000 -v debug -log call.log > calls.tsv

Here's the output of cnvpytor -conf df_ref_genome_conf.py -root $OUT/bff_ref_output/bff_ref.pytor -ls:

File created: 2020-09-22 13:48 using CNVpytor ver 1.0

Chromosomes with RD signal: HiC_scaffold_1, HiC_scaffold_2, HiC_scaffold_3, HiC_scaffold_4, HiC_scaffold_5, HiC_scaffold_6, HiC_scaffold_7, HiC_scaffold_8, HiC_scaffold_9, HiC_scaffold_10, HiC_scaffold_11, HiC_scaffold_12, HiC_scaffold_13, HiC_scaffold_14, HiC_scaffold_15, HiC_scaffold_16, HiC_scaffold_17, HiC_scaffold_18, HiC_scaffold_19, HiC_scaffold_20

Chromosomes with SNP signal: 

Using reference genome: dom_fer_HiC [ GC: yes, mask: no ]

Chromosomes with RD histograms [bin sizes]:  []

Chromosomes with SNP histograms [bin sizes]:  []

Chromosome lengths: {'HiC_scaffold_6': '148939764', 'HiC_scaffold_17': '60143717', 'HiC_scaffold_15': '82103545', 'HiC_scaffold_13': '91916443', 'HiC_scaffold_4': '164921773', 'HiC_scaffold_14': '90773495', 'HiC_scaffold_7': '145371413', 'HiC_scaffold_16': '67667742', 'HiC_scaffold_5': '157143054', 'HiC_scaffold_2': '198131181', 'HiC_scaffold_11': '113201178', 'HiC_scaffold_12': '108086088', 'HiC_scaffold_1': '219704201', 'HiC_scaffold_20': '38935597', 'HiC_scaffold_10': '123993920', 'HiC_scaffold_3': '191042103', 'HiC_scaffold_18': '59959324', 'HiC_scaffold_19': '39934883', 'HiC_scaffold_8': '141449838', 'HiC_scaffold_9': '139300057'}

Potentially important info: scaffold 10 is the X sex chromosome. I've successfully run the Manta and Whamg SV callers on the same BAM file and would like to intersect those call sets with the cnvpytor output.

What do you think might be the issue in my case? Am I missing something crucial in the code?

Here's the tarball with the debug logs, conf.py, and pytor files: logs_conf_files.tar.gz

Thank you in advance for your assistance, Lorena
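
The symptom reported above, an empty "Chromosomes with RD histograms" list in the -ls output, can be checked automatically before moving on to the partition and call steps. This is a minimal sketch that only inspects text shaped like the output pasted in this comment; the function name rd_histogram_entries is hypothetical, and what a non-empty list looks like is not shown in this thread, so the function returns that text unparsed.

```python
def rd_histogram_entries(ls_output):
    """Return the text after the "RD histograms" label in -ls output.

    An empty string means the bracketed list was empty ("[]"), i.e. no
    RD histograms were stored. Returns None if the label is not found.
    """
    label = "Chromosomes with RD histograms [bin sizes]:"
    for line in ls_output.splitlines():
        if line.startswith(label):
            rest = line[len(label):].strip()
            return "" if rest == "[]" else rest
    return None


# Lines copied from the -ls output in this comment.
sample_output = """Using reference genome: dom_fer_HiC [ GC: yes, mask: no ]

Chromosomes with RD histograms [bin sizes]:  []

Chromosomes with SNP histograms [bin sizes]:  []
"""

entries = rd_histogram_entries(sample_output)
if entries == "":
    print("No RD histograms stored; the -his step did not write anything.")
```

The text to scan could be captured by running the -ls command and collecting its standard output.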

suvakov commented 4 years ago

The GC file looks fine. Can you send the log file created by this step:

cnvpytor -conf df_ref_genome_conf.py -root $OUT/bff_ref_output/bff_ref.pytor -chrom HiC_scaffold_1 HiC_scaffold_2 HiC_scaffold_3 HiC_scaffold_4 HiC_scaffold_5 \
HiC_scaffold_6 HiC_scaffold_7 HiC_scaffold_8 HiC_scaffold_9 HiC_scaffold_10 HiC_scaffold_11 HiC_scaffold_12 HiC_scaffold_13 HiC_scaffold_14 HiC_scaffold_15 \
HiC_scaffold_16 HiC_scaffold_17 HiC_scaffold_18 HiC_scaffold_19 HiC_scaffold_20 \
-rd $BAM/BFFL00RG.merged_df_rmdup.bam

Thanks

LorenaDerezanin commented 3 years ago

Thank you for the swift reply. Here's the log file from the rd step: read_depth_signal.log

suvakov commented 3 years ago

Sorry for the delay. The cause was probably a bug (just fixed) related to automatic detection of the reference genome. Please clone the new version from GitHub and rerun all steps.

Another way (without rerunning the "-rd" step) is to specify the reference genome manually:

cnvpytor -conf df_ref_genome_conf.py -root $OUT/bff_ref_output/bff_ref.pytor -rg dom_fer_HiC

then run the "-stat" step:

cnvpytor -conf df_ref_genome_conf.py -root $OUT/bff_ref_output/bff_ref.pytor -stat 100

followed by other steps:

cnvpytor -conf df_ref_genome_conf.py -root $OUT/bff_ref_output/bff_ref.pytor -his 1000 10000 100000 -v debug -log hist.log
cnvpytor -conf df_ref_genome_conf.py -root $OUT/bff_ref_output/bff_ref.pytor -partition 1000 10000 100000 -v debug -log partition.log
cnvpytor -conf df_ref_genome_conf.py -root $OUT/bff_ref_output/bff_ref.pytor -call 1000 10000 100000 -v debug -log call.log > calls.tsv

We would appreciate your feedback.

Thank you,

Milovan
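
The workaround above is a fixed sequence of commands that share the same -conf and -root arguments, so it can be assembled programmatically. The sketch below only builds and prints the command lines using the flags quoted in this comment (-rg, -stat, -his, -partition, -call); the function name workaround_commands is hypothetical, the root path is shortened for illustration, and the -v debug and -log options are omitted.

```python
import shlex


def workaround_commands(conf, root, genome, bin_sizes, stat_bin=100):
    """Return the workaround command lines as argument lists, in order."""
    base = ["cnvpytor", "-conf", conf, "-root", root]
    bins = [str(b) for b in bin_sizes]
    return [
        base + ["-rg", genome],          # set the reference genome manually
        base + ["-stat", str(stat_bin)],  # recompute statistics
        base + ["-his"] + bins,
        base + ["-partition"] + bins,
        base + ["-call"] + bins,          # calls are written to stdout
    ]


commands = workaround_commands(
    conf="df_ref_genome_conf.py",
    root="bff_ref.pytor",  # shortened path, for illustration only
    genome="dom_fer_HiC",
    bin_sizes=[1000, 10000, 100000],
)
for cmd in commands:
    print(shlex.join(cmd))
```

Each argument list could be passed to subprocess.run(cmd, check=True) so that a failing step stops the sequence; for the -call step, standard output would need to be redirected to a file such as calls.tsv.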

LorenaDerezanin commented 3 years ago

Sorry for the late reply. I got a chance to clone and test your fixed version yesterday. Everything worked well and ran really fast! Thank you very much, Milovan. Here are the debug log files from the suggested steps, in case you are interested: logs_cnvpytor_fixed.tar.gz

suvakov commented 3 years ago

Thank you