abyzovlab / CNVnator

a tool for CNV discovery and genotyping from depth-of-coverage by mapped reads

Is CNVnator a software that can only run in single-thread mode? #281

Open jingydz opened 1 year ago

jingydz commented 1 year ago

Can CNVnator only run in single-threaded mode?

abyzov commented 1 year ago

Hi, by default CNVnator is compiled with OpenMP (OMP) parallel support. During the segmentation step it can use as many cores as are provided.

Alexej Abyzov, Ph.D. Senior Associate Consultant, Associate Professor of Biomedical Informatics, Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic

Mayo Clinic, 200 1st street SW, Harwick 7-91 Rochester, MN 55905 www.abyzovlab.org tel: +1-(507)-538-0978

jingydz commented 1 year ago

Sorry for the late reply, and thank you. I have solved it: I just needed to add "export OMP_NUM_THREADS=number".
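The fix described above can be sketched as follows. This is only a minimal sketch: the thread count of 8, the file name sample.root and the bin size of 500 are placeholders, and mapping the "segmentation step" to the -partition option is an assumption rather than something stated in this thread. The CNVnator call is skipped when the binary or the root file is not present.

```shell
# Set the number of OpenMP threads for programs started from this shell.
export OMP_NUM_THREADS=8   # placeholder value; pick what your machine allows
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Assumed segmentation call (-partition); sample.root and 500 are placeholders.
cnvnator_bin="${CNVnator_HOME:-}/src/cnvnator"
if [ -x "$cnvnator_bin" ] && [ -f sample.root ]; then
  "$cnvnator_bin" -root sample.root -partition 500
fi
```

The variable has to be exported in the same shell (or job script) that launches cnvnator, otherwise the child process does not see it.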

jingydz commented 1 year ago

In addition, I have a sample that I have run many times, but it never produces a result. Can you help me see why?

Command

CNVnator_input=/xxx/xxx.marked.realigned.recal.bam
time $CNVnator_HOME/src/cnvnator -root ${CNVnator_output}.root -tree ${CNVnator_input} -unique
time $CNVnator_HOME/src/cnvnator -root ${CNVnator_output}.root -genome hg38 -his ${bin_size} -d $Chromosomes

Log

Error in : Cannot allocate 14069 bytes for ID = position Title = chr2
Error in : Failed filling branch:chr2.position, nbytes=-1, entry=44877163
This error is symptomatic of a Tree created as a memory-resident Tree
Instead of doing:
TTree T = new TTree(...)
TFile f = new TFile(...)
you should do:
TFile f = new TFile(...)
TTree T = new TTree(...)
...
SysError in : cannot seek to position -1971239420 in file /xxx/xxx.root, retpos=-1 (Invalid argument)
Can't find any histograms.
...

abyzov commented 1 year ago

Hi, is it the only sample that fails? When working with BAM you don't need to specify the genome.


jingydz commented 1 year ago

No, there were a dozen of them. Yes, I later learned that if I use a BAM file I don't need to specify the genome; in that case the -genome parameter is simply ignored.

The BAM records look like this:
ERR1347664.194830679 1123 chr1 9998 0 2S38M60S = 10005 68 AACGATAACCCTAACCCTAACCCCAACCCTAACCCTAACCATGACCCTTACGTCTACCCGAACCCCAACCCTAACCCTACCCCCCCGCCTGACACAAACT '00''''7707<7<7<000707<000''000707'7<7'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' AS:i:33 MC:Z:39S61M MQ:i:0 XA:Z:chr9,-138129062,61S34M5S,0;chr6,+147869,4S36M60S,1;chr16,+88880231,5S30M65S,0;chr12,+10242,5S35M60S,1;chr12,+10543,5S35M60S,1;chr18,+94652,5S35M60S,1;chr12,+124267493,5S35M60S,1;chr15,-101981028,65S30M5S,0;chr13,-114354167,60S35M5S,1;chr18,+63673,5S19M1D16M60S,1;chr10,-80216942,64S27M9S,0;chr12_GL877875v1_alt,+543,5S35M60S,1;chr12_GL877875v1_alt,+242,5S35M60S,1; XS:i:34 MD:Z:0N0N0N18T16 NM:i:4 RG:Z:ERR1347664

jingydz commented 1 year ago

Another question: I have seen CNVnator VCF results filtered with the following conditions: "CNV calls were filtered using stringent criteria including P-value < 0.05 and minimum size > 1 Kb, and calls with > 50% of q0 (zero mapping quality) reads within the CNV regions were removed (q0 filter)."

The minimum size can be filtered using the SVLEN key. But which keys do the p-value and q0 correspond to?

example: chr1 1 CNVnator_del_1 N . PASS END=10000;SVTYPE=DEL;SVLEN=-10000;IMPRECISE;natorRD=0;natorP1=1.59373e-11;natorP2=1.87087e-51;natorP3=1.99216e-11;natorP4=2.03817e-39;natorQ0=-1 GT:CN 1/1:0
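The question above was not answered in the thread. As far as I can tell from the INFO descriptions that CNVnator's VCF converter writes into the header, natorP1 to natorP4 are e-values (t-test, Gaussian tail, and the same two restricted to the middle of the call) and natorQ0 is the fraction of q0 reads. Which e-value the quoted filter used is not stated, so the sketch below assumes natorP1. It also assumes that a negative natorQ0 (as in the example record) means the value is unavailable, and drops such calls. The function name filter_cnv_calls is hypothetical.

```shell
# Hypothetical helper: read a CNVnator VCF on stdin, write the filtered VCF to stdout.
filter_cnv_calls() {
  awk -F'\t' '
    /^#/ { print; next }                    # keep header lines unchanged
    {
      found = 0
      n = split($8, info, ";")              # INFO is column 8
      for (i = 1; i <= n; i++) {
        eq = index(info[i], "=")
        if (eq == 0) continue               # flag without a value, e.g. IMPRECISE
        key = substr(info[i], 1, eq - 1)
        val = substr(info[i], eq + 1) + 0   # force numeric interpretation
        if (key == "SVLEN")   { svlen = val; found++ }
        if (key == "natorP1") { p1 = val; found++ }   # assumed to be the "P-value"
        if (key == "natorQ0") { q0 = val; found++ }
      }
      if (found < 3) next                   # skip records missing any of the keys
      if (svlen < 0) svlen = -svlen         # deletions carry a negative SVLEN
      # size > 1 kb, e-value < 0.05, at most 50% q0 reads; negative q0 is dropped
      if (svlen > 1000 && p1 < 0.05 && q0 >= 0 && q0 <= 0.5) print
    }'
}
```

Usage would be along the lines of `filter_cnv_calls < calls.vcf > calls.filtered.vcf`, with both file names being placeholders.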

jingydz commented 1 year ago

Error

"File is more than 2 Gigabytes". I found that some root files larger than 2 GB could not be processed successfully to produce results:
SysError in : error flushing file xxx.root (File too large)

For example:
4.1T Aug 10 01:19 SAMEA3302667.root
8.7T Aug 10 09:12 SAMEA3302715.root
5.7T Aug 10 09:46 SAMEA3302857.root

abyzov commented 1 year ago

Hi, file size shouldn't be a problem. I have worked with files that were over 3 GB and had no issues. What is the size of the files you are working with? They look to be in the TB range. Is that correct?


jingydz commented 1 year ago

Hi, sorry for the late reply. I have solved this problem: the solution I used is to run each chromosome separately (1... 22+X+Y+M) with the -chrom parameter, which works. Yes, there is nothing wrong with my BAM files; they are just a bit large, ranging from 40G to 150G. However, when the root files they generate are larger than 2GB, the VCF files cannot be produced. (This seems random to me, as only about 1 in 10 root files larger than 2GB fails in this way.) They all report errors in the histogram-building step.
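The per-chromosome workaround could be scripted roughly as below. This is a sketch under assumptions: writing one root file per chromosome, the file names, and the FASTA directory are all hypothetical, and the thread does not show the exact commands that were used. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Print the chromosome names used in the thread: chr1..chr22, chrX, chrY, chrM.
chrom_list() {
  i=1
  while [ "$i" -le 22 ]; do
    echo "chr$i"
    i=$((i + 1))
  done
  echo chrX
  echo chrY
  echo chrM
}

# Echo each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

bam=sample.bam        # placeholder input BAM
bin_size=500          # placeholder bin size
genome_dir=./fasta    # placeholder directory of per-chromosome FASTA files

for chr in $(chrom_list); do
  root="sample.$chr.root"   # hypothetical naming: one root file per chromosome
  run cnvnator -root "$root" -chrom "$chr" -tree "$bam"
  run cnvnator -root "$root" -chrom "$chr" -his "$bin_size" -d "$genome_dir"
done
```

Keeping each chromosome in its own root file keeps every file far smaller than a whole-genome one, which is presumably why the workaround helps.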

Error log

Parsing file /xxx/sample.marked.realigned.recal.bam ...
Allocating memory ...
Done.
Filling and saving tree for 'chr1' ...
Filling and saving tree for 'chr2' ...
Filling and saving tree for 'chr3' ...
Filling and saving tree for 'chr4' ...
Filling and saving tree for 'chr5' ...
Filling and saving tree for 'chr6' ...
Filling and saving tree for 'chr7' ...
Filling and saving tree for 'chr8' ...
Filling and saving tree for 'chr9' ...
Filling and saving tree for 'chr10' ...
Filling and saving tree for 'chr11' ...
Filling and saving tree for 'chr12' ...
Filling and saving tree for 'chr13' ...
Filling and saving tree for 'chr14' ...
Filling and saving tree for 'chr15' ...
Filling and saving tree for 'chr16' ...
Filling and saving tree for 'chr17' ...
Filling and saving tree for 'chr18' ...
Filling and saving tree for 'chr19' ...
Filling and saving tree for 'chr20' ...
Filling and saving tree for 'chr21' ...
Filling and saving tree for 'chr22' ...
Filling and saving tree for 'chrX' ...
Filling and saving tree for 'chrY' ...
Filling and saving tree for 'chrM' ...
...
Filling and saving tree for 'HLA-DRB115:02:01' ...
Filling and saving tree for 'HLA-DRB115:03:01:01' ...
Filling and saving tree for 'HLA-DRB115:03:01:02' ...
Filling and saving tree for 'HLA-DRB116:02:01' ...
Writing histograms ...
Total of 1378403710 reads were placed.

real 49m47.400s
user 40m51.651s
sys 8m30.983s
Allocating memory ...
Done.
Calculating histograms with bin size of 500 for 'chr1' ...
Making directory bin_500 ...
Making GC histogram for 'chr1' ...
SysError in : cannot seek to position -1938320448 in file /xxx/sample.root, retpos=-1 (Invalid argument)
SysError in : cannot seek to position -1297036177263337374 in file /xxx/sample.root, retpos=-1 (Invalid argument)
SysError in : cannot seek to position -1297036177263337374 in file /xxx/sample.root, retpos=-1 (Invalid argument)
SysError in : cannot seek to position -1297036177263337230 in file /xxx/sample.root, retpos=-1 (Invalid argument)
SysError in : error flushing file /xxx/sample.root (File too large)
Done.
...
real 0m0.198s
user 0m0.151s
sys 0m0.042s
Reading calls ...

real 0m0.062s
user 0m0.036s
sys 0m0.018s
Processing: 0
Parsing done:
Tot DEL DUP INS INV TRA
0 0 0 0 0 0
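The first negative seek position in this log is consistent with a file offset that passed 2^31 bytes and wrapped around in a signed 32-bit integer, although nothing in the thread confirms this. A quick arithmetic check under that assumption (unwrap_int32 is a hypothetical helper name):

```shell
# Hypothetical helper: reinterpret a negative signed 32-bit value as unsigned
# by adding 2^32 (4294967296).
unwrap_int32() {
  echo $(( $1 + 4294967296 ))
}

unwrap_int32 -1938320448   # 2356646848 bytes, roughly 2.19 GiB
unwrap_int32 -1971239420   # 2323727876 bytes, from the earlier log
```

Both results lie just above 2147483648 (2^31), which would match the observation that only root files larger than 2 GB fail. The much larger negative values such as -1297036177263337374 do not fit this pattern and are not explained by it.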

Anyway, I've solved the problem (splitting into single chromosomes), thanks for your reply. If someone is having the same problem as me, hopefully this solution will help them.

abyzov commented 1 year ago

Good to hear. Thanks for letting us know.
