abyzovlab / CNVnator

a tool for CNV discovery and genotyping from depth-of-coverage by mapped reads

Can't find directory 'bin_100' in file “/PATH/sample.root” #243

Open QianXiaobo opened 3 years ago

QianXiaobo commented 3 years ago

Hi, suvakov

# Extract read mapping
time cnvnator -root $sample.chr20.root -chrom Chr20 -tree $bamfile && \

# Generate histogram
time cnvnator -root $sample.chr20.root -chrom Chr20 -his 100 -fasta /PATH/ref.fa.gz && \

# Calculate statistics
time cnvnator -root $sample.chr20.root -stat 100 && \

# Partition
time cnvnator -root $sample.chr20.root -partition 100

I ran the commands above as a demo to call CNVs on Chr20, but got the following error messages:

Warning in <TFile::Append>: Replacing existing TH1: his_at_aggr (Potential memory leak).
Warning in <TFile::Append>: Replacing existing TH1: read_frg_len (Potential memory leak).
Warning in <TFile::Append>: Replacing existing TH1: pair_pos (Potential memory leak).

real    3m20.615s
user    2m55.221s
sys     0m10.008s
Can't find directory 'bin_100' in file '$sample.chr20.root'.

real    0m2.300s
user    0m1.168s
sys     0m0.637s
Can't find directory 'bin_100' in file '$sample.chr20.root'.

real    0m2.174s
user    0m1.161s
sys     0m0.640s

I think the "Generate histogram" step failed. I was not sure whether the problem was in how I used -d or -fasta, so I split the reference sequence into one file per chromosome (e.g. Chr01.fa, ..., Chr20.fa; each file name prefix matches a sequence name in the header of the input BAM) and tried again:

# Generate histogram
time cnvnator -root $sample.chr20.root -chrom Chr20 -his 100 -d /PATH/ 
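
For reference, per-chromosome files like those used with -d above could be produced with a short awk script. This is only a sketch under assumptions: the input is an uncompressed multi-FASTA, and split_fasta and the `<name>.fa` output naming are hypothetical choices for illustration, not something the thread or CNVnator prescribes.

```shell
#!/usr/bin/env bash
# Split a multi-FASTA into one file per sequence, named after the first
# word of each header line (">Chr20 description" -> Chr20.fa).
# split_fasta is a hypothetical helper, not part of CNVnator.
split_fasta() {
    local fasta=$1 outdir=$2
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            if (out != "") close(out)   # keep the number of open files small
            name = substr($1, 2)        # drop the leading ">"
            out = dir "/" name ".fa"
        }
        out != "" { print > out }       # write header and sequence lines
    ' "$fasta"
}

# Example (paths are placeholders):
# split_fasta ref.fa chroms
```

Naming each file after the first word of its header keeps the file prefix identical to the sequence name, which is the property the post above relies on.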

Header of my BAM:

@HD     VN:1.5  SO:coordinate
@SQ     SN:Chr01        LN:119294037
@SQ     SN:Chr02        LN:238844183
@SQ     SN:Chr03        LN:183772800
@SQ     SN:Chr04        LN:92923151
@SQ     SN:Chr05        LN:112293617
@SQ     SN:Chr06        LN:93368415
@SQ     SN:Chr07        LN:123523216
@SQ     SN:Chr08        LN:104246046
@SQ     SN:Chr09        LN:64683936
@SQ     SN:Chr10        LN:90667400
@SQ     SN:Chr11        LN:85788095
@SQ     SN:Chr12        LN:106346232
@SQ     SN:Chr13        LN:64922896
@SQ     SN:Chr14        LN:47677399
@SQ     SN:Chr15        LN:50236165
@SQ     SN:Chr16        LN:50732815
@SQ     SN:Chr17        LN:47651721
@SQ     SN:Chr18        LN:33166870
@SQ     SN:Chr19        LN:26998044
@SQ     SN:Chr20        LN:100526017
@SQ     SN:Chr21        LN:98598843
@SQ     SN:Chr22        LN:38360130
@SQ     SN:Chr23        LN:47379281
@SQ     SN:Chr24        LN:46610899
@SQ     SN:Chr25        LN:47158557
@SQ     SN:Chr26        LN:28693288
@SQ     SN:Chr27        LN:32167772
@SQ     SN:Chr28        LN:63897334
@SQ     SN:Chr29        LN:37455388
@SQ     SN:Chr30        LN:30283642
@SQ     SN:ChrMT        LN:36596
@SQ     SN:ChrUnkown    LN:565467
@SQ     SN:ChrX LN:110613827
@SQ     SN:ChrY LN:12677250

First 10 lines of Chr20.fa:

>Chr20
CAGCTACACCAGAACCACTTGCCCAAACACTCTGGAATCACCTGCCGGGCATGCCAGCAC
AACGTCTCAGACGTGCCAGAGGCATGTGCCAGACACACCGGAACCACATCCCGGACAGAC
CGGAACCACCTGCCAGACTCGCCAGAACCACCTGCTTGACTTGCCAGACGCACCTGCCAG
ACACCCCAAAATCACCTGCTGGCTACACCGGAACCACTTGCCAGACACTCTGGAATCACC
TGCCGGGCATGCCAGCACAACCTCTCAGACATGCCAGAACCACGTGCCGCACACGCCGGA
ACCACCTCCCGGAAAGACCAGAACCACATGCCGGACACCCCAGAATCACCTGCTGAGCAC
TCCGGAACCACCTGCTCGACTCGCCAGAACGACCTACCGGACACGCCAAAATCACCAGCA
GGCTACGCCAGGACCACTTGTCAGACACTCCGGAATCACCTGCCAGGCATGATGGAACCA
CCTCTAAAACATCCCAGAACCACATACCGGAGACGCCGGTAGCACCGGCTGAGCATGCCA

However, this did not work either, and I am out of ideas.

I hope to hear from you. Thank you.

Xb

jlchen5 commented 3 years ago

Yes, I have the same problem. When I run

cnvnator -root c11_M.root -stat 1000

it fails with:

Can't find directory 'bin_1000' in file 'c11_M.root'.

abyzov commented 3 years ago

Hello, you need to run the prior steps with the same bin size.

Alexej Abyzov, Ph.D. Senior Associate Consultant, Associate Professor of Biomedical Informatics, Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic

Mayo Clinic, 200 1st street SW, Harwick 3-12 Rochester, MN 55905 www.abyzovlab.org tel: +1-(507)-538-0978
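
The advice above about bin sizes can be sketched as a wrapper that passes one shared value to every step. This is an illustrative sketch only: run_pipeline and the file names are hypothetical, it uses only the flags already shown in this thread, and the `|| return 1` checks assume cnvnator exits non-zero on failure, which the thread does not confirm.

```shell
#!/usr/bin/env bash
# Run the CNVnator steps from this thread with a single bin size, so the
# bin_<size> directory that -his is expected to create is the same one
# that -stat and -partition later look for.
# run_pipeline is a hypothetical wrapper; all file names are placeholders.
run_pipeline() {
    local root=$1 bam=$2 fasta=$3 bin=$4
    cnvnator -root "$root" -tree "$bam" || return 1
    cnvnator -root "$root" -his "$bin" -fasta "$fasta" || return 1
    cnvnator -root "$root" -stat "$bin" || return 1
    cnvnator -root "$root" -partition "$bin"
}

# Example:
# run_pipeline G1.root G1.bam ref.fasta 1000
```

Note that the reporters in this thread already used matching bin sizes, so this rules out one cause rather than fixing the error they describe.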

wook2014 commented 2 years ago

I ran into the same problem. I ran cnvnator with the following commands:

cnvnator -root G1.root -tree G1.bam 
cnvnator -root G1.root -his 1000 -fasta ~/ref.fasta
cnvnator -root G1.root -stat 1000

Steps 1 and 2 ran without any reported problem, but when I ran the command

cnvnator -root G1.root -stat 1000

it said

Can't find directory 'bin_1000' in file 'G1.root'

How can this be? Is there an error in the root file?

I found another issue, #167, that describes the same problem. I noticed that when I ran the command

cnvnator -root G1.root -his 1000 -fasta ~/ref.fasta

it printed only

Allocating memory ...
Done.

This differs from what you described in #167, where the output should be

Allocating memory ...
Done.
Calculating histograms with bin size of xxx for 'xxx' ...
Making GC histogram for 'xxx' ...
Done.
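
The difference between the two outputs above can be checked mechanically by saving the -his output to a log and searching it for the "Calculating histograms" line. This is a minimal sketch: his_log_ok and the log file name are hypothetical, and the search string is simply the line quoted from #167 above.

```shell
#!/usr/bin/env bash
# Succeed (exit status 0) if a saved -his log contains the
# "Calculating histograms" line quoted from issue #167; fail otherwise.
# his_log_ok is a hypothetical helper; the log file name is a placeholder.
his_log_ok() {
    local log=$1
    grep -q "Calculating histograms with bin size of" "$log"
}

# Example:
# cnvnator -root G1.root -his 1000 -fasta ~/ref.fasta 2>&1 | tee his.log
# his_log_ok his.log || echo "no histograms were calculated" >&2
```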

I thought the cause might be a difference in chromosome names, but the BAM file was produced by mapping my reads to this same reference FASTA file with bwa. I checked the chromosome names in the BAM file and in ref.fasta and found that they are identical. Their order is also the same, even though the BAM file was sorted. I cannot think of what else the problem might be.
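
The name comparison described above can be scripted rather than done by eye. The sketch below assumes the BAM header has been saved as text (for example with `samtools view -H`) and that the FASTA is uncompressed; name_mismatches is a hypothetical helper, not a CNVnator feature.

```shell
#!/usr/bin/env bash
# Print sequence names that appear in only one of the two inputs.
#   $1: text file holding the BAM header (e.g. from `samtools view -H`)
#   $2: uncompressed reference FASTA
# Empty output means both files use the same set of names.
# name_mismatches is a hypothetical helper.
name_mismatches() {
    local header=$1 fasta=$2 work
    work=$(mktemp -d) || return 1
    # SN:<name> fields from the tab-separated @SQ lines
    awk -F '\t' '/^@SQ/ { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' \
        "$header" | sort > "$work/bam.txt"
    # first word of each FASTA header line, without the leading ">"
    awk '/^>/ { print substr($1, 2) }' "$fasta" | sort > "$work/fa.txt"
    # comm -3 hides shared names: column 1 = BAM only, column 2 = FASTA only
    comm -3 "$work/bam.txt" "$work/fa.txt"
    rm -rf "$work"
}

# Example:
# samtools view -H G1.bam > header.sam
# name_mismatches header.sam ref.fasta
```

This compares sets of names only; it does not check sequence lengths or order.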

jlchen5 commented 2 years ago

Did you install ROOT?


abyzov commented 2 years ago

Hi, the error message means that no processed data were saved to the file. Did you get any output when running the previous commands? What is the size of the G1.root file?


wook2014 commented 2 years ago

> Did you install ROOT?

I used conda to set up the cnvnator environment.

wook2014 commented 2 years ago

> Hi, the error message means that no processed data were saved to the file. Did you get any output when running the previous commands? What is the size of the G1.root file?

I got the root file after running

cnvnator -root G1.root -tree G1.bam

It is about the same size as the reference genome FASTA file, which is 18 MB, but it is much smaller than the BAM file, which is over 1.2 GB.

abyzov commented 2 years ago

Hi, if you can place the root file somewhere for me to download, I'll investigate what is going on.


wook2014 commented 2 years ago

> Hi, if you can place the root file somewhere for me to download, I'll investigate what is going on.

Here is the root file. Sorry for the late reply. I look forward to your response. G1.zip

abyzov commented 2 years ago

Hi, based on the content of the .root file, something went wrong at the -his stage. Did you get any error message?


liuhankui commented 2 years ago

> Hi, based on the content of the .root file, something went wrong at the -his stage. Did you get any error message?

I also have the same warning.

/anaconda3/envs/myenv/bin/cnvnator -root 20B10304823-1.root -stat 1000000 -chrom chr1
Can't find directory 'bin_1000000'.
Making statistics for chr1 ...
Can't find directory 'bin_1000000'.
Can't find directory 'bin_1000000'.
Can't find directory 'bin_1000000'.
Can't find RD histogram for 'chr1'.
Can't find unique RD histogram for 'chr1'.
Average RD per bin (1-22) is 0 +- 0 (before GC correction)
Average RD per bin (X,Y) is 0 +- 0 (before GC correction)
Making directory bin_1000000 ...
Correcting counts by GC-content for 'chr1' ...
Making statistics for chr1 after GC correction ...
Can't find histogram for 'chr1'.
Average RD per bin (1-22) is 0 +- 0 (after GC correction)
Average RD per bin (X,Y) is 0 +- 0 (after GC correction)

abyzov commented 2 years ago

Hi, looks like something went wrong during the -his step.


elcortegano commented 2 years ago

Has anybody found a solution for this problem? I am basically having the same issues as above.

cnvnator -root test.root -tree test.bam
cnvnator -root test.root -his 1000 -fasta test.fa

Will run fine, but

cnvnator -root test.root -stat 1000

Fails with error:

Can't find directory 'bin_1000' in file 'test.root'.

I have noticed that the command cnvnator -root test.root -his 1000 -fasta test.fa modifies test.root, because its checksum changes, but the size of the file does not change. This supports the idea that the problem happens in the -his step, but it does not help to solve it. Any ideas? Thanks
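
The checksum-versus-size observation above can be reproduced with standard tools. The following is a minimal sketch using POSIX cksum, which reports a CRC and a byte count together; file_state and describe_change are hypothetical helpers, and a CRC is used only to detect change, not as a cryptographic hash.

```shell
#!/usr/bin/env bash
# Print "<crc> <size-in-bytes>" for a file. Reading from stdin keeps the
# file name out of the cksum output.
# file_state and describe_change are hypothetical helpers.
file_state() {
    cksum < "$1"
}

# Compare two states produced by file_state and say what changed.
describe_change() {
    local before=$1 after=$2
    if [ "$before" = "$after" ]; then
        echo "unchanged"
    elif [ "${before#* }" = "${after#* }" ]; then
        # same byte count after the first space, different CRC
        echo "content changed, size identical"
    else
        echo "content and size changed"
    fi
}

# Example:
# before=$(file_state test.root)
# cnvnator -root test.root -his 1000 -fasta test.fa
# describe_change "$before" "$(file_state test.root)"
```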

huangl07 commented 1 year ago

I found that when I run -his without -fasta, everything works.

archmageirvine commented 1 year ago

Exact same problem, it appears the histogram step doesn't function correctly. There is no error message and the only output is

cnvnator -root cnvnator-1mM/cnvnator-1mM.root -his 1000 -fasta ${REF}
Allocating memory ...
Done.

Then,

cnvnator -root cnvnator-1mM/cnvnator-1mM.root -stat 1000
Can't find directory 'bin_1000' in file 'cnvnator-1mM/cnvnator-1mM.root'

I have verified that my BAM and reference are using the same chromosome naming conventions.

abyzov commented 1 year ago

Hi, it looks like the step before -stat didn't complete properly. How did the -tree step complete? Do you have the output from that step?


archmageirvine commented 1 year ago

Here is the output from the -tree step:

Parsing file NA24385-HG002-20220519-1mM.bam ...
Allocating memory ...
Done.
Filling and saving tree for '1' ...
Filling and saving tree for '2' ...
Filling and saving tree for '3' ...
Filling and saving tree for '4' ...
Filling and saving tree for '5' ...
Filling and saving tree for '6' ...
Filling and saving tree for '7' ...
Filling and saving tree for '8' ...
Filling and saving tree for '9' ...
Filling and saving tree for '10' ...
Filling and saving tree for '11' ...
Filling and saving tree for '12' ...
Filling and saving tree for '13' ...
Filling and saving tree for '14' ...
Filling and saving tree for '15' ...
Filling and saving tree for '16' ...
Filling and saving tree for '17' ...
Filling and saving tree for '18' ...
Filling and saving tree for '19' ...
Filling and saving tree for '20' ...
Filling and saving tree for '21' ...
Filling and saving tree for '22' ...
Filling and saving tree for 'X' ...
Filling and saving tree for 'Y' ...
Writing histograms ... 
Total of 875944819 reads were placed.

I did eventually get through to calls by following the comment of huangl07 and completely leaving off the reference in histogram creation -- but I suspect that has skipped GC-content correction as a result.

My CNVnator is the conda install.

abyzov commented 1 year ago

Hi, good to hear you got results! We recommend that all users switch to CNVpytor (https://github.com/abyzovlab/CNVpytor), as it is maintained and (we believe) more robust.


archmageirvine commented 1 year ago

Thank you.