abyzovlab / CNVnator

a tool for CNV discovery and genotyping from depth-of-coverage by mapped reads

Yes, the software requires .fa file for each chromosome. #264

Open sunbacteria opened 2 years ago

sunbacteria commented 2 years ago

Yes, the software requires .fa file for each chromosome.

Alexej Abyzov, Ph.D. Senior Associate Consultant, Assistant Professor of Biomedical Informatics, Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic

Mayo Clinic, Harwick 3-12 200 1st street SW, Rochester, MN 55905 tel: +1-(507)-538-0978 fax: +1-(507)-284-0745

Originally posted by @abyzov in https://github.com/abyzovlab/CNVnator/issues/74#issuecomment-315709639
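Some workflows do need the per-chromosome layout described in the quoted comment. For those, a multi-FASTA reference can be split with standard tools. Below is a minimal sketch in POSIX shell and awk. It assumes each output file should be named after the first word of its header line, so `>1 dna:chromosome ...` becomes `1.fa`. The helper name `split_fasta` and this naming scheme are illustrative assumptions, not something taken from CNVnator.

```shell
#!/bin/sh
# split_fasta FASTA OUTDIR
# Hypothetical helper: writes each record of a multi-FASTA into OUTDIR/<name>.fa,
# where <name> is the first word of the header without the leading '>'.
split_fasta() {
  mkdir -p "$2"
  awk -v dir="$2" '
    /^>/ {
      # New record: close the previous output file to avoid hitting
      # the open-file limit on references with many contigs.
      if (out) close(out)
      out = dir "/" substr($1, 2) ".fa"
    }
    # Copy header and sequence lines; anything before the first header is skipped.
    out { print > out }
  ' "$1"
}
```

A gzipped reference can be streamed in, e.g. `gzip -dc file_genome.fa.gz | split_fasta - ref_by_chrom`. This works because awk reads standard input when its file operand is `-`.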

lgmgeo commented 2 years ago

Hi,

I am using CNVnator for CNV calling with WGS data. Here are the commands I use:

# Extract read mapping
time $cnvnatorPATH/cnvnator -root sample.root -chrom chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrMT -tree $BAMdir/sample_GRCh37.bam
1774.998u 93.756s 31:22.44 99.2%        0+0k 174115928+3920824io 1349pf+0w

# Generate histogram
time $cnvnatorPATH/cnvnator -root sample.root -his 1000 -fasta $reference_fasta
113.035u 10.292s 2:09.31 95.3%  0+0k 6227496+26256io 286pf+0w

# Calculate statistics
time $cnvnatorPATH/cnvnator -root sample.root -stat 1000
3.828u 1.507s 0:06.06 87.7%     0+0k 3080+21728io 15pf+0w

# Partition
time $cnvnatorPATH/cnvnator -root sample.root -partition 1000
2637.586u 5.338s 2:39.76 1654.3%        0+0k 5720+31208io 36pf+0w

# Call CNVs
time $cnvnatorPATH/cnvnator -root sample.root -call 1000 > sample_cnvnator.out
6.954u 1.428s 0:13.37 62.6%     0+0k 528+2392io 2pf+0w

# Exporting CNV calls as VCFs
time $cnvnatorPATH/cnvnator2VCF.pl -prefix sample -reference GRCh37 sample_cnvnator.out > sample_cnvnator.vcf
0.041u 0.013s 0:00.09 55.5%     0+0k 16+656io 0pf+0w
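Typing 25 chromosome names by hand is error-prone. The names given to `-chrom` also presumably have to match how the BAM header spells them. GRCh37/b37 references such as human_g1k_v37 use `1` and `MT` rather than `chr1` and `chrMT`, so check your own header. Below is a minimal sketch, assuming `samtools` is available to print that header. The hypothetical helper `chroms_from_header` pulls the `SN:` names from the `@SQ` lines.

```shell
#!/bin/sh
# chroms_from_header
# Hypothetical helper: reads SAM header text on stdin and prints the SN: values
# of all @SQ lines, space-separated on one line, in header order.
chroms_from_header() {
  awk -F '\t' '
    $1 == "@SQ" {
      # The SN: tag is usually the second field, but scan all fields to be safe.
      for (i = 2; i <= NF; i++)
        if (substr($i, 1, 3) == "SN:") { names = names sep substr($i, 4); sep = " " }
    }
    END { print names }
  '
}
```

For example, run `chroms=$(samtools view -H "$BAMdir/sample_GRCh37.bam" | chroms_from_header)` and then pass `-chrom $chroms`. Leave `$chroms` unquoted so it splits into separate arguments. This takes every `@SQ` entry, including unplaced contigs such as the `GL000...` sequences in human_g1k_v37, so you may want to filter the list first.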

Am I using CNVnator correctly?

According to your GitHub quick start guide, you can generate the histogram from a single file_genome.fa.gz thanks to the -fasta option.

So I'm not sure I understand which step requires splitting this single whole-genome FASTA file (e.g., human_g1k_v37.fasta) into one FASTA file per chromosome?

Thanks for your help,

Véronique

abyzov commented 2 years ago

Hi, it looks fine to me. I don’t think you need to split the genome into individual chromosomes when using the -fasta option. We also recommend switching to the new tool CNVpytor: https://github.com/abyzovlab/CNVpytor.
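When a single file is passed with `-fasta`, CNVnator presumably finds each chromosome by its sequence name. We have not checked this in CNVnator's source, so treat it as an assumption. If it holds, a mismatch between the `-chrom` names and the FASTA headers (e.g. `chr1` versus `1`) is worth ruling out before the `-his` step. Below is a minimal sketch with hypothetical helper names. It reports which requested chromosomes have no matching header in the FASTA.

```shell
#!/bin/sh
# Hypothetical helper: print the sequence names in FASTA file $1
# (first word after '>'), one per line.
fasta_names() {
  awk '/^>/ { print substr($1, 2) }' "$1"
}

# Hypothetical helper. Usage: check_chroms_in_fasta FASTA NAME...
# Prints "missing: NAME" for every NAME with no exactly matching FASTA header
# and returns 1 if any name is missing, 0 otherwise.
check_chroms_in_fasta() {
  fasta=$1
  shift
  names=$(fasta_names "$fasta")
  status=0
  for c in "$@"; do
    # -x: whole-line match, -F: fixed string, so "1" does not match "10".
    if ! printf '%s\n' "$names" | grep -qxF -- "$c"; then
      echo "missing: $c"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_chroms_in_fasta "$reference_fasta" 1 2 X MT` prints nothing and returns 0 if all four names are present. For a gzipped reference, pipe it in and pass `-` as the file, e.g. `gzip -dc file_genome.fa.gz | check_chroms_in_fasta - 1 MT`.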

Alexej Abyzov, Ph.D. Senior Associate Consultant, Associate Professor of Biomedical Informatics, Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic

Mayo Clinic, 200 1st street SW, Harwick 3-12 Rochester, MN 55905 www.abyzovlab.org tel: +1-(507)-538-0978

lgmgeo commented 2 years ago

Thank you for your quick answer!

Good to know about the CNVnator Python extension! I'll take a look at CNVpytor.