abyzovlab / CNVnator

a tool for CNV discovery and genotyping from depth-of-coverage by mapped reads
Other
206 stars 65 forks source link

Does CNVnator support hg38? #280

Closed jingydz closed 1 year ago

jingydz commented 1 year ago

Does CNVnator support hg38?

Command

$ /xxx/software/CNVnator_v0.3.3/src/cnvnator
Not enough parameters.

CNVnator v0.3.3

Usage:
/xxx/software/CNVnator_v0.3.3/src/cnvnator -root out.root  [-genome name] [-chrom 1 2 ...] -tree  file1.bam ...
/xxx/software/CNVnator_v0.3.3/src/cnvnator -root out.root  [-genome name] [-chrom 1 2 ...] -merge file1.root ...
/xxx/software/CNVnator_v0.3.3/src/cnvnator -root file.root [-genome name] [-chrom 1 2 ...] [-d dir] -his bin_size
/xxx/software/CNVnator_v0.3.3/src/cnvnator -root file.root [-chrom 1 2 ...] -stat      bin_size
/xxx/software/CNVnator_v0.3.3/src/cnvnator -root file.root                  -eval      bin_size
/xxx/software/CNVnator_v0.3.3/src/cnvnator -root file.root [-chrom 1 2 ...] -partition bin_size [-ngc]
/xxx/software/CNVnator_v0.3.3/src/cnvnator -root file.root [-chrom 1 2 ...] -call      bin_size [-ngc]
/xxx/software/CNVnator_v0.3.3/src/cnvnator -root file.root -genotype bin_size [-ngc]
/xxx/software/CNVnator_v0.3.3/src/cnvnator -root file.root -view     bin_size [-ngc]
/xxx/software/CNVnator_v0.3.3/src/cnvnator -pe   file1.bam ... -qual val(20) -over val(0.8) [-f file]

Valid genomes (-genome option) are: NCBI36, hg18, GRCh37, hg19, mm9

Command

## step1: Extract read mapping 
time $CNVnator_HOME/src/cnvnator -root ${CNVnator_output}.root -tree $CNVnator_input -unique
## step2: Generate histogram
time $CNVnator_HOME/src/cnvnator -root ${CNVnator_output}.root -genome hg38 -his ${bin_size} -d $Chromosomes
## step3: Calculate statistics
time $CNVnator_HOME/src/cnvnator -root ${CNVnator_output}.root -stat ${bin_size}
## step4: Partition
time $CNVnator_HOME/src/cnvnator -root ${CNVnator_output}.root -partition ${bin_size}
## step5: Call CNVs
time $CNVnator_HOME/src/cnvnator -root ${CNVnator_output}.root -call ${bin_size} > ${CNVnator_output}.cnv
## step6: Import SNP data
time $path2perl $CNVnator_HOME/cnvnator2VCF.pl ${CNVnator_output}.cnv > ${CNVnator_output}.vcf
time $path2SURVIVOR stats ${CNVnator_output}.vcf -1 -1 -1 ${CNVnator_output}.vcf.stats

Output

sample.vcf ##fileformat=VCFv4.1 ##fileDate=20230624 ##reference=1000GenomesPhase3_decoy-GRCh37 ##source=CNVnator ...

Question

Why is the header of my VCF file output as GRCh37 when I use hg38? Is it possible that only the header is incorrect and the actual data is still based on hg38?

abyzov commented 1 year ago

Hi, yes CNVnator works with any genome in the. bam header. There is probably an error with conversion script to VCF. The coordinates are the same as in the original bam.

Alexej Abyzov, Ph.D. Senior Associate Consultant, Associate Professor of Biomedical Informatics, Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic

Mayo Clinic, 200 1st street SW, Harwick 7-91 Rochester, MN 55905 www.abyzovlab.org tel: +1-(507)-538-0978

jingydz commented 1 year ago

"CNVnator works with any genome in the .bam header" Why the introduction of CNVnator software shows that "Valid genomes (-genome option) are: NCBI36, hg18, GRCh37, hg19, mm9"? there are no "hg38"?

My bam header: @RG ID:6725D LB:6725D SM:6725D PL:ILLUMINA @PG ID:GATK IndelRealigner VN:nightly-2017-07-03-g1f763d5 CL:knownAlleles=[(RodBinding name=knownAlleles source=/home/work01/WGS/anno/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf)] targetIntervals=/home/work01/WGS/gatkout/6725D/6725D.realignertargetc @PG ID:MarkDuplicates VN:2.9.2-SNAPSHOT CL:picard.sam.markduplicates.MarkDuplicates INPUT=[/home/work01/WGS/gatkout/6725D/6725D.bam] OUTPUT=/home/work01/WGS/gatkout/6725D/6725D.marked.bam METRICS_FILE=/home/work01/WGS/gatkout/6725D/6725D.marked.metr @PG ID:bwa PN:bwa VN:0.7.15-r1140 CL:/home/work01/tools/bwa-0.7.15/bwa mem -M -t 12 -R @RG\tID:6725D\tLB:6725D\tSM:6725D\tPL:ILLUMINA /home/work01/WGS/anno/hg38/v0/Homo_sapiens_assembly38.fasta.64 /home/work01/WGS/cleandata/6725D/6725D_R1_clean_paired.fq /hom @PG ID:GATK PrintReads VN:nightly-2017-07-03-g1f763d5 CL:readGroup=null platform=null number=-1 sample_file=[] sample_name=[] simplify=false no_pg_tag=false

I just very confused about why the header of my output vcf file is "1000GenomesPhase3_decoy-GRCh37", which step caused this?

abyzov commented 1 year ago

Hi, sorry for the confusion. When dealing with SAM file one have to specify genome. For BAM files there is no need for that.

Alexej Abyzov, Ph.D. Senior Associate Consultant, Associate Professor of Biomedical Informatics, Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic

Mayo Clinic, 200 1st street SW, Harwick 7-91 Rochester, MN 55905 www.abyzovlab.org tel: +1-(507)-538-0978

On Jun 26, 2023, at 9:56 PM, Zhang Jingjing @.***> wrote:

"CNVnator works with any genome in the .bam header" Why the introduction of CNVnator software shows that "Valid genomes (-genome option) are: NCBI36, hg18, GRCh37, hg19, mm9"? there are no "hg38"?

My bam header: @rghttps://github.com/rg ID:6725D LB:6725D SM:6725D PL:ILLUMINA @pghttps://github.com/pg ID:GATK IndelRealigner VN:nightly-2017-07-03-g1f763d5 CL:knownAlleles=[(RodBinding name=knownAlleles source=/home/work01/WGS/anno/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf)] targetIntervals=/home/work01/WGS/gatkout/6725D/6725D.realignertargetc @pghttps://github.com/pg ID:MarkDuplicates VN:2.9.2-SNAPSHOT CL:picard.sam.markduplicates.MarkDuplicates INPUT=[/home/work01/WGS/gatkout/6725D/6725D.bam] OUTPUT=/home/work01/WGS/gatkout/6725D/6725D.marked.bam METRICS_FILE=/home/work01/WGS/gatkout/6725D/6725D.marked.metr @pghttps://github.com/pg ID:bwa PN:bwa VN:0.7.15-r1140 CL:/home/work01/tools/bwa-0.7.15/bwa mem -M -t 12 -R @rghttps://github.com/rg\tID:6725D\tLB:6725D\tSM:6725D\tPL:ILLUMINA /home/work01/WGS/anno/hg38/v0/Homo_sapiens_assembly38.fasta.64 /home/work01/WGS/cleandata/6725D/6725D_R1_clean_paired.fq /hom @pghttps://github.com/pg ID:GATK PrintReads VN:nightly-2017-07-03-g1f763d5 CL:readGroup=null platform=null number=-1 sample_file=[] sample_name=[] simplify=false no_pg_tag=false

I just very confused about why the header of my output vcf file is "1000GenomesPhase3_decoy-GRCh37", which step caused this?

— Reply to this email directly, view it on GitHubhttps://github.com/abyzovlab/CNVnator/issues/280#issuecomment-1608664722, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACLKGONIOCW5FS2B52WJTBTXNJDXFANCNFSM6AAAAAAZT4FVUE. You are receiving this because you commented.Message ID: @.***>

jingydz commented 1 year ago

If I have already done this to my BAM file (providing the -genome hg38 parameter), will the resulting file ignore the -genome hg38 parameter?

abyzov commented 1 year ago

Hi, yes for bam files option -genome is ignore.

Alexej Abyzov, Ph.D. Senior Associate Consultant, Associate Professor of Biomedical Informatics, Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic

Mayo Clinic, 200 1st street SW, Harwick 7-91 Rochester, MN 55905 www.abyzovlab.org tel: +1-(507)-538-0978

Message ID: @.***>