Open sotaro-kanematsu opened 3 years ago
Hi @sotaro-kanematsu ,
Not an author of CNVkit, plus I do not know a lot about this script => But my wild guess would be: have you checked if chromosome names match between your input FASTA, BAM and VCF files ?
This note in the script code warns about the fact that this name check is not (yet) made automatically
=> You are talking about using "hg38.fa" when you VCF header indeed mentions "##reference=GRCh38.p7" (5th line) and they are supposed to have 2 different chromosome naming conventions ("chr1" vs "1")
=> Also snpfilter.sh
is supposed to work without an input FASTA file, so maybe 1st try to use it with BAM+VCF only? Then add a FASTA if it is OK ?
Hope this helps. Have a nice day. Felix.
Hi @tetedange13 Thank you for your kindly advice! I will check reference verson, and try again!
Hi @tetedange13 I checked the reference version and tried again. I understood how to generate input vcf file when I call allele-specific copy number alternation using CNVkit. All I know is that CNVkit uses vcf file that contatained GT, AD, and DP. GT value is indispensable for the extraction of mutations in germline- heterozygous positions. AD and DP are indispensable for calculation of b-allele frequency. So, here is the main part I want to discuss, I am wondering whether snpfilter script works correctly. The script uses samtools mpileup, and I tried as script shows, however, the call vcf file is somewhat strange. It looks like bcf file ( so, I can read only when the file format is converted by bcftools view), and does not contain information about genotype. (samtools version: 1.11 (using htslib 1.11), bcftools version: 1.8 (using htslib1.8))
I use bcftools mpileup as follows. So now, bcftools run soooooo slow, however it works well, and the results CNVkit called are comparable with Sequenza, or Freec.
I tried to run the snpfilter.sh to obtain the vcf file for uncovering the allele-specific copy number alternations in the tumor samples. I prepared tumor bam file, common dbSNP vcf file, and hg38.fa, and run the shell, however sites.txt via bcftools view were null because _snp.vcf had the only header. my input vcf file is as follows. What is wrong?
fileformat=VCFv4.0
fileDate=20170710
source=dbSNP
dbSNP_BUILD_ID=150
reference=GRCh38.p7
phasing=partial
variationPropertyDocumentationUrl=ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=SubSNP->Batch.link_out">
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=5% minor allele frequency in each and all populations">
INFO=5% minor allele frequency in 1+ populations">
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
INFO=
FILTER=
INFO=
INFO== 1% and for which 2 or more founders contribute to that minor allele frequency.">
INFO=
CHROM POS ID REF ALT QUAL FILTER INFO
chr1 10177 rs367896724 A AC . . RS=367896724;RSPOS=10177;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020005170026000200;GENEINFO=DDX11L1:100287102;WGT=1;VC=DIV;R5;ASP;VLD;G5A;G5;KGPhase3;CAF=0.5747,0.4253;COMMON=1 chr1 10352 rs555500075 T TA . . RS=555500075;RSPOS=10352;dbSNPBuildID=142;SSR=0;SAO=0;VP=0x050000020005170026000200;GENEINFO=DDX11L1:100287102;WGT=1;VC=DIV;R5;ASP;VLD;G5A;G5;KGPhase3;CAF=0.5625,0.4375;COMMON=1 chr1 10352 rs145072688 T TA . . RS=145072688;RSPOS=10353;dbSNPBuildID=134;SSR=0;SAO=0;VP=0x050000020005000002000200;GENEINFO=DDX11L1:100287102;WGT=1;VC=DIV;R5;ASP;CAF=0.5625,0.4375;COMMON=1 chr1 10616 rs376342519 CCGCCGTTGCAAAGGCGCGCCG C . . RS=376342519;RSPOS=10617;dbSNPBuildID=142;SSR=0;SAO=0;VP=0x050000020005040026000200;GENEINFO=DDX11L1:100287102;WGT=1;VC=DIV;R5;ASP;VLD;KGPhase3;CAF=0.006989,0