Griffan / VerifyBamID

VerifyBamID2: A robust tool for DNA contamination estimation from sequence reads using an ancestry-agnostic method.
http://griffan.github.io/VerifyBamID/

Question about the tool options #30

Closed u1200538 closed 3 years ago

u1200538 commented 3 years ago

Hello, thank you for creating such a useful tool.

I have some questions about the VerifyBamID command-line options.

Q1. I previously ran verifyBAMID_1.1.3, and now I want to try the new version of VerifyBamID. With the previous version, I ran verifyBamID like this:

verifyBamID --vcf customized.vcf --bam human.bam --chip-none -free-full --minQ 20 --maxQ 50 --maxDepth 200 -precise

Could you tell me the equivalent options in verifyBamID_2.0.1?

Q2. Additionally, I am wondering what the "--min-BQ" option means. Does it stand for "minimum base quality"?

Q3. I want to make my own VCF panel. The VCF file I used as input for the previous version does not work with the new version of VerifyBamID. I tried to create the required inputs using the --RefVCF option, but I got the following error message:

FATAL ERROR - No individual genotype information exist in the input VCF file 1KG.phase3.GRCh38.SPECIFY.AF.vcf

Exiting due to ERROR: Exception was thrown

Please tell me how to do it.

Thank you,

Griffan commented 3 years ago

Hi, @omG0-hub

Q1. In 2.0.1, you can use "--max-depth" to adjust the maximum depth allowed and "--min-BQ" in place of "--minQ". This version always calculates likelihoods (llks) on a log scale.

Q2. Yes, "--min-BQ" means that bases with a quality below this value are skipped.

Q3. In 2.0.1, "--RefVCF" expects a genotype matrix, which means you should give VB2 a VCF that contains not only site information but also the genotypes of a set of representative samples from diverse population backgrounds. You may refer to the 1000g genotype VCF files as an example; the resource files in this repo were generated from them (with fewer markers and/or a smaller sample size).
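One quick way to tell whether a VCF carries genotypes is to look at its #CHROM header line. Per the VCF specification, the eight fixed columns are followed by a FORMAT column and then one column per sample, so a sites-only file stops at INFO. The Python sketch below shows that check. The function names are made up for illustration and are not part of VerifyBamID, and the check only approximates whatever validation the tool itself performs.

```python
import gzip


def samples_in_header(header_line):
    """Return the sample IDs from a VCF '#CHROM' header line.

    Columns 1-8 (CHROM..INFO) are fixed, column 9 is FORMAT,
    and every column after that names one sample.
    """
    fields = header_line.rstrip("\r\n").split("\t")
    return fields[9:]


def sample_names(vcf_path):
    """Read a plain or gzip/bgzip-compressed VCF and return its sample IDs."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                return samples_in_header(line)
            if not line.startswith("#"):
                break  # reached data records without finding a header line
    return []


# A sites-only header (no FORMAT or sample columns) yields no samples.
sites_only = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
# A genotype header lists the samples after FORMAT.
with_genotypes = sites_only.rstrip("\n") + "\tFORMAT\tHG00096\tHG00097\n"

print(samples_in_header(sites_only))
print(samples_in_header(with_genotypes))
```

An empty list means the file holds site information only, which is presumably the situation the "No individual genotype information" error reports.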

Fan

u1200538 commented 3 years ago

Thank you for your reply. It answers most of my questions.

However, I am still having difficulty with the third question I posted above. My goal is to create a customized population reference panel for checking sample contamination. Previously, I downloaded a reference VCF file from 1KG phase3 and then filtered it down to specific variants that are unique to a particular population. I am trying to build a reference panel in the same way. Could you recommend a representative reference VCF file, like the "ReferencePanel.vcf.gz" you posted on GitHub? I would like to know the exact format of the reference VCF file that can be passed to "VerifyBamID --RefVCF" to generate the SVD files. That would really help me solve the problem.

Thank you,

Griffan commented 3 years ago

You can extract a subset of the genotype VCF from the 1000g project like this:

bcftools view -v snps -O z -R SelectedSite.vcf "http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/phase3/ALL.chr${chr}.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" > result.ReferencePanel.vcf.gz
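The command leaves ${chr} to be filled in, and redirecting every chromosome to the same file would overwrite the earlier results. A minimal Python sketch of one way to handle this is to build one "bcftools view" command per chromosome and then a "bcftools concat" command to join the pieces. The per-chromosome output names, the restriction to autosomes 1-22, and the helper name build_commands are assumptions made for illustration. The URL pattern and the view options are taken from the command above.

```python
import shlex

# URL pattern copied from the command above; {chrom} replaces ${chr}.
URL_TEMPLATE = (
    "http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/phase3/"
    "ALL.chr{chrom}.phase3_shapeit2_mvncall_integrated_v5a."
    "20130502.genotypes.vcf.gz"
)


def build_commands(chroms=range(1, 23), sites="SelectedSite.vcf",
                   merged="result.ReferencePanel.vcf.gz"):
    """Return argv lists: one 'bcftools view' per chromosome, then a concat."""
    commands, parts = [], []
    for chrom in chroms:
        part = f"chr{chrom}.ReferencePanel.vcf.gz"  # hypothetical file name
        commands.append([
            "bcftools", "view", "-v", "snps", "-O", "z",
            "-R", sites, "-o", part,
            URL_TEMPLATE.format(chrom=chrom),
        ])
        parts.append(part)
    # Join the per-chromosome files, in order, into a single panel.
    commands.append(["bcftools", "concat", "-O", "z", "-o", merged, *parts])
    return commands


# Print the commands for review; nothing is executed here.
for cmd in build_commands(chroms=[1, 2]):
    print(shlex.join(cmd))
```

To run them, each list can be passed to subprocess.run(cmd, check=True), assuming bcftools is installed and the remote files and their indexes are reachable.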

u1200538 commented 3 years ago

Thank you, I'll try this method!