10XGenomics / vartrix

Single-Cell Genotyping Tool
MIT License

VCF preparation for per base extraction #26

Open ahy1221 opened 5 years ago

ahy1221 commented 5 years ago

Dear developers: I was wondering how to prepare a VCF for vartrix if I want per-base variant information for a specific region. For example, I want to get all variant information for the whole mitochondrial genome (~16K bases). What I have now is a BED file defining the mitochondrial region, without allele information. How can I prepare a VCF file for vartrix?

Best, Yao He

ifiddes-10x-zz commented 5 years ago

Are you trying to interrogate all possible bases at every site, or are you interested only in known variants? For known variants, I would download a VCF from a site like UCSC or dbSNP.

For unknown variants, you would need to create a custom file. The VCF format that VarTrix is looking for is quite simple: VarTrix is really only interested in the first 3 columns, and the rest of the columns can be dummy values. I would create a custom VCF with 16k * 3 lines, where the 3 lines are the 3 non-reference bases at that position. For example, if the reference is A at position 1, then your VCF will have three lines of the format chrM 1 {G,C,T} <rest of columns to make a dummy VCF>. Remember that VCF is 1-based, not 0-based.
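A minimal Python sketch of the approach described above, under a few stated assumptions. The function name `per_base_vcf_lines` and its parameters are hypothetical, not part of VarTrix. It assumes you have already loaded the reference sequence for the region as a plain string (for example from the chrM FASTA record), and it writes the standard VCF fixed-column layout (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) with `.` as the dummy value. Whether VarTrix needs anything beyond these columns is not confirmed here.

```python
BASES = "ACGT"


def per_base_vcf_lines(contig, sequence, bed_start=0):
    """Yield VCF text lines listing every non-reference SNV at each position.

    contig    -- contig name; assumed to match the name used in the BAM/FASTA
    sequence  -- reference bases for the region, as a plain string
    bed_start -- 0-based start of the region, as in a BED file (hypothetical
                 parameter; 0 means the sequence starts at the contig start)
    """
    # Minimal header: file format line plus the column header line.
    yield "##fileformat=VCFv4.2"
    yield "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
    for offset, ref in enumerate(sequence.upper()):
        if ref not in BASES:
            continue  # skip N or other ambiguous reference bases
        pos = bed_start + offset + 1  # BED is 0-based, VCF is 1-based
        for alt in BASES:
            if alt != ref:
                # Dummy values (".") fill ID, QUAL, FILTER and INFO.
                yield f"{contig}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t."


# Toy usage with a 4-base stand-in for the real ~16K mitochondrial sequence.
demo_lines = list(per_base_vcf_lines("chrM", "ACGT"))
print("\n".join(demo_lines))
```

For the full mitochondrial genome this produces roughly 16k * 3 records plus the two header lines. The contig name should match the one in your alignments (for example `chrM` versus `MT`), and positions where the reference is `N` are skipped because no reference allele can be stated there.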

ahy1221 commented 5 years ago

Thank you very much! A custom VCF for unknown variants is what I want. I will try that as you suggested.