Closed CarnoZhao closed 4 years ago
Hi @CarnoZhao - thanks for your interest.
One general point: the "REF" column of a VCF must match the fasta file. Any tool that uses FASTA/VCF/BAM files will expect that to be true, so you cannot redefine PWK to be REF. (It is technically possible, but you would need to change the mm10 fasta to contain all the PWK alleles, which is probably not what you want to do.)
The correct approach is to make a 'multi-sample' VCF, with one column for PWK and one column for C57. A multi-sample VCF might look like this:
REF ALT PWK C57
C A 1/1 0/0
G T 0/0 1/1
The first line is the example you gave, and the 2nd line is another SNP that is specific to C57. vartrix will give you counts for both variants, but you will need to keep track of which variants are specific to which samples separately.
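For the bookkeeping step, here is a minimal Python sketch of one way to record which variants are specific to which strain. It is an illustration, not part of vartrix: the function names `parse_gt` and `strain_specific` are hypothetical, and it assumes a tab-separated multi-sample VCF whose sample columns are named PWK and C57 and whose genotype is the first FORMAT field.

```python
def parse_gt(sample_field):
    """Return the set of allele indices in a genotype such as '1/1' or '0|1'."""
    gt = sample_field.split(":")[0]           # GT is assumed to be the first FORMAT key
    alleles = gt.replace("|", "/").split("/")
    return {a for a in alleles if a != "."}   # ignore missing calls


def strain_specific(vcf_lines, sample_a="PWK", sample_b="C57"):
    """Map (chrom, pos, ref, alt) to the sample that alone carries the ALT allele.

    A variant is kept only when one sample is homozygous ALT and the
    other is homozygous REF; everything else is skipped.
    """
    result = {}
    header = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue                          # skip blank lines and meta lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            header = fields                   # column names, including samples
            continue
        if header is None:
            raise ValueError("no #CHROM header line before the first record")
        rec = dict(zip(header, fields))
        gt_a = parse_gt(rec[sample_a])
        gt_b = parse_gt(rec[sample_b])
        key = (rec["#CHROM"], int(rec["POS"]), rec["REF"], rec["ALT"])
        if gt_a == {"1"} and gt_b == {"0"}:
            result[key] = sample_a
        elif gt_b == {"1"} and gt_a == {"0"}:
            result[key] = sample_b
    return result
```

The returned dictionary can then be joined against the per-variant counts from vartrix, assuming the variants are matched by chromosome, position, REF and ALT.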
Thanks for replying! @pmarks
BTW, what if both strains differ from the reference fasta, e.g. mm10: A, PWK: G, C57: C?
This case would be treated as multi-allelic, right?
@CarnoZhao - correct, vartrix ignores multi-allelic VCF entries like this:
#CHROM POS ID REF ALT
1 1581713 . A C,G
You can work around this limitation by expanding variants like this to be separate entries for each allele:
#CHROM POS ID REF ALT
1 1581713 . A C
1 1581713 . A G
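The expansion can be scripted. Below is a minimal Python sketch, assuming a tab-separated, sites-only VCF (no sample columns); the function name `split_multiallelic` is hypothetical. Any INFO values are copied unchanged, so per-allele annotations would need separate handling, and records with genotype columns are rejected because their allele indices would have to be remapped.

```python
def split_multiallelic(vcf_lines):
    """Yield VCF lines, expanding each multi-allelic record to one line per ALT allele."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue                          # drop blank lines
        if line.startswith("#"):
            yield line                        # header lines pass through unchanged
            continue
        fields = line.split("\t")
        alts = fields[4].split(",")           # column 5 is ALT
        if len(alts) == 1:
            yield line                        # already bi-allelic
            continue
        if len(fields) > 8:
            # Sample columns present: GT indices would need remapping,
            # which this sketch deliberately does not attempt.
            raise ValueError("multi-allelic record with sample columns")
        for alt in alts:
            yield "\t".join(fields[:4] + [alt] + fields[5:])
```

For VCFs that do carry genotypes, an established tool is the safer route; `bcftools norm -m -any` splits multi-allelic sites and adjusts the genotype fields.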
Hi, I'm working on an allele-specific expression analysis in which neither of the two strains is the reference strain (mouse: PWK * C57). That is, both of them have SNPs relative to the mm10 reference genome.
I constructed my own VCF file from the all-strain VCF file from here, using the PWK base as REF and the C57 base as ALT:
My problem is what happens when PWK and mm10 differ at a site and my BAM read matches PWK:
Using the original vartrix, the ref_hap will be ...C... (from fa-mm10), and the alt_hap will be ...C... (from alt-c57). The read then aligns to ref_hap and alt_hap with the same score, so it is classified as UNKNOWN, and I got many "Variant at index 4631 has multiple unknown reads at barcode index 12943" errors.
So, my solution is to create ref_hap from the VCF REF base directly, instead of creating it from the fasta reference. I have built my own main.rs and it works fine for my 2-strain problem.