bsmith89 / StrainFacts

Factorize metagenotypes to infer strains and their abundances
MIT License

Interpreting the genotype output #13

Closed (ajverster closed 6 months ago)

ajverster commented 11 months ago

I am running StrainFacts to identify strains of B. vulgatus in metagenomes. I am very happy with the results: StrainFacts recapitulates what I found with StrainFinder, at least at the level of strain abundance (the community file). However, I am having difficulty interpreting the genotype file. This is how I extracted the genotype and community files:

sfacts dump ../Results/StrainFactsMetaGenome-fit.nc --genotype StrainFactsMetaGenome-fit.geno.tsv --community StrainFactsMetaGenome-fit.comm.tsv

StrainFactsMetaGenome-fit.geno.tsv has a probability between 0 and 1 for each strain's genotype at each position. Which of 0 and 1 corresponds to the Ref allele, and which to the Alt allele, in the input metagenome SNP counts? My goal is to reconstruct a sequence for each of these strains and place it in a phylogenetic tree.

bsmith89 commented 11 months ago

Hey Ajverster, thanks for reaching out.

A value of 1 means the strain carries the alternative allele at that position, and 0 means it carries the reference allele.

Hope this helps!

-Byron
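
For readers with the same goal (one sequence per strain for a tree), here is a minimal sketch of how the 0 = reference, 1 = alternative convention could be applied. It is not part of StrainFacts: the function names `call_alleles` and `to_fasta`, the 0.9 threshold, the toy values, and the in-memory layout (one list of genotype values per strain, plus parallel lists of Ref and Alt bases taken from your own SNP table) are all assumptions for illustration. Loading the dumped TSV is left out because its exact column layout is not described in this thread.

```python
def call_alleles(geno, ref, alt, threshold=0.9):
    """Turn per-strain genotype values into base strings.

    geno: dict mapping strain name -> list of values in [0, 1]
          (assumed layout; 0 = reference allele, 1 = alternative allele)
    ref, alt: lists of reference / alternative bases, one per position
    threshold: hypothetical confidence cutoff; values that are neither
               >= threshold nor <= 1 - threshold are written as 'N'
    """
    sequences = {}
    for strain, values in geno.items():
        bases = []
        for value, ref_base, alt_base in zip(values, ref, alt):
            if value >= threshold:
                bases.append(alt_base)      # confidently alternative
            elif value <= 1 - threshold:
                bases.append(ref_base)      # confidently reference
            else:
                bases.append("N")           # ambiguous call
        sequences[strain] = "".join(bases)
    return sequences


def to_fasta(sequences):
    """Format the called sequences as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in sequences.items())


# Toy data, made up for illustration only.
geno = {
    "strain_0": [0.01, 0.98, 0.50],
    "strain_1": [0.99, 0.02, 0.97],
}
ref = ["A", "C", "G"]
alt = ["T", "G", "A"]

sequences = call_alleles(geno, ref, alt)
fasta_text = to_fasta(sequences)
print(fasta_text)
```

The result is a concatenation of the variable sites only, so every strain's sequence has the same length and can be treated as an alignment of SNP positions. Masking intermediate values with `N` is a design choice of this sketch; a stricter or looser threshold may suit your data better.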