chrisquince / DESMAN

De novo Extraction of Strains from MetAgeNomes

Cutoff for SNV uncertainty #18

Open liuxianghui opened 7 years ago

liuxianghui commented 7 years ago

Could you kindly suggest the cutoff for average error in those inferences? Is 10% OK?

In the example, 'Which we interpret as the best run had six haplotypes, five of which we are confident in and the average error in those inferences was 1.6%. The best haplotypes are given by the file ClusterEC_6_2/Filtered_Tau_star.csv. This is what we will use in the analysis below.'

In the paper, 'we calculated the number of haplotypes that had a mean SNV uncertainty (see above) below 10% and a mean relative abundance above 5%. We chose the optimal G to be the one that returned the most haplotypes satisfying these conditions of reproducibility and abundance.'
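For readers following along, the selection rule quoted from the paper can be sketched roughly as below. This is only an illustration, not DESMAN's own code: the function names, the `runs` layout, and all the numbers are hypothetical, and breaking ties in favour of the smallest G is an assumption the quoted text does not specify.

```python
from typing import Dict, List, Tuple

# One haplotype: (mean SNV uncertainty, mean relative abundance), both as fractions.
Haplotype = Tuple[float, float]


def count_reliable(haplotypes: List[Haplotype],
                   max_err: float = 0.10,
                   min_abund: float = 0.05) -> int:
    """Count haplotypes with uncertainty below max_err and abundance above min_abund."""
    return sum(1 for err, abund in haplotypes if err < max_err and abund > min_abund)


def choose_optimal_g(runs: Dict[int, List[Haplotype]],
                     max_err: float = 0.10,
                     min_abund: float = 0.05) -> Tuple[int, int]:
    """Return (G, count) for the G giving the most haplotypes that pass both cutoffs.

    Ties go to the smallest G (an assumption, not stated in the quoted text).
    """
    best_g, best_count = -1, -1
    for g in sorted(runs):
        n = count_reliable(runs[g], max_err, min_abund)
        if n > best_count:  # strict '>' keeps the smaller G on ties
            best_g, best_count = g, n
    return best_g, best_count


# Made-up values purely for illustration; they are not taken from any DESMAN output.
runs = {
    5: [(0.02, 0.30), (0.03, 0.25), (0.04, 0.20), (0.15, 0.15), (0.05, 0.10)],
    6: [(0.01, 0.30), (0.02, 0.25), (0.02, 0.20), (0.01, 0.12), (0.02, 0.10),
        (0.40, 0.03)],
    7: [(0.01, 0.30), (0.02, 0.25), (0.12, 0.20), (0.01, 0.10), (0.02, 0.08),
        (0.40, 0.04), (0.30, 0.03)],
}

best_g, n_reliable = choose_optimal_g(runs)
print(best_g, n_reliable)
```

With these invented numbers, G = 6 is picked because five of its six haplotypes pass both cutoffs. Changing `max_err` shows how sensitive the choice of G is to the uncertainty cutoff asked about here.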

chrisquince commented 7 years ago

Yes, for real data analyses I tend to use 10%. That may seem quite high, but I believe it is an overestimate of the true uncertainty.

liuxianghui commented 7 years ago

So your suggestion is to discard clusters with more than 10% uncertainty? In the latest paper I found S10, "Summary of results from applying the DESMAN pipeline to the 32 Tara MAGs with coverage > 100". In that table, the Err values (the estimated percentage uncertainty in the inferred haplotypes) seem quite large, much larger than 0.10?

Moreover, I am a bit confused about ClusterEC.fa. In the E. coli example you first merge 5 clusters, but for the real dataset each cluster is treated as a MAG and DESMAN is applied to each cluster separately. ...

By the way, could you kindly comment on the variant positions on the core COGs? How many are needed to run DESMAN?