Closed cuttlefishh closed 7 years ago
Note that we ran a lot of samples with old and new primers to compare results; this is in the mSystems paper led by Embriette.
On Jan 21, 2016, at 9:22 AM, Luke Thompson notifications@github.com wrote:
Choice of 16S database affects results to some degree. Main choices are RDP, SILVA, and Greengenes.
Which is more representative for environmental microbes, for host-associated microbes? How do the results (downstream analyses) change with different databases? Silva has better representation -- Greengenes team are working to update accordingly.
Thanks! Link to paper: http://msystems.asm.org/mSystems.00009-15-abstract.php
Greengenes is developed internally, right? First by Todd DeSantis, now by @wasade and others.
What is the ROI of maintaining a database, rather than using an off-the-shelf one? I wasn't around when Greengenes launched so I don't know the original motivation for its creation, or how the field has changed since then.
Some folks really like the SILVA alignment, although that alignment may not matter as much for us.
SILVA does not construct a de novo phylogeny on each release and instead uses parsimony insertion via ARB to insert new sequences. The effect is that SILVA is not as well suited to characterizing candidate phyla.
The Greengenes Consortium includes Rob, Phil Hugenholtz, Todd, and me right now. We are very interested in expanding our development effort. The fundamental limitations right now are that we do not have centralized infrastructure in place, and developer support is thin. There is an open, in-progress RFC about the Greengenes infrastructure if you'd like to contribute though.
So these things hold us back from adopting silva:
Anything else? If we could address all these issues, would we be comfortable switching to silva? I'm not sure how the community feels about this... Comments welcome.
@ekopylova Would you be interested in comparing the amount of novel diversity as identified by closed-reference OTU picking against GG and Silva? This would be per sample, based on the number of sequences mapped to Greengenes v.13.8 97% and Silva v.123 97%. Basically we want to know two things:
When do you need to have this done by?
Within one week would be great as some other things depend on this. Please write down what you did so we can easily insert into the methods for the paper. Thanks!
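For reference, the per-sample comparison requested above could be sketched roughly as follows. This is only a minimal illustration, assuming that for each sample we already have the total read count and the number of reads that hit each reference at 97% in closed-reference OTU picking (for example, summed from each run's OTU table). The function name, sample IDs, and all counts are hypothetical and are not EMP data.

```python
from typing import Dict


def unmapped_fraction(total_reads: Dict[str, int],
                      mapped_reads: Dict[str, int]) -> Dict[str, float]:
    """Per-sample fraction of reads that failed to hit the reference.

    A higher value means more sequences in that sample are not
    represented in the database at the chosen identity threshold.
    """
    fractions = {}
    for sample, total in total_reads.items():
        if total <= 0:
            continue  # skip samples with no reads to avoid division by zero
        mapped = mapped_reads.get(sample, 0)  # absent sample = nothing mapped
        fractions[sample] = (total - mapped) / total
    return fractions


# Hypothetical counts, for illustration only.
total = {"sampleA": 1000, "sampleB": 2000}
mapped_gg = {"sampleA": 800, "sampleB": 1000}     # hits to Greengenes 97%
mapped_silva = {"sampleA": 900, "sampleB": 1500}  # hits to Silva 97%

novel_gg = unmapped_fraction(total, mapped_gg)
novel_silva = unmapped_fraction(total, mapped_silva)

for sample in sorted(total):
    # Positive diff: more reads left unmapped by Greengenes than by Silva.
    diff = novel_gg[sample] - novel_silva[sample]
    print(f"{sample}\tGG={novel_gg[sample]:.3f}"
          f"\tSilva={novel_silva[sample]:.3f}\tdiff={diff:+.3f}")
```

Reporting the unmapped fraction rather than raw counts keeps samples with different sequencing depths comparable.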
Not sure I have the bandwidth during this week. If end of next week is possible then I can.
I can look at this now. @cuttlefishh, is anyone else working on it?
This is important so if you can find bandwidth I'd appreciate it...thanks!