bfjia closed this issue 4 years ago.
What fraction of transcripts do you expect to be bacterial? We can handle large numbers of hits, but not if they are in the same ballpark as the host. Ideally we would include a good set of marker genes such as 16S, COX2, ITS, BPHD, alkB, nifH, etc. SSU is easy; I'm going to screen the 16S and ITS databases for CoV-related problems this morning. Edit -- ITS for fungi, obviously. The forgotten microbes.
16S is going to explode the analysis; we're going to have hits everywhere, all the time. Before we venture off into a LWIA, let's see how AMR behaves.
Hits everywhere are good up to a point, because these hits have scientific value. SSU is 0.03% of a bacterial genome; is it really going to blow up the number of alignments? If so, I would agree to punt for now. Can we run a yotta-genome test including SSU and AMR on the ~100 we did for the FLOM screen? Edit -- I suppose it could be a much larger fraction of the transcripts. This is my first time with RNA-seq; I'm learning as I go.
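A back-of-the-envelope check of the 0.03% figure, assuming a ~1.5 kb 16S gene and an E. coli-sized genome of ~4.6 Mb (both sizes are illustrative assumptions, not numbers from this thread). With $n$ rRNA operons the fraction scales linearly; E. coli K-12 carries seven:

$$
\frac{L_{16S}}{G} \approx \frac{1.5\times10^{3}\ \text{bp}}{4.6\times10^{6}\ \text{bp}} \approx 3.3\times10^{-4} \approx 0.03\%,
\qquad
\frac{n\,L_{16S}}{G}\Big|_{n=7} \approx \frac{1.05\times10^{4}\ \text{bp}}{4.6\times10^{6}\ \text{bp}} \approx 0.23\%
$$

This only bounds the DNA side. In RNA-seq the picture is different because rRNA is commonly reported to make up the large majority of total bacterial RNA, which is why the transcript fraction can be far larger than the genomic one.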
OK, but I think we should do a 16S proof-of-concept batch for the paper. LHF (low-hanging fruit) for us.
That's a good idea, yes. What would be a good application / test case of ~10,000 samples?
Aim for diversity across as many different microbial ecosystems as possible. The main variables are host species and tissue type (feces or whatever), and environment: soil, seawater, windshield bug splatter, space station control surfaces, etc. (I'm not making these last two up; there are 16S studies of both.) Edit -- To save time, it's fine to just take a random subset of the SRA. Hits everywhere. Include DNA as well as RNA, though, as I have just learned, SSU is much harder to find in DNA.
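Taking "a random subset of the SRA" can be done without loading the full run list into memory. The sketch below uses reservoir sampling (Algorithm R) over a stream of run accessions; it assumes a hypothetical plain-text file with one accession per line (for example, exported from an SRA run table), and the function names are illustrative, not part of any existing pipeline. It draws a uniform sample only; stratifying by host, tissue, or environment would need metadata this sketch does not use.

```python
import random
from typing import Iterable, List


def reservoir_sample(items: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    reservoir: List[str] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # inclusive on both ends: i + 1 choices
            if j < k:
                reservoir[j] = item
    return reservoir


def sample_accessions(path: str, k: int = 10_000, seed: int = 0) -> List[str]:
    """Read one run accession per line from a hypothetical file and sample k."""
    with open(path) as fh:
        accessions = (line.strip() for line in fh if line.strip())
        return reservoir_sample(accessions, k, seed)
```

If the list has fewer than `k` accessions, every accession is returned.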
Maybe I should clarify: there are no 16S SSU sequences/transcripts in VFDB or in the AMR gene collection. By non-pathogenic proteins I mean virulence genes that are very abundant across both human-associated and environmental bacterial species and would be very difficult to interpret in terms of lateral gene transfer.
My bad for confusing the issue; I wanted to add 16S to detect novel bacteria. This might be a good idea with DNA but not with RNA, because there are too many ribosomal transcripts. Understood re. virulence genes. I have a set of vertebrate genomes I use for screening (human, bat, chicken, fish, pig); I can run against those if that would help.
Good to close for now? We'll put this on the back burner until we have more time to dedicate to it.
Good idea; maybe we'll have time to do this next month. Closing for now.
Branching from issue #135. On top of AMR genes, scanning for virulence factors present in environmental bacteria would also be interesting.
TODO: the virulence factor database needs to be curated to remove non-pathogenic proteins and systems (e.g. secretion systems) to limit the potential pool of hits.
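One possible first pass at this curation is a keyword filter over FASTA headers. The sketch below assumes each header carries a free-text description (VFDB-style protein FASTA files do, but the exact header layout is not checked here), and the exclusion patterns in `EXCLUDE_PATTERNS` are a hypothetical starting list, not an agreed curation set. It is crude: anything whose description matches is dropped, so the list would need manual review.

```python
import re
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)

# Hypothetical exclusion list; the real set of non-pathogenic systems is TBD.
EXCLUDE_PATTERNS = [
    r"secretion system",  # e.g. "type III secretion system protein"
    r"\bT[1-9]SS\b",      # abbreviations such as T3SS, T6SS
]
EXCLUDE_RE = re.compile("|".join(EXCLUDE_PATTERNS), re.IGNORECASE)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span multiple lines
    if header is not None:
        yield header, "".join(chunks)


def curate(records: Iterable[Record]) -> List[Record]:
    """Keep only records whose header matches none of the exclusion patterns."""
    return [(h, s) for h, s in records if not EXCLUDE_RE.search(h)]


def write_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Serialize records back to FASTA, wrapping sequences at `width` columns."""
    out: List[str] = []
    for h, s in records:
        out.append(">" + h)
        out.extend(s[i:i + width] for i in range(0, len(s), width))
    return "\n".join(out) + "\n"
```

Typical use would be `write_fasta(curate(read_fasta(fh)))` on an open FASTA file handle; excluded headers could also be logged so the curated set can be audited.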