SchlossLab / Hannigan_CRCVirome_mBio_2018

Investigating the gut virus communities associated with colon cancer.
MIT License

Classify contigs by blasting only the longest contigs (rep sequences). Avoid short ones because they are less informative. #44

Microbiology closed 7 years ago

Microbiology commented 7 years ago

To kick things off, I think it makes sense to pull the longest contig as the representative sequence and blast it. This means:

Microbiology commented 7 years ago

Download script is running now.

Microbiology commented 7 years ago

Alright, the reference genome set is downloaded and formatted. Now I need to get a representative contig sequence from each OGU (operational genomic unit) and blastn it against that reference set.
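A minimal sketch of the "longest contig per OGU" step, for illustration only. It assumes the contigs are in FASTA format and that a contig-to-OGU mapping is already available as a dictionary; the function names `parse_fasta` and `longest_contig_per_ogu`, the contig IDs, and the OGU labels are all hypothetical and do not come from this repository.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (contig_id, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_contig_per_ogu(
    contigs: Iterable[Tuple[str, str]],
    contig_to_ogu: Dict[str, str],
) -> Dict[str, Tuple[str, str]]:
    """Return {ogu: (contig_id, sequence)} keeping the longest contig per OGU.

    Contigs missing from the mapping are skipped. On a length tie the
    contig seen first is kept.
    """
    reps: Dict[str, Tuple[str, str]] = {}
    for contig_id, seq in contigs:
        ogu = contig_to_ogu.get(contig_id)
        if ogu is None:
            continue
        best = reps.get(ogu)
        if best is None or len(seq) > len(best[1]):
            reps[ogu] = (contig_id, seq)
    return reps


# Toy input (hypothetical IDs and sequences).
fasta_lines = """>contig_1
ACGTACGT
>contig_2
ACGTACGTACGT
>contig_3
ACG
""".splitlines()
mapping = {"contig_1": "OGU_1", "contig_2": "OGU_1", "contig_3": "OGU_2"}

reps = longest_contig_per_ogu(parse_fasta(fasta_lines), mapping)
for ogu, (contig_id, seq) in sorted(reps.items()):
    # Write one representative record per OGU in FASTA format.
    print(f">{ogu}|{contig_id}\n{seq}")
```

The printed records could be redirected to a file and used as the query for the blast step.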

Microbiology commented 7 years ago

Rep set is... set... and now I am running a blast to get a feel for what I am dealing with.

Microbiology commented 7 years ago

Did this with tblastx. Important to interpret all of this with caution, though.
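One way to make that caution concrete is to keep only the strongest hit per representative sequence and drop weak matches before assigning any label. The sketch below assumes the search was written in the standard 12-column tabular format (`-outfmt 6`), which is an assumption about the run and not something stated in this thread. The function name `best_hits`, the `1e-5` e-value cutoff, and the example IDs are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Column positions in the default 12-column tabular output:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
QUERY, SUBJECT, EVALUE, BITSCORE = 0, 1, 10, 11


def best_hits(
    lines: Iterable[str],
    max_evalue: float = 1e-5,  # hypothetical cutoff; tune for the dataset
) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)} for the top hit per query.

    Hits with an e-value above max_evalue are discarded, so queries with
    only weak matches stay unclassified.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        evalue = float(fields[EVALUE])
        bitscore = float(fields[BITSCORE])
        if evalue > max_evalue:
            continue
        query = fields[QUERY]
        if query not in best or bitscore > best[query][2]:
            best[query] = (fields[SUBJECT], evalue, bitscore)
    return best


# Toy rows (hypothetical IDs and scores).
rows = [
    "\t".join(["OGU_1", "phage_A", "92.0", "300", "24", "0", "1", "300", "10", "309", "1e-50", "410"]),
    "\t".join(["OGU_1", "phage_B", "85.0", "200", "30", "0", "1", "200", "5", "204", "1e-20", "180"]),
    "\t".join(["OGU_2", "phage_C", "40.0", "60", "36", "0", "1", "60", "1", "60", "0.5", "30"]),
]

hits = best_hits(rows)
for query, (subject, evalue, bitscore) in sorted(hits.items()):
    print(query, subject, evalue, bitscore)
```

Here `OGU_2` is left without a label because its only hit fails the cutoff, which is the conservative behavior one would probably want for translated searches against a limited reference set.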