biocore / qiime

Official QIIME 1 software repository. QIIME 2 (https://qiime2.org) has succeeded QIIME 1 as of January 2018.
GNU General Public License v2.0

new function to BLAST n randomly selected reads against nr and summarize results as an HTML page #1010

Status: open (opened by gregcaporaso 11 years ago)

gregcaporaso commented 11 years ago

To help with debugging weird sequencing results, we should develop a new function that BLASTs randomly selected sequences from a FASTA file against nr and creates a graphical summary accessible via an HTML file. It should work in two modes: (1) selecting sequences completely at random from a FASTA or FASTQ file (so it would work prior to demultiplexing), and (2) selecting n sequences from each sample in a file after demultiplexing.
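Mode (1) needs a uniform random pick of n reads from a file that may be too large to load into memory. One way to do that in a single pass is reservoir sampling (Algorithm R), sketched below in plain Python. This is an illustrative sketch, not a description of how qiime.util.subsample_fasta works; `parse_fasta` and `reservoir_sample` are hypothetical names. The sampler accepts any record iterator, so a FASTQ parser could feed it the same way.

```python
import random
from typing import Iterable, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (label, sequence) pairs from FASTA-formatted lines (hypothetical helper)."""
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if label is not None:
        yield label, "".join(chunks)


def reservoir_sample(records: Iterable[T], n: int,
                     seed: Optional[int] = None) -> List[T]:
    """Select up to n records uniformly at random in one pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < n:
            reservoir.append(record)  # fill the reservoir with the first n records
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = record  # record i survives with probability n/(i+1)
    return reservoir
```

For example, `reservoir_sample(parse_fasta(open("seqs.fna")), n=100, seed=42)` would draw 100 reads from a hypothetical `seqs.fna` without holding the whole file in memory.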

This script should work by BLASTing against NCBI (use cogent.db.ncbi.EUtils). This will require a network connection, so the code should fail gracefully if no connection is available. This is preferable to requiring that the user always have a recent version of nr installed locally. For subsampling, qiime.util.subsample_fasta will be helpful, and it would be great to expand that function to support mode (2).
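The "fail gracefully" requirement could be handled by probing connectivity before submitting anything and treating any network error as a reason to skip rather than crash. The sketch below assumes nothing about the cogent.db.ncbi.EUtils API: `submit` is a hypothetical placeholder for whatever function actually sends reads to NCBI, and the probe URL and helper names are illustrative assumptions. The opener is injectable so the offline path can be tested without a network.

```python
import logging
import urllib.request
from typing import Callable, Optional

# Assumed probe URL; any reliably reachable NCBI page would do.
NCBI_PROBE_URL = "https://blast.ncbi.nlm.nih.gov/"


def network_available(url: str = NCBI_PROBE_URL, timeout: float = 5.0,
                      opener: Callable[..., object] = urllib.request.urlopen) -> bool:
    """Return True if url can be opened, False on any network error."""
    try:
        with opener(url, timeout=timeout):
            return True
    except OSError:  # URLError, timeouts and DNS failures all subclass OSError
        return False


def blast_if_online(seqs, submit: Callable, **probe_kwargs) -> Optional[object]:
    """Call the hypothetical submit(seqs) only when the network is reachable;
    otherwise log a warning and return None instead of raising."""
    if not network_available(**probe_kwargs):
        logging.warning("No network connection; skipping remote BLAST.")
        return None
    try:
        return submit(seqs)
    except OSError as err:  # connection dropped mid-request
        logging.warning("Remote BLAST failed (%s); skipping.", err)
        return None
```

Returning None (rather than raising) lets a caller such as a pipeline step simply check the result and move on.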

After creating this function, we'd likely hook mode (2) up to core_diversity_analyses.py, skipping that step if there is no active internet connection.
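Mode (2), the one proposed for core_diversity_analyses.py, could reuse the same single-pass idea with one reservoir per sample. This sketch assumes split_libraries.py-style labels whose first token is `SampleID_SequenceNumber` (e.g. `PC.634_1`); that label convention and the function names are assumptions to check against the actual demultiplexed output.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, str]  # (label, sequence)


def sample_id_from_label(label: str) -> str:
    """Assumes labels look like 'SampleID_SeqNum other fields'."""
    return label.split()[0].rsplit("_", 1)[0]


def subsample_per_sample(records: Iterable[Record], n: int,
                         seed: Optional[int] = None) -> Dict[str, List[Record]]:
    """Keep up to n randomly chosen records per sample in one pass,
    using an independent reservoir for each sample ID."""
    rng = random.Random(seed)
    reservoirs: Dict[str, List[Record]] = defaultdict(list)
    seen: Dict[str, int] = defaultdict(int)  # records seen so far per sample
    for label, seq in records:
        sid = sample_id_from_label(label)
        i = seen[sid]
        seen[sid] += 1
        if i < n:
            reservoirs[sid].append((label, seq))
        else:
            j = rng.randint(0, i)
            if j < n:
                reservoirs[sid][j] = (label, seq)
    return dict(reservoirs)
```

Samples with fewer than n reads simply contribute all of their reads.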

gregcaporaso commented 11 years ago

Also, this would be a great thing to work on for someone who is new to QIIME development and looking for a challenge that is bigger than something tagged "quick fix" but still relatively stand-alone.

gregcaporaso commented 10 years ago

Anyone interested in working on this for QIIME 1.8.0?

antgonza commented 10 years ago

I think this script could serve as an easy introduction to QIIME development. In case anyone is interested, perhaps @cuttlefishh, @JWDebelius, or @amnona?

We should also allow selecting a percentage of the sequences from the input file, as an alternative to a fixed n.
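A percentage option could be supported exactly with two passes: count the records, then keep a random set of ceil(total * percent / 100) positions. In this hypothetical helper, `open_records` is a zero-argument callable that re-reads the input each time it is called (for example, a function that reopens and parses the FASTA file); the name and the round-up choice are assumptions.

```python
import math
import random
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def subsample_percent(open_records: Callable[[], Iterable[T]], percent: float,
                      seed: Optional[int] = None) -> List[T]:
    """Return a random `percent` % of the records, in their original order.

    open_records is called twice (once to count, once to select), so it
    must return a fresh iterator over the same records each time.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    total = sum(1 for _ in open_records())        # first pass: count records
    k = math.ceil(total * percent / 100)          # round up so tiny files still yield reads
    rng = random.Random(seed)
    keep = set(rng.sample(range(total), k))       # k distinct record positions
    return [rec for i, rec in enumerate(open_records()) if i in keep]  # second pass
```

A cheaper single-pass alternative is to keep each read independently with probability percent / 100; that avoids the counting pass but only hits the requested fraction on average.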