datasnakes / OrthoEvolution

An easy to use and comprehensive python package which aids in the analysis and visualization of orthologous genes. 🐵
https://orthoevolution.readthedocs.io/en/master/

New/Alternative Blast Workflow #143

Closed grabear closed 5 years ago

grabear commented 6 years ago

After discussing the package try the following:

sdhutchins commented 5 years ago

Update BLAST to version 2.8.1 while keeping support for BLAST executables older than 2.8.0.

https://www.ncbi.nlm.nih.gov/books/NBK131777/#_Blast_ReleaseNotes_BLAST_2_8_0_March_28_

This is going to require changes to the blast databases we use as well.

It shouldn't be too difficult to add that in.
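
One way to keep both code paths is to read the installed version and branch on it. The following is only a sketch: the function names are hypothetical and not part of OrthoEvolution, and the 2.8.1 cutoff is an assumption taken from the comment above.

```python
import re
import subprocess

# Assumed cutoff for the new workflow, taken from the comment above.
NEW_WORKFLOW_MINIMUM = (2, 8, 1)


def parse_blast_version(version_output):
    """Extract (major, minor, patch) from text such as 'blastn: 2.9.0+'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    return tuple(int(part) for part in match.groups())


def uses_new_workflow(version, minimum=NEW_WORKFLOW_MINIMUM):
    """Return True when the executable is recent enough for the new workflow."""
    return tuple(version) >= tuple(minimum)


def installed_blast_version(executable="blastn"):
    """Ask the local executable for its version (requires BLAST+ on PATH)."""
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=True
    )
    return parse_blast_version(result.stdout)
```

Calling `uses_new_workflow(installed_blast_version())` would then pick the branch; older executables fall through to the existing workflow.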

grabear commented 5 years ago

BLAST+ 2.9.0: April 1, 2019

sdhutchins commented 5 years ago

So support for the new blastn parameters was added.

  1. As far as seqidlists, this would - perhaps - require a good bit of reworking the blast. I'm trying to conceptualize it right now, but blastn would take a string (comma-delimited) of accessions for a taxonid. The output is what I'm not sure about. To my knowledge, all data would be stored to 1 xml file.

    With the taxonomy id stuff working, I almost feel like putting seqidlists in the icebox.

  2. It won't be hard to add support to the NCBIFtpClient for version5 of the blast databases. I'll probably do that this weekend and try to submit a PR by Monday or Tuesday.

Let me know if you have any thoughts on that, @grabear.

grabear commented 5 years ago

> As far as seqidlists, this would - perhaps - require a good bit of reworking the blast.

You'll have to remind me about the seqidlist.

> The output is what I'm not sure about. To my knowledge, all data would be stored to 1 xml file.

This is standard with our "current" setup too. BLASTN outputs an .xml file, and then we parse it and store the accession number in a file and sqlite database.

> With the taxonomy id stuff working, I almost feel like putting seqidlists in the icebox.

I think we should deprecate any workflow with the old blast version by substituting the new taxon_id aware blast.
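
For reference, the parse-and-store step mentioned above can be sketched roughly as follows. This is not the package's actual code: the function names, the `blast_hits` table, the gene name, and the sample accessions are all made up for illustration, and it assumes the standard BLAST XML layout (`-outfmt 5`) with `Hit_accession` elements.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Tiny stand-in for a BLAST XML result; the accessions are placeholders.
SAMPLE_XML = """
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit><Hit_accession>XM_000001</Hit_accession></Hit>
        <Hit><Hit_accession>XM_000002</Hit_accession></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def hit_accessions(xml_text):
    """Return every Hit_accession value in document order."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("Hit_accession")]


def store_accessions(connection, gene, accessions):
    """Insert one (gene, accession) row per hit into a hypothetical table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS blast_hits (gene TEXT, accession TEXT)"
    )
    connection.executemany(
        "INSERT INTO blast_hits (gene, accession) VALUES (?, ?)",
        [(gene, accession) for accession in accessions],
    )
    connection.commit()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    store_accessions(db, "EXAMPLE_GENE", hit_accessions(SAMPLE_XML))
    print(db.execute("SELECT * FROM blast_hits").fetchall())
```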

sdhutchins commented 5 years ago

> > As far as seqidlists, this would - perhaps - require a good bit of reworking the blast.
>
> You'll have to remind me about the seqidlist.
>
> > The output is what I'm not sure about. To my knowledge, all data would be stored to 1 xml file.
>
> This is standard with our "current" setup too. BLASTN outputs an .xml file, and then we parse it and store the accession number in a file and sqlite database.
>
> > With the taxonomy id stuff working, I almost feel like putting seqidlists in the icebox.
>
> I think we should deprecate any workflow with the old blast version by substituting the new taxon_id aware blast.

Yes on deprecating for sure.

The -seqidlist filename parameter takes a file name.

Should be easy to:

  1. Use the existing human accessions list (self.blast_human)
  2. Write accessions to a temporary text file
  3. Use accessions list in blastn
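
Those three steps might look something like the sketch below. The helper names, file paths, and accessions are hypothetical, and `self.blast_human` is stood in for by a plain list; only the `blastn` flags themselves (`-query`, `-db`, `-seqidlist`, `-outfmt`, `-out`) come from BLAST+.

```python
import tempfile


def write_seqidlist(accessions):
    """Write one accession per line to a temporary text file; return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False
    ) as handle:
        handle.write("\n".join(accessions) + "\n")
        return handle.name


def blastn_seqidlist_command(query, db, seqidlist_path, out_path):
    """Build (but do not run) a blastn call restricted to the listed IDs."""
    return [
        "blastn",
        "-query", query,
        "-db", db,
        "-seqidlist", seqidlist_path,
        "-outfmt", "5",  # XML output, as in the current setup
        "-out", out_path,
    ]


if __name__ == "__main__":
    # Placeholder accessions standing in for the human accessions list.
    path = write_seqidlist(["NM_000001.1", "NM_000002.2"])
    print(blastn_seqidlist_command("query.fasta", "refseq_rna", path, "out.xml"))
```

The command list can be handed to `subprocess.run` once BLAST+ and a database are available; the temporary file is left on disk (`delete=False`) so blastn can read it, and should be removed afterwards.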

sdhutchins commented 5 years ago

Seqid lists were not implemented because that change would require a full reorganization of the BaseBlastN class. That being said, using a taxid is faster.