bluegenes / MakeMyTranscriptome

assemble, annotate, and assess transcriptomes in a single step

diamond integration: post-diamond blast processing #27

Closed bluegenes closed 8 years ago

bluegenes commented 8 years ago
  1. In manage_databases:
    • use fastaID2Names to build a tabular ID-to-name lookup for each fasta
  2. In annotator:
    • run addStitleToBlastTab.py
      • takes the output of fastaID2Names plus the diamond blast output; modifies the blast output in place
bluegenes commented 8 years ago

fastaID2Names usage::

python fastaID2Names.py --fasta FASTA.fasta

output: FASTA.id2names

addStitleToBlastTab.py usage::

python addStitleToBlastTab.py --db2Name FASTA.id2names --blast BLAST
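
To make the two steps concrete, here is a minimal sketch of the idea, not the actual scripts. It assumes the lookup maps each FASTA record ID (the first whitespace-delimited token of the header) to the rest of the header, and that the blast file uses the default 12-column tabular layout, where the subject ID is column 2. The function names, the `missing` placeholder and the sample records are all made up for illustration.

```python
def fasta_id2names(fasta_lines):
    """Map each FASTA record ID to the remainder of its header line."""
    lookup = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parts = line[1:].strip().split(None, 1)
        if not parts:
            continue  # bare ">" with no ID
        # ID is the first token; the title is whatever follows (may be empty).
        lookup[parts[0]] = parts[1] if len(parts) > 1 else ""
    return lookup


def add_stitle(blast_lines, lookup, missing="NA"):
    """Return tabular hits with the subject title appended as a last column."""
    extended = []
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # fields[1] is the subject ID in the default tabular layout (assumption).
        fields.append(lookup.get(fields[1], missing))
        extended.append("\t".join(fields))
    return extended


# Hypothetical example data
fasta = [">prot1 hypothetical kinase", "MKTAYIAK", ">prot2", "MAAGL"]
hits = ["contig1\tprot1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t400"]

lookup = fasta_id2names(fasta)
for row in add_stitle(hits, lookup):
    print(row)
```

Subject IDs that are missing from the lookup get the placeholder rather than raising, so a partial lookup does not abort the whole file.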
bluegenes commented 8 years ago

I had a go at writing the task functions for these.

Database Task:

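import os  # Task, GEN_LOGS, PATH_DATABASES and PATH_SCRIPTS come from the project modules

def build_fasta_tabdb_task(fasta, out_path, tasks, log_flag=True):
    # Target is <fasta basename up to ".fa">.id2names; assumes the script's output ends up in out_path
    trgs = [os.path.join(out_path, os.path.basename(fasta).split(".fa")[0] + ".id2names")]
    # Script name spelled as in the usage notes (fastaID2Names.py)
    cmd = 'cd {0!s}; python {1!s}/fastaID2Names.py --fasta {2!s}'.format(PATH_DATABASES, PATH_SCRIPTS, fasta)
    name = 'build_fasta_tabdb_' + os.path.basename(fasta)
    out, err = GEN_LOGS(name) if log_flag else (None, None)
    return Task(command=cmd, dependencies=tasks, targets=trgs, name=name, stdout=out, stderr=err)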

Quality Task:

def extend_blast_output_task(fasta_tabdb, out_path, blast, tasks, log_flag=True):
    # The blast file already exists, so targets can't be used to check whether
    # this task is done, unless we *don't* modify the file in place.
    trgs = []
    cmd = 'python {0!s}/addStitleToBlastTab.py --db2Name {1!s}/{2!s} --blast {3!s}/{4!s}'.format(PATH_SCRIPTS, PATH_DATABASES, fasta_tabdb, out_path, blast)
    name = 'extend_blast_output_' + os.path.basename(blast)
    out, err = GEN_LOGS(name) if log_flag else (None, None)
    return Task(command=cmd, dependencies=tasks, targets=trgs, name=name, stdout=out, stderr=err)
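
One way around the empty-targets problem would be to write the extended table to a new file instead of editing in place, so the task has something to list in its targets. The sketch below illustrates that alternative only; addStitleToBlastTab.py does not do this as described above. The `.stitle` suffix, both function names and the in-memory `lookup` dict are hypothetical.

```python
import os


def extend_blast_to_new_file(blast_path, lookup, suffix=".stitle", missing="NA"):
    """Write an extended copy of the blast table and return its path."""
    out_path = blast_path + suffix
    with open(blast_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # Column 2 is assumed to hold the subject ID.
            fields.append(lookup.get(fields[1], missing))
            dst.write("\t".join(fields) + "\n")
    return out_path


def extended_target(blast_path, suffix=".stitle"):
    """Path a task could list in its targets to detect completion."""
    return blast_path + suffix


def is_done(blast_path, suffix=".stitle"):
    """True once the extended file exists; the original is left untouched."""
    return os.path.exists(extended_target(blast_path, suffix))
```

With this layout the task could set `trgs = [extended_target(...)]`, at the cost of keeping two copies of the blast output on disk.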