dib-lab / dammit

just annotate it, dammit!
http://dib-lab.github.io/dammit/

TaskError - taskid:remap_hmmer:longest_orfs.pep.x.pfam.tbl: UnboundLocalError: local variable 'q' referenced before assignment #133

Open johnsolk opened 5 years ago

johnsolk commented 5 years ago
# dammit
## a tool for easy de novo transcriptome annotation

by Camille Scott

**v1.0rc2**, 2018

## submodule: annotate
### Database Check
#### Info
* Database Directory: /pylon5/bi5fpmp/ljcohen/dammit
* Doit Database: /pylon5/bi5fpmp/ljcohen/dammit/databases.doit.db

*All database tasks up-to-date.*

### Annotation
#### Info
* Doit Database: /local/4641169/F_notti/F_notti.dammit/annotate.doit.db
* Input Transcriptome: /local/4641169/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta

Some tasks out of date!

Out-of-date tasks:
* BUSCO-eukaryota
* TransDecoder.LongOrfs
* TransDecoder.Predict
* annotate:fasta
* cmscan:Rfam
* gff3:/local/4641169/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.x.Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa.crbl.csv
* gff3:/local/4641169/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.x.kfish2rae5g.pub.aa.crbl.csv
* gff3:/local/4641169/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.x.protein.fa.crbl.csv
* gff3:OrthoDB
* gff3:Pfam-A
* gff3:Rfam
* gff3:merge-all
* gff3:sprot
* hmmscan:Pfam-A
* hmmscan:Pfam-A:remap
* lastal:OrthoDB
* lastal:best-hits:OrthoDB
* lastal:best-hits:sprot
* lastal:sprot
* rename-transcriptome
* transcriptome-stats
* user-database:Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa-shmlast-fit_and_filter_crbl_hits
* user-database:Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa-shmlast-lastal:.F_notti.trinity_out.Trinity.fasta.pep.x.Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa.maf
* user-database:Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa-shmlast-lastal:.Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa.x.F_notti.trinity_out.Trinity.fasta.pep.maf
* user-database:Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa-shmlast-lastdb:.Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa
* user-database:Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa-shmlast-rename:/local/4641169/F_notti/Fhet_reference_genome/ensembl/Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa
* user-database:kfish2rae5g.pub.aa-shmlast-fit_and_filter_crbl_hits
* user-database:kfish2rae5g.pub.aa-shmlast-lastal:.F_notti.trinity_out.Trinity.fasta.pep.x.kfish2rae5g.pub.aa.maf
* user-database:kfish2rae5g.pub.aa-shmlast-lastal:.kfish2rae5g.pub.aa.x.F_notti.trinity_out.Trinity.fasta.pep.maf
* user-database:kfish2rae5g.pub.aa-shmlast-lastdb:.kfish2rae5g.pub.aa
* user-database:kfish2rae5g.pub.aa-shmlast-rename:/local/4641169/F_notti/Fhet_reference_genome/evigene/kfish2rae5g.pub.aa
* user-database:protein.fa-shmlast-fit_and_filter_crbl_hits
* user-database:protein.fa-shmlast-lastal:.F_notti.trinity_out.Trinity.fasta.pep.x.protein.fa.maf
* user-database:protein.fa-shmlast-lastal:.protein.fa.x.F_notti.trinity_out.Trinity.fasta.pep.maf
* user-database:protein.fa-shmlast-lastdb:.F_notti.trinity_out.Trinity.fasta.pep
* user-database:protein.fa-shmlast-lastdb:.protein.fa
* user-database:protein.fa-shmlast-rename:/local/4641169/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta
* user-database:protein.fa-shmlast-rename:/local/4641169/F_notti/Fhet_reference_genome/ncbi/protein.fa
* user-database:protein.fa-shmlast-translate:.F_notti.trinity_out.Trinity.fasta

#### Run Tasks
- [ ] F_notti.trinity_out.Trinity.fasta: 
    * Python: function get_rename_transcriptome_task.fix
- [ ] transcriptome_stats:F_notti.trinity_out.Trinity.fasta: 
    * Python: function get_transcriptome_stats_task.cmd
- [ ] busco:F_notti.trinity_out.Trinity.fasta-eukaryota_odb9: 
    * Cmd: `python3 /pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/bin/run_BUSCO.py -i /local/4641169/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta -f -o F_notti.trinity_out.Trinity.fasta.eukaryota.busco.results -l /pylon5/bi5fpmp/ljcohen/dammit/busco2db/eukaryota_odb9 -m tran -c 14`
- [ ] TransDecoder.LongOrfs:F_notti.trinity_out.Trinity.fasta: 
    * Cmd: `/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/bin/TransDecoder.LongOrfs -t /local/4641169/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta -m 80`
- [ ] hmmscan:longest_orfs.pep.x.Pfam-A.hmm: 
    * Cmd: `cat /local/4641169/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.transdecoder_dir/longest_orfs.pep | /pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/bin/parallel --block `expr $(wc -c /local/4641169/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.transdecoder_dir/longest_orfs.pep | awk '{print $1}') / 14` --round-robin --pipe --recstart '>' --gnu -j 14 /pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/bin/hmmscan --cpu 1 --domtblout /dev/stdout -E 1e-05 -o /local/4641169/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.transdecoder_dir/longest_orfs.pep.x.pfam.tbl.hmmscan.out /pylon5/bi5fpmp/ljcohen/dammit/Pfam-A.hmm /dev/stdin > /local/4641169/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.transdecoder_dir/longest_orfs.pep.x.pfam.tbl`
- [ ] remap_hmmer:longest_orfs.pep.x.pfam.tbl: 
    * Python: function get_remap_hmmer_task.cmd
TaskError - taskid:remap_hmmer:longest_orfs.pep.x.pfam.tbl
PythonAction Error
Traceback (most recent call last):
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/doit/action.py", line 424, in execute
    returned_value = self.py_callable(*self.args, **kwargs)
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/dammit-1.0rc2-py3.6.egg/dammit/tasks/hmmer.py", line 142, in cmd
    query_basename=transcript_basename).read()
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/dammit-1.0rc2-py3.6.egg/dammit/fileio/base.py", line 79, in read
    return pd.concat(self, ignore_index=True)
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 225, in concat
    copy=copy, sort=sort)
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 256, in __init__
    objs = list(objs)
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/dammit-1.0rc2-py3.6.egg/dammit/fileio/hmmer.py", line 73, in __iter__
    yield self._build_df(data)
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/dammit-1.0rc2-py3.6.egg/dammit/fileio/hmmer.py", line 100, in _build_df
    df['query_name'] = df.query_name.apply(split_query)
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/pandas/core/series.py", line 3194, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src/inference.pyx", line 1472, in pandas._libs.lib.map_infer
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/dammit-1.0rc2-py3.6.egg/dammit/fileio/hmmer.py", line 95, in split_query
    return q
UnboundLocalError: local variable 'q' referenced before assignment

########################################
TaskError - taskid:remap_hmmer:longest_orfs.pep.x.pfam.tbl
remap_hmmer:longest_orfs.pep.x.pfam.tbl <stderr>:
/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/dammit-1.0rc2-py3.6.egg/dammit/fileio/gff3.py:73: ParserWarning: Both a converter and dtype were specified for column attributes - only the converter will be used
  dtype=dict(self.columns)):

remap_hmmer:longest_orfs.pep.x.pfam.tbl <stdout>:

johnsolk commented 5 years ago

This still seems to be happening. I installed from master (pulled on 12/20/2018) from dib-lab/dammit on the Bridges HPC and ran with three custom amino-acid databases. Is this something that I am doing wrong?
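For context on the error itself: an `UnboundLocalError` like the one in the traceback generally means a function reached `return q` without ever assigning `q`, which usually happens when the assignment sits inside a branch that was skipped for some input. The sketch below is not dammit's `split_query`; the regex, the function names, and the sample query names are all hypothetical and only illustrate the failure mode and one defensive alternative.

```python
import re

# Hypothetical pattern for illustration only; this is NOT the regex dammit uses.
ORF_NAME = re.compile(r"^(?P<transcript>.+)\|(?P<orf>m\.\d+)$")


def split_query_buggy(name):
    """Return the transcript part of an ORF name (fails on unexpected names)."""
    match = ORF_NAME.match(name)
    if match:
        q = match.group("transcript")
    # If the pattern did not match, q was never bound, so the next
    # line raises UnboundLocalError.
    return q


def split_query_safe(name):
    """Same idea, but report the offending name instead of failing obscurely."""
    match = ORF_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized query name: {name!r}")
    return match.group("transcript")


if __name__ == "__main__":
    print(split_query_buggy("Transcript_1|m.1"))
    try:
        split_query_buggy("Transcript_1.p1")
    except UnboundLocalError as err:
        print("buggy version:", type(err).__name__)
    try:
        split_query_safe("Transcript_1.p1")
    except ValueError as err:
        print("safe version:", err)
```

If something similar is going on here, the first thing worth checking would be whether the query names in `longest_orfs.pep.x.pfam.tbl` have the form the parser expects.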

[ljcohen@br018 sbatch_files]$ cat dammit_F_notti-4657468.o
# dammit
## a tool for easy de novo transcriptome annotation

by Camille Scott

**v1.0rc2**, 2018

## submodule: annotate
### Database Check
#### Info
* Database Directory: /pylon5/bi5fpmp/ljcohen/dammit
* Doit Database: /pylon5/bi5fpmp/ljcohen/dammit/databases.doit.db

*All database tasks up-to-date.*

### Annotation
#### Info
* Doit Database: /local/4657468/F_notti/F_notti.dammit/annotate.doit.db
* Input Transcriptome: /local/4657468/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta

Some tasks out of date!

Out-of-date tasks:
* BUSCO-eukaryota
* TransDecoder.LongOrfs
* TransDecoder.Predict
* annotate:fasta
* cmscan:Rfam
* gff3:/local/4657468/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.x.Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa.crbl.csv
* gff3:/local/4657468/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.x.kfish2rae5g.pub.aa.crbl.csv
* gff3:/local/4657468/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.x.protein.fa.crbl.csv
* gff3:OrthoDB
* gff3:Pfam-A
* gff3:Rfam
* gff3:merge-all
* gff3:sprot
* hmmscan:Pfam-A
* hmmscan:Pfam-A:remap
* lastal:OrthoDB
* lastal:best-hits:OrthoDB
* lastal:best-hits:sprot
* lastal:sprot
* rename-transcriptome
* transcriptome-stats
* user-database:Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa-shmlast-fit_and_filter_crbl_hits
* user-database:Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa-shmlast-lastal:.F_notti.trinity_out.Trinity.fasta.pep.x.Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa.maf
* user-database:Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa-shmlast-lastal:.Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa.x.F_notti.trinity_out.Trinity.fasta.pep.maf
* user-database:Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa-shmlast-lastdb:.Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa
* user-database:Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa-shmlast-rename:/local/4657468/F_notti/Fhet_reference_genome/ensembl/Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa
* user-database:kfish2rae5g.pub.aa-shmlast-fit_and_filter_crbl_hits
* user-database:kfish2rae5g.pub.aa-shmlast-lastal:.F_notti.trinity_out.Trinity.fasta.pep.x.kfish2rae5g.pub.aa.maf
* user-database:kfish2rae5g.pub.aa-shmlast-lastal:.kfish2rae5g.pub.aa.x.F_notti.trinity_out.Trinity.fasta.pep.maf
* user-database:kfish2rae5g.pub.aa-shmlast-lastdb:.kfish2rae5g.pub.aa
* user-database:kfish2rae5g.pub.aa-shmlast-rename:/local/4657468/F_notti/Fhet_reference_genome/evigene/kfish2rae5g.pub.aa
* user-database:protein.fa-shmlast-fit_and_filter_crbl_hits
* user-database:protein.fa-shmlast-lastal:.F_notti.trinity_out.Trinity.fasta.pep.x.protein.fa.maf
* user-database:protein.fa-shmlast-lastal:.protein.fa.x.F_notti.trinity_out.Trinity.fasta.pep.maf
* user-database:protein.fa-shmlast-lastdb:.F_notti.trinity_out.Trinity.fasta.pep
* user-database:protein.fa-shmlast-lastdb:.protein.fa
* user-database:protein.fa-shmlast-rename:/local/4657468/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta
* user-database:protein.fa-shmlast-rename:/local/4657468/F_notti/Fhet_reference_genome/ncbi/protein.fa
* user-database:protein.fa-shmlast-translate:.F_notti.trinity_out.Trinity.fasta

#### Run Tasks
- [ ] F_notti.trinity_out.Trinity.fasta: 
    * Python: function get_rename_transcriptome_task.fix
- [ ] transcriptome_stats:F_notti.trinity_out.Trinity.fasta: 
    * Python: function get_transcriptome_stats_task.cmd
- [ ] busco:F_notti.trinity_out.Trinity.fasta-eukaryota_odb9: 
    * Cmd: `python3 /pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/bin/run_BUSCO.py -i /local/4657468/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta -f -o F_notti.trinity_out.Trinity.fasta.eukaryota.busco.results -l /pylon5/bi5fpmp/ljcohen/dammit/busco2db/eukaryota_odb9 -m tran -c 14`
- [ ] TransDecoder.LongOrfs:F_notti.trinity_out.Trinity.fasta: 
    * Cmd: `/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/bin/TransDecoder.LongOrfs -t /local/4657468/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta -m 80`
- [ ] hmmscan:longest_orfs.pep.x.Pfam-A.hmm: 
    * Cmd: `cat /local/4657468/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.transdecoder_dir/longest_orfs.pep | /pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/bin/parallel --block `expr $(wc -c /local/4657468/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.transdecoder_dir/longest_orfs.pep | awk '{print $1}') / 14` --round-robin --pipe --recstart '>' --gnu -j 14 /pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/bin/hmmscan --cpu 1 --domtblout /dev/stdout -E 1e-05 -o /local/4657468/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.transdecoder_dir/longest_orfs.pep.x.pfam.tbl.hmmscan.out /pylon5/bi5fpmp/ljcohen/dammit/Pfam-A.hmm /dev/stdin > /local/4657468/F_notti/F_notti.dammit/F_notti.trinity_out.Trinity.fasta.transdecoder_dir/longest_orfs.pep.x.pfam.tbl`
- [ ] remap_hmmer:longest_orfs.pep.x.pfam.tbl: 
    * Python: function get_remap_hmmer_task.cmd
TaskError - taskid:remap_hmmer:longest_orfs.pep.x.pfam.tbl
PythonAction Error
Traceback (most recent call last):
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/doit/action.py", line 424, in execute
    returned_value = self.py_callable(*self.args, **kwargs)
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/dammit-1.0rc2-py3.6.egg/dammit/tasks/hmmer.py", line 142, in cmd
    query_basename=transcript_basename).read()
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/dammit-1.0rc2-py3.6.egg/dammit/fileio/base.py", line 79, in read
    return pd.concat(self, ignore_index=True)
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 225, in concat
    copy=copy, sort=sort)
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 256, in __init__
    objs = list(objs)
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/dammit-1.0rc2-py3.6.egg/dammit/fileio/hmmer.py", line 73, in __iter__
    yield self._build_df(data)
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/dammit-1.0rc2-py3.6.egg/dammit/fileio/hmmer.py", line 100, in _build_df
    df['query_name'] = df.query_name.apply(split_query)
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/pandas/core/series.py", line 3194, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src/inference.pyx", line 1472, in pandas._libs.lib.map_infer
  File "/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/dammit-1.0rc2-py3.6.egg/dammit/fileio/hmmer.py", line 95, in split_query
    return q
UnboundLocalError: local variable 'q' referenced before assignment

########################################
TaskError - taskid:remap_hmmer:longest_orfs.pep.x.pfam.tbl
remap_hmmer:longest_orfs.pep.x.pfam.tbl <stderr>:
/pylon5/bi5fpmp/ljcohen/miniconda3/envs/dammit_master/lib/python3.6/site-packages/dammit-1.0rc2-py3.6.egg/dammit/fileio/gff3.py:73: ParserWarning: Both a converter and dtype were specified for column attributes - only the converter will be used
  dtype=dict(self.columns)):

remap_hmmer:longest_orfs.pep.x.pfam.tbl <stdout>:

johnsolk commented 5 years ago

Here's the command script:

#!/bin/bash -l
#SBATCH -D /pylon5/bi5fpmp/ljcohen/kfish_dammit/sbatch_files/
#SBATCH -J dammit_F_notti
#SBATCH -o /pylon5/bi5fpmp/ljcohen/kfish_dammit/sbatch_files/dammit_F_notti-%j.o
#SBATCH -e /pylon5/bi5fpmp/ljcohen/kfish_dammit/sbatch_files/dammit_F_notti-%j.o
#SBATCH -t 60:00:00
#SBATCH -p LM
#SBATCH --ntasks-per-node 14
#SBATCH --cpus-per-task 2
#SBATCH --mem=1000GB

source /home/ljcohen/.bashrc
source activate dammit_master
export DAMMIT_DB_DIR=/pylon5/bi5fpmp/ljcohen/dammit
SPECIES=F_notti
PROJECTDIR=$LOCAL/$SPECIES
mkdir $PROJECTDIR
cd $PROJECTDIR
cp /pylon5/bi5fpmp/ljcohen/kfish_trinity/F_notti.trinity_out.Trinity.fasta .
cp -r /pylon5/bi5fpmp/ljcohen/Fhet_reference_genome/ .
dammit annotate F_notti.trinity_out.Trinity.fasta --busco-group eukaryota \
     --user-databases Fhet_reference_genome/ncbi/protein.fa Fhet_reference_genome/evigene/kfish2rae5g.pub.aa Fhet_reference_genome/ensembl/Fundulus_heteroclitus.Fundulus_heteroclitus-3.0.2.pep.all.fa \
     --output-dir F_notti.dammit \
     --n_threads 14
cp -r F_notti.dammit /pylon5/bi5fpmp/ljcohen/kfish_dammit/
cd ..
rm -rf $PROJECTDIR