Shamir-Lab / SCAPP

SCAPP is a plasmid assembly tool. This tool is described in our paper: https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-021-01068-z
MIT License

SCAPP exit gracefully #26

Open pbelmann opened 2 years ago

pbelmann commented 2 years ago

Thank you for developing this tool!

makeblastdb seems to fail when no plasmids are detected (see the full log below). The reason is that ./intermediate_files/ERR4163138_contigs.cycs.fasta is empty.

I believe that SCAPP should exit gracefully if no plasmids are found.

Full log:

Getting scores of graph nodes
Getting scores of graph nodes
Getting scores of graph nodes
Getting scores of graph nodes
Getting scores of graph nodes
Finding plasmid-specific genes with BLAST
No blast db provided, creating one
Running command: makeblastdb  -in ERR4163138_contigs.fastg -blastdb_version 4 -dbtype nucl -out ./intermediate_files/ERR4163138_contigs.fastg.blastdb
Running blast search for gene (nt) sequences in blast db
Running command: blastn -task megablast -db ./intermediate_files/ERR4163138_contigs.fastg.blastdb -query /usr/local/lib/python3.7/site-packages/scapp/data/nt/nt1 -out ./intermediate_files/nt1_blastdb.out -num_threads 14 -outfmt "6 qseqid sseqid length pident qlen slen evalue"
Running command: blastn -task megablast -db ./intermediate_files/ERR4163138_contigs.fastg.blastdb -query /usr/local/lib/python3.7/site-packages/scapp/data/nt/nt3 -out ./intermediate_files/nt3_blastdb.out -num_threads 14 -outfmt "6 qseqid sseqid length pident qlen slen evalue"
Running command: blastn -task megablast -db ./intermediate_files/ERR4163138_contigs.fastg.blastdb -query /usr/local/lib/python3.7/site-packages/scapp/data/nt/nt2 -out ./intermediate_files/nt2_blastdb.out -num_threads 14 -outfmt "6 qseqid sseqid length pident qlen slen evalue"
Running blast search for protein (aa) sequences in blast db
Running command: tblastn -db ./intermediate_files/ERR4163138_contigs.fastg.blastdb -db_gencode 11 -query /usr/local/lib/python3.7/site-packages/scapp/data/aa/aa1 -out ./intermediate_files/aa1_blastdb.out -num_threads 4 -outfmt "6 qseqid sseqid length pident qlen slen evalue"
Running command: tblastn -db ./intermediate_files/ERR4163138_contigs.fastg.blastdb -db_gencode 11 -query /usr/local/lib/python3.7/site-packages/scapp/data/aa/aa2 -out ./intermediate_files/aa2_blastdb.out -num_threads 4 -outfmt "6 qseqid sseqid length pident qlen slen evalue"
0 contigs hit
Writing list of hit contigs in: ./intermediate_files/hit_seqs.out
Removing intermediate files...
Starting SCAPP plasmid finding
================== Added paths ====================
Filtering plasmids by plasmid-specific genes
No blast db provided, creating one
Running command: makeblastdb  -in ./intermediate_files/ERR4163138_contigs.cycs.fasta -blastdb_version 4 -dbtype nucl -out ./intermediate_files/hit_cycs/ERR4163138_contigs.cycs.fasta.blastdb
Error creating blast database. Check path to makeblastdb executable.
Error filtering by plasmid genes. Check BLAST output file (blast_std.log)
Traceback (most recent call last):
  File "/usr/local/bin/scapp", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/scapp/scapp.py", line 377, in main
    protfiles_path, None, ncbi_path, num_procs, PARAMS.GENE_MATCH_THRESH)
  File "/usr/local/lib/python3.7/site-packages/scapp/find_plasmid_gene_matches.py", line 146, in find_plasmid_gene_matches
    dbpath = create_db(infile, ncbi_bin, outdir)
  File "/usr/local/lib/python3.7/site-packages/scapp/find_plasmid_gene_matches.py", line 66, in create_db
    subprocess.check_call(command, shell=True)
  File "/usr/local/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'makeblastdb  -in ./intermediate_files/ERR4163138_contigs.cycs.fasta -blastdb_version 4 -dbtype nucl -out ./intermediate_files/hit_cycs/ERR4163138_contigs.cycs.fasta.blastdb' returned non-zero exit status 1.

dpellow commented 2 years ago

This is a good point. In most cases with a decent-sized metagenome there should be cycles present in the assembly graph, so this behaviour should rarely be seen. It may indicate that the input doesn't match the tool's expectations (e.g. not really an assembly graph in proper fastg format).

We will add a fix for this in the next minor release of SCAPP, which will hopefully be out in the next few weeks.
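
For illustration only, a guard of the kind requested above might look roughly like the sketch below. It is not the actual SCAPP fix: the function names `fasta_has_records` and `exit_if_no_cycles` are hypothetical and do not exist in the SCAPP code base. The sketch assumes the cycles file (for example `./intermediate_files/ERR4163138_contigs.cycs.fasta` from the log) is ordinary FASTA, and it checks for at least one header line before makeblastdb would be called.

```python
import os
import sys


def fasta_has_records(path):
    """Return True if the file exists and contains at least one FASTA header line."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path) as handle:
        return any(line.startswith(">") for line in handle)


def exit_if_no_cycles(cycs_fasta):
    """Hypothetical guard: stop cleanly instead of running makeblastdb on an empty file."""
    if not fasta_has_records(cycs_fasta):
        print("No candidate plasmid cycles found in {}; nothing to filter.".format(cycs_fasta))
        sys.exit(0)  # exit status 0: the run finished, it just found no plasmids
```

Calling such a check just before the database is built would presumably avoid the `CalledProcessError` seen in the traceback. Whether an empty result should exit with status 0 or with a distinct non-zero code is a design choice for the maintainers.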