gbouras13 / dnaapler

Reorients assembled microbial sequences
MIT License

dnaapler all --autocomplete not running #74

Closed (schorlton-bugseq closed 2 months ago)

schorlton-bugseq commented 4 months ago

Thanks for the great tool! I tried running dnaapler all (version 0.7.0) with --autocomplete. Example:

echo -e ">Sequence1\nATCGA" > test.fasta
dnaapler all --autocomplete mystery -i test.fasta

and I'm getting:

2024-05-16 23:29:10.896 | INFO     | dnaapler.utils.validation:instantiate_dirs:23 - Checking the output directory output.dnaapler
2024-05-16 23:29:10.901 | INFO     | dnaapler.utils.util:begin_dnaapler:71 - You are using dnaapler version 0.7.0
2024-05-16 23:29:10.901 | INFO     | dnaapler.utils.util:begin_dnaapler:72 - Repository homepage is
2024-05-16 23:29:10.901 | INFO     | dnaapler.utils.util:begin_dnaapler:73 - Written by George Bouras:
2024-05-16 23:29:10.902 | INFO     | dnaapler.utils.util:begin_dnaapler:74 - Your input FASTA is test.fasta
2024-05-16 23:29:10.902 | INFO     | dnaapler.utils.util:begin_dnaapler:75 - Your output directory  is output.dnaapler
2024-05-16 23:29:10.902 | INFO     | dnaapler.utils.util:begin_dnaapler:76 - You have specified 1 threads to use with blastx
2024-05-16 23:29:10.902 | INFO     | dnaapler.utils.util:begin_dnaapler:77 - You have specified all gene(s) to reorient your sequence
2024-05-16 23:29:10.902 | INFO     | dnaapler.utils.util:check_blast_version:115 - Checking BLAST installation.
2024-05-16 23:29:10.968 | INFO     | dnaapler.utils.util:check_blast_version:135 - BLAST version found is v2.15.0.
2024-05-16 23:29:10.969 | INFO     | dnaapler.utils.util:check_blast_version:145 - BLAST version is ok.
2024-05-16 23:29:10.969 | INFO     | dnaapler.utils.util:check_pyrodigal_version:90 - Checking pyrodigal installation.
2024-05-16 23:29:10.969 | INFO     | dnaapler.utils.util:check_pyrodigal_version:101 - Pyrodigal version is v3.3.0
2024-05-16 23:29:10.969 | INFO     | dnaapler.utils.util:check_pyrodigal_version:102 - Pyrodigal version is ok.
2024-05-16 23:29:10.969 | INFO     | dnaapler.utils.validation:validate_fasta_all:100 - Checking that the input file test.fasta is in FASTA format and has at least 1 entry.
2024-05-16 23:29:10.975 | INFO     | dnaapler.utils.validation:validate_fasta_all:107 - test.fasta file checked.
2024-05-16 23:29:10.975 | INFO     | dnaapler.utils.validation:validate_fasta_all:116 - test.fasta has only one entry.
2024-05-16 23:29:10.975 | INFO     | dnaapler.utils.validation:check_evalue:187 - You have specified an evalue of 1e-10.
2024-05-16 23:29:10.976 | INFO     | dnaapler.utils.external_tools:run:53 - Started running blastx -db /opt/conda/envs/test/lib/python3.10/site-packages/dnaapler/db/all_db -evalue 1e-10 -num_threads 1 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out output.dnaapler/dnaapler_blast_output.txt -query test.fasta ...
2024-05-16 23:29:11.052 | INFO     | dnaapler.utils.external_tools:run:55 - Done running blastx -db /opt/conda/envs/test/lib/python3.10/site-packages/dnaapler/db/all_db -evalue 1e-10 -num_threads 1 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out output.dnaapler/dnaapler_blast_output.txt -query test.fasta
2024-05-16 23:29:11.056 | ERROR    | dnaapler.utils.all:all_process_blast_output_and_reorient:69 - There were 0 BLAST hits. Please check your input file or try dnaapler custom. If you have assembled an understudied species, this may also be the cause.

However, I understood from the docs that autocomplete mode should be triggered in this case. If so, I think the problem may be that this line throws an error before dnaapler can go on to autocomplete?

Let me know if I'm doing anything incorrectly and thanks for your consideration!

gbouras13 commented 3 months ago

Hi @schorlton-bugseq ,

Sorry for the delay - I'm fixing a few bugs over the next little while and will look into this :)


gbouras13 commented 2 months ago

Hi @schorlton-bugseq ,

I've added a fix for this now - your specific example above will still fail later in the pipeline (as it is 5 nucleotides, so reorientation makes no sense!) but in general, if your contig has no BLAST hits, autocomplete will now proceed (assuming at least 4 CDS).
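
As a rough illustration of the behaviour described above (a sketch only, not the actual dnaapler code): the function name `choose_reorientation_strategy`, the `MIN_CDS` constant, and the returned labels are all hypothetical. The threshold of 4 CDS comes from the comment above. What happens when neither condition holds is not stated here, so the sketch just reports the contig as "unresolved".

```python
from typing import Literal

# Threshold taken from the comment above; the constant name is hypothetical.
MIN_CDS = 4

Strategy = Literal["blast", "autocomplete", "unresolved"]


def choose_reorientation_strategy(
    n_blast_hits: int, n_cds: int, autocomplete: bool
) -> Strategy:
    """Decide how one contig would be handled (illustrative, not dnaapler's API)."""
    # Any BLAST hit takes priority over the autocomplete fallback.
    if n_blast_hits > 0:
        return "blast"
    # With no hits, fall back to autocomplete only if it was requested
    # and the contig has enough predicted CDS.
    if autocomplete and n_cds >= MIN_CDS:
        return "autocomplete"
    # Assumption: the thread does not say what happens in this case.
    return "unresolved"
```

The point of the fix is the second branch: having zero BLAST hits no longer ends the run before the autocomplete fallback is considered.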


schorlton-bugseq commented 2 months ago

Awesome - thank you! Will wait for a release and feel free to close this issue when appropriate.

schorlton-bugseq commented 2 months ago

@gbouras13 - thanks again for the fix. Is it possible to also enable a successful analysis even if <4 CDS were found on a contig? I think it is very similar to I want any contigs which can be reoriented to be reoriented, and ignore those which fail by either BLAST or autocomplete mode. If I understand correctly, one small contig may cause the entire analysis to fail, preventing reorientation of the other contigs. My alternative is to call CDSs before calling dnaapler and filter out contigs with <4 CDS, but that seems redundant. Thanks again for your consideration!
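
For reference, the pre-filtering workaround mentioned above could look roughly like the sketch below. It is a sketch under assumptions: the helper names (`read_fasta`, `filter_by_cds_count`, `write_fasta`) are made up for illustration, and the CDS counter is passed in as a function rather than tied to a specific gene caller. The default of 4 matches the threshold mentioned in this thread.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_cds_count(
    records: Iterable[Record],
    count_cds: Callable[[str], int],
    min_cds: int = 4,
) -> List[Record]:
    """Keep only contigs for which count_cds reports at least min_cds genes."""
    return [(h, s) for h, s in records if count_cds(s) >= min_cds]


def write_fasta(records: Iterable[Record]) -> str:
    """Serialise records back to FASTA text, one sequence line per record."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

Here `count_cds` would wrap whatever gene caller is used (for example pyrodigal, which the log above shows dnaapler already checks for), and the filtered FASTA would then be passed to dnaapler with -i. This is the redundant step I would like to avoid, since the gene calling would happen twice.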