gbouras13 / dnaapler

Reorients assembled microbial sequences
MIT License

dnaapler all --autocomplete not running #74

Closed by schorlton-bugseq 2 months ago

schorlton-bugseq commented 4 months ago

Thanks for the great tool! I tried running dnaapler all (version 0.7.0) with --autocomplete. Example:

echo -e ">Sequence1\nATCGA" > test.fasta
dnaapler all --autocomplete mystery -i test.fasta

and I'm getting:

2024-05-16 23:29:10.896 | INFO     | dnaapler.utils.validation:instantiate_dirs:23 - Checking the output directory output.dnaapler
2024-05-16 23:29:10.901 | INFO     | dnaapler.utils.util:begin_dnaapler:71 - You are using dnaapler version 0.7.0
2024-05-16 23:29:10.901 | INFO     | dnaapler.utils.util:begin_dnaapler:72 - Repository homepage is https://github.com/gbouras13/dnaapler
2024-05-16 23:29:10.901 | INFO     | dnaapler.utils.util:begin_dnaapler:73 - Written by George Bouras: george.bouras@adelaide.edu.au
2024-05-16 23:29:10.902 | INFO     | dnaapler.utils.util:begin_dnaapler:74 - Your input FASTA is test.fasta
2024-05-16 23:29:10.902 | INFO     | dnaapler.utils.util:begin_dnaapler:75 - Your output directory  is output.dnaapler
2024-05-16 23:29:10.902 | INFO     | dnaapler.utils.util:begin_dnaapler:76 - You have specified 1 threads to use with blastx
2024-05-16 23:29:10.902 | INFO     | dnaapler.utils.util:begin_dnaapler:77 - You have specified all gene(s) to reorient your sequence
2024-05-16 23:29:10.902 | INFO     | dnaapler.utils.util:check_blast_version:115 - Checking BLAST installation.
2024-05-16 23:29:10.968 | INFO     | dnaapler.utils.util:check_blast_version:135 - BLAST version found is v2.15.0.
2024-05-16 23:29:10.969 | INFO     | dnaapler.utils.util:check_blast_version:145 - BLAST version is ok.
2024-05-16 23:29:10.969 | INFO     | dnaapler.utils.util:check_pyrodigal_version:90 - Checking pyrodigal installation.
2024-05-16 23:29:10.969 | INFO     | dnaapler.utils.util:check_pyrodigal_version:101 - Pyrodigal version is v3.3.0
2024-05-16 23:29:10.969 | INFO     | dnaapler.utils.util:check_pyrodigal_version:102 - Pyrodigal version is ok.
2024-05-16 23:29:10.969 | INFO     | dnaapler.utils.validation:validate_fasta_all:100 - Checking that the input file test.fasta is in FASTA format and has at least 1 entry.
2024-05-16 23:29:10.975 | INFO     | dnaapler.utils.validation:validate_fasta_all:107 - test.fasta file checked.
2024-05-16 23:29:10.975 | INFO     | dnaapler.utils.validation:validate_fasta_all:116 - test.fasta has only one entry.
2024-05-16 23:29:10.975 | INFO     | dnaapler.utils.validation:check_evalue:187 - You have specified an evalue of 1e-10.
2024-05-16 23:29:10.976 | INFO     | dnaapler.utils.external_tools:run:53 - Started running blastx -db /opt/conda/envs/test/lib/python3.10/site-packages/dnaapler/db/all_db -evalue 1e-10 -num_threads 1 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out output.dnaapler/dnaapler_blast_output.txt -query test.fasta ...
2024-05-16 23:29:11.052 | INFO     | dnaapler.utils.external_tools:run:55 - Done running blastx -db /opt/conda/envs/test/lib/python3.10/site-packages/dnaapler/db/all_db -evalue 1e-10 -num_threads 1 -outfmt ' 6 qseqid qlen sseqid slen length qstart qend sstart send pident nident gaps mismatch evalue bitscore qseq sseq ' -out output.dnaapler/dnaapler_blast_output.txt -query test.fasta
2024-05-16 23:29:11.056 | ERROR    | dnaapler.utils.all:all_process_blast_output_and_reorient:69 - There were 0 BLAST hits. Please check your input file or try dnaapler custom. If you have assembled an understudied species, this may also be the cause.

However, I understood from the docs that autocomplete mode should be triggered in this case. If so, I suspect this line raises an error before autocomplete gets a chance to run: https://github.com/gbouras13/dnaapler/blob/v0.7.0/src/dnaapler/utils/all.py#L68
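
To make the expected behavior concrete, here is a minimal sketch of the control flow I had in mind. It is not dnaapler's actual code: `pick_mode` is a hypothetical helper, and treating "none" as the value that means autocomplete is switched off is my assumption.

```python
# Hypothetical sketch of the expected fallback; NOT dnaapler's real code.
def pick_mode(n_blast_hits: int, autocomplete: str = "none") -> str:
    """Decide how a contig should be reoriented.

    `autocomplete` mirrors the value passed to --autocomplete
    (assumption: "none" means the option is switched off).
    """
    if n_blast_hits > 0:
        # Normal path: reorient using the BLAST hit.
        return "blast"
    if autocomplete != "none":
        # Zero hits is not fatal when a fallback mode was requested.
        return autocomplete
    # Only error out when there are no hits AND no fallback.
    raise ValueError("There were 0 BLAST hits and no autocomplete mode was set.")
```

With this logic, `pick_mode(0, "mystery")` would fall through to mystery mode instead of stopping at the 0-hit error.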

Let me know if I'm doing anything incorrectly and thanks for your consideration!

gbouras13 commented 3 months ago

Hi @schorlton-bugseq ,

Sorry for the delay - I'm fixing a few bugs over the next little while and will look into this :)

George

gbouras13 commented 2 months ago

Hi @schorlton-bugseq ,

I've added a fix for this now - your specific example above will still fail later in the pipeline (as it is 5 nucleotides, so reorientation makes no sense!) but in general, if your contig has no BLAST hits, autocomplete will now proceed (assuming at least 4 CDS).

George

schorlton-bugseq commented 2 months ago

Awesome - thank you! Will wait for a release and feel free to close this issue when appropriate.

schorlton-bugseq commented 2 months ago

@gbouras13 - thanks again for the fix. Is it possible to also enable a successful analysis even if <4 CDS were found on a contig? I think it is very similar to https://github.com/gbouras13/dnaapler/issues/77. I want any contigs which can be reoriented to be reoriented, and ignore those which fail by either BLAST or autocomplete mode. If I understand correctly, one small contig may cause an entire analysis to fail reorientation of the other contigs. My alternative is to call CDSs before calling dnaapler and filter contigs with <4 CDS, but that seems redundant. Thanks again for your consideration!
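
To illustrate the per-contig behavior I am asking about, here is a rough sketch. It is not based on dnaapler's internals: `reorient_all` and `toy_reorient` are hypothetical names, and passing failed contigs through unchanged (rather than dropping them) is an assumption on my part.

```python
from typing import Callable, Dict, List, Tuple


def reorient_all(
    contigs: Dict[str, str],
    reorient_one: Callable[[str], str],
) -> Tuple[Dict[str, str], List[str]]:
    """Reorient each contig independently so one failure cannot abort the run.

    `reorient_one` is a hypothetical per-contig function that raises
    ValueError when a contig cannot be reoriented (e.g. no BLAST hit
    and fewer than 4 CDS).
    """
    output: Dict[str, str] = {}
    skipped: List[str] = []
    for name, seq in contigs.items():
        try:
            output[name] = reorient_one(seq)
        except ValueError:
            # Assumption: keep the original sequence and record the skip.
            output[name] = seq
            skipped.append(name)
    return output, skipped


def toy_reorient(seq: str) -> str:
    """Stand-in reorienter: rotate by one base, reject very short contigs."""
    if len(seq) < 10:
        raise ValueError("contig too short to reorient")
    return seq[1:] + seq[:1]


if __name__ == "__main__":
    out, skipped = reorient_all(
        {"small": "ATCGA", "large": "ATCGATCGATCG"}, toy_reorient
    )
    print(out, skipped)
```

The point is only that the failure is caught per contig: "small" is reported as skipped while "large" is still reoriented.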