BigDataBiology / macrel

Predict AMPs in (meta)genomes and peptides
http://big-data-biology.org/software/macrel

tmp file does not exist error #68

Closed chuanfaliu closed 6 months ago

chuanfaliu commented 6 months ago

I encountered an error while running the abundance subcommand:

macrel abundance -1 SRR16178793_1.fastq.gz -2 SRR16178793_2.fastq.gz --fasta peptide.fasta --output ./output/SRR16178793_abun --tag ./outtag/SRR16178793_outtag -t 16

The error is:

......
[M::mem_process_seqs] Processed 3299460 protein sequences in 743.597 CPU sec, 47.076 real sec
[M::process] Read 714582 protein sequences (34476126 AA)...
[M::mem_process_seqs] Processed 3298914 protein sequences in 688.743 CPU sec, 43.647 real sec
[M::mem_process_seqs] Processed 714582 protein sequences in 146.474 CPU sec, 9.278 real sec
[M::renderNumberAligned] Aligned 34710242 out of 51321549 total detected ORF sequences (67.63%)
[main] Version: 1.4.6
[main] CMD: paladin align -t 16 -T 20 -f 10 -z 11 -a -V -M /tmp/tmpdcdn385h/paladin.faa /tmp/tmpdcdn385h/preproc.pair.1.fq.gz
[main] Real time: 8120.267 sec; CPU: 84019.436 sec
NGLess v1.5.0 (C) NGLess authors
https://ngless.embl.de/

When publishing results from this script, please cite the following references:

     - Coelho, L.P., Alves, R., Monteiro, P., Huerta-Cepas, J., Freitas, A.T., and Bork, P.,
     NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language. in
     Microbiome 7:84 (2019). DOI: https://doi.org/10.1186/s40168-019-0684-8

[Mon 29-04-2024 21:57] Line 9: /tmp/counts.paladin154619-0.txt: renameFile:renamePath:rename: does not exist (No such file or directory)
Exiting after fatal error: /tmp/counts.paladin154619-0.txt: renameFile:renamePath:rename: does not exist (No such file or directory)

Traceback (most recent call last):
  File "/path/to/bin/macrel", line 10, in <module>
    sys.exit(main())
  File "/path/to/macrel/main.py", line 371, in main
    do_abundance(args, tdir,logfile)
  File "/path/to/macrel/main.py", line 222, in do_abundance
    subprocess.check_call([
  File "/path/to/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ngless', '--no-create-report', '--quiet', '-j', '16', '/path/to/scripts/count.ngl', '/tmp/tmpdcdn385h/paladin.out.sam', './output/SRR16178793/ ./outtag/.abundance.txt']' returned non-zero exit status 1.

Could you please help me figure out what caused the error? Thanks a lot.

celiosantosjr commented 6 months ago

Dear chuanfaliu,

I tested the same sample you used with a set of peptides from here. I used only the first million reads, which should still break the system. Note, however, that I passed different values for --tag and --output:

macrel abundance -1 SRR16178793_pass_1.fastq.gz -2 SRR16178793_pass_2.fastq.gz --fasta abundances/ref.faa.gz --output output/ --tag SRR16178793_outtag -t 16

The output shows Macrel running to completion:

NGLess v1.4.2 (C) NGLess authors
https://ngless.embl.de/

When publishing results from this script, please cite the following references:

     - Coelho, L.P., Alves, R., Monteiro, P., Huerta-Cepas, J., Freitas, A.T., and Bork, P.,
     NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language. in
     Microbiome 7:84 (2019). DOI: http://doi.org/10.1186/s40168-019-0684-8

[Mon 29-04-2024 22:12] Line 3: Heuristic for FastQ encoding determination for file "SRR16178793_pass_1.fastq.gz" cannot be 100% confident. Guessing 33 offset (Sanger encoding, used by newer Illumina machines).
[Mon 29-04-2024 22:12] Line 3: Heuristic for FastQ encoding determination for file "SRR16178793_pass_2.fastq.gz" cannot be 100% confident. Guessing 33 offset (Sanger encoding, used by newer Illumina machines).
[M::command_index] Translating protein sequence...0.00 sec
[M::command_index] Packing protein sequence... 0.00 sec
[M::command_index] Constructing BWT for the packed sequence... 0.00 sec
[M::command_index] Updating BWT... 0.00 sec
[M::command_index] Packing forward-only protein squence... 0.00 sec
[M::command_index] Constructing suffix array... 0.00 sec
[main] Version: 1.4.6
[main] CMD: paladin index -r3 /tmp/tmptpzh19lg/paladin.faa
[main] Real time: 0.048 sec; CPU: 0.006 sec
[M::command_align] Loading the index for reference '/tmp/tmptpzh19lg/paladin.faa'...
[M::index_load_from_disk] Read 0 ALT contigs
[M::writeReadsProtein] Detecting open reading frames...
[M::writeReadsProtein] Detected and translated 999980 open reading frames in 999980 sequences
[M::process] Read 3243246 protein sequences (160000072 AA)...
[M::process] Read 2756634 protein sequences (135993944 AA)...
[M::mem_process_seqs] Processed 3243246 protein sequences in 493.461 CPU sec, 128.511 real sec
[M::mem_process_seqs] Processed 2756634 protein sequences in 422.565 CPU sec, 113.586 real sec
[M::renderNumberAligned] Aligned 0 out of 999980 total detected ORF sequences (0.00%)
[main] Version: 1.4.6
[main] CMD: paladin align -t 16 -T 20 -f 10 -z 11 -a -V -M /tmp/tmptpzh19lg/paladin.faa /tmp/tmptpzh19lg/preproc.pair.1.fq.gz
[main] Real time: 252.614 sec; CPU: 926.312 sec
NGLess v1.4.2 (C) NGLess authors
https://ngless.embl.de/

When publishing results from this script, please cite the following references:

     - Coelho, L.P., Alves, R., Monteiro, P., Huerta-Cepas, J., Freitas, A.T., and Bork, P.,
     NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language. in
     Microbiome 7:84 (2019). DOI: http://doi.org/10.1186/s40168-019-0684-8

It also produced the expected table:

MACREL
smORF_1 0
smORF_10 0
smORF_9 0

The problem breaking your Macrel run appears to be the parameters you entered, specifically at the step that uses path.join() to build the input for the NGLess script. This happens because the --tag parameter is not a path, but a string that identifies the run inside the output folder. I hope this helps.
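To illustrate the point about path.join(), here is a minimal Python sketch. It assumes the output file name is built as join(output, tag + ".abundance.txt"); that is an assumption for illustration, not a copy of Macrel's code, and both helper names are hypothetical. It uses posixpath so the result is the same on any platform.

```python
import posixpath


def abundance_output_path(output_dir, tag):
    # Hypothetical helper: ASSUMES the path is built as join(output, tag + suffix).
    # This is an illustration, not Macrel's actual implementation.
    return posixpath.join(output_dir, tag + ".abundance.txt")


def tag_is_plain_name(tag):
    # Hypothetical check: a tag is "plain" when it has no directory component.
    return tag not in ("", ".", "..") and posixpath.basename(tag) == tag


# Values from the failing command: the tag contains a directory component,
# so the joined path points into a subfolder that nothing has created.
bad = abundance_output_path("./output/SRR16178793_abun", "./outtag/SRR16178793_outtag")

# Values from the working command: the tag is a bare identifier.
good = abundance_output_path("output/", "SRR16178793_outtag")

print(bad)
print(good)
print(tag_is_plain_name("./outtag/SRR16178793_outtag"))
print(tag_is_plain_name("SRR16178793_outtag"))
```

Under that assumption, the first path ends up inside an outtag/ subfolder of the output directory, which would explain a "does not exist" failure when the counts file is moved into place.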

Best regards, Celio

luispedro commented 6 months ago

Unfortunately, that is a pretty bad error message, but we've seen it pop up when you run out of memory (RAM).
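One way to check the memory hypothesis is to record the peak memory of the child process. The sketch below uses only the Python standard library on a Unix-like system; the helper name is hypothetical, and the command in the demo is a placeholder for whatever you want to measure. Note that ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(cmd):
    """Run cmd to completion and return (exit_code, peak_rss).

    peak_rss is the largest resident set size among child processes that
    this Python process has waited for so far (Unix only).
    """
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


if __name__ == "__main__":
    # Placeholder command; replace it with the command you want to measure.
    code, peak = run_and_report_peak_memory([sys.executable, "-c", "pass"])
    print("exit code:", code)
    print("peak RSS of largest child:", peak)
```

If the reported peak is close to the machine's total RAM, running with fewer threads or on a machine with more memory would be a reasonable next step.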