czbiohub-sf / orpheum

Orpheum (previously called and published as sencha) is a Python package for directly translating RNA-seq reads into coding protein sequences.
MIT License

sencha translate memory allocation error #85

Closed: phoenixAja closed this issue 4 years ago

phoenixAja commented 4 years ago

When running nf-predictorthologs with this Makefile:

REFSEQ_FASTA=/home/olga/data_lg/czbiohub-reference/ncbi/refseq/releases/refseq-release201--2020-07-21/nonredundant-protein/complete__nonredundant_protein.faa.gz
BUSCO_METAZOA=/home/olga/data_sm/immune-evolution/databases/busco/orthodb-v10/metazoa_odb10/metazoa_odb10.fasta
OUTDIR_BASE=/mnt/data_sm/home/olga/pipeline-results/bat/nf-predictorthologs
WORK_DIR=/mnt/data_sm/home/phoenix/pipeline-work/bat/nf-predictorthologs
BAM_BASE=/mnt/data_sm/home/phoenix/batlas/BatBams/immune-tissue-softlinks

busco_metazoa:
    nextflow run czbiohub/nf-predictorthologs \
        -profile docker \
        --bam ${BAM_BASE}/*.bam \
        --proteome_search_fasta ${REFSEQ_FASTA} \
        --proteome_translate_fasta ${BUSCO_METAZOA} \
        --translate_jaccard_threshold 0.95 \
        --translate_peptide_ksize 9,10,11,12,13,14,15,16,17,18,19,20,21 \
        --translate_molecule protein,dayhoff \
        -with-tower \
        --outdir ${OUTDIR_BASE}--$@/ \
        -w ${WORK_DIR} \
        --single_end \
        -resume \
        --search_noncoding \
        --max_cpus 90 \
        --max-memory 500.GB \
        --max_time 200.h \
        -r peptide-ksize-tokenize
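For context on two of the parameters above, here is a simplified sketch of the kind of check that `--translate_peptide_ksize` and `--translate_jaccard_threshold` control. It assumes that a translated read frame is called coding when at least the threshold fraction of its peptide k-mers occur in the reference. A plain Python set stands in for the Bloom filter, the function names are hypothetical, the k-mer size is shrunk to 5 for readability, and six-frame translation and the `dayhoff` alphabet are omitted. This is an illustration, not sencha's actual code.

```python
from typing import List, Set


def peptide_kmers(peptide: str, ksize: int) -> List[str]:
    """Every overlapping substring of length `ksize` in a peptide."""
    return [peptide[i : i + ksize] for i in range(len(peptide) - ksize + 1)]


def fraction_in_reference(peptide: str, reference_kmers: Set[str], ksize: int) -> float:
    """Share of the peptide's k-mers that occur in the reference set.

    A plain set stands in for the Bloom filter loaded with
    --peptides-are-bloom-filter; a real Bloom filter may also return
    occasional false positives.
    """
    kmers = peptide_kmers(peptide, ksize)
    if not kmers:
        # Peptide shorter than ksize: no k-mers to compare.
        return 0.0
    found = sum(1 for kmer in kmers if kmer in reference_kmers)
    return found / len(kmers)


def looks_coding(peptide: str, reference_kmers: Set[str], ksize: int, threshold: float = 0.95) -> bool:
    """Hypothetical rule: call a frame 'coding' if enough of its k-mers are known."""
    return fraction_in_reference(peptide, reference_kmers, ksize) >= threshold


# Hypothetical toy reference: all 5-mers of one made-up protein sequence.
reference = set(peptide_kmers("MKTAYIAKQRQISFVKSHFSRQ", 5))
print(looks_coding("TAYIAKQRQIS", reference, ksize=5))  # True: 7 of 7 k-mers found
print(looks_coding("TAYIAKWWWWW", reference, ksize=5))  # False: 2 of 7 k-mers found
```

Under this reading, `--translate_jaccard_threshold 0.95` means a frame needs at least 95% of its k-mers present in the reference to be kept as coding.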

I ran into this error with sencha translate:

Full Nextflow output:

executor >  local (19)
[c1/6b89f6] process > get_software_versions                                               [100%] 1 of 1 ✔
[07/907966] process > sambamba_dedup (1)                                                  [100%] 1 of 1, cached: 1 ✔
[83/cd2196] process > sambamba_index (1)                                                  [100%] 1 of 1, cached: 1 ✔
[2a/1520f5] process > samtools_fastq_no_intersect (null)                                  [100%] 1 of 1, cached: 1 ✔
[48/0ed104] process > fastqc (bat2-BM_possorted_genome_bam_dedup)                         [100%] 1 of 1, cached: 1 ✔
[5d/f8abfe] process > fastp (bat2-BM_possorted_genome_bam_dedup)                          [100%] 1 of 1, cached: 1 ✔
[c6/a5c75d] process > make_protein_index (metazoa_odb10.fasta__molecule-protein_ksize-20) [100%] 13 of 13, cached: 13 ✔
[c0/39d6ad] process > translate (bat2-BM_possorted_genome_bam_dedup)                      [ 68%] 17 of 25, failed: 17, retries: 12
[8b/d9e069] process > diamond_prepare_taxa (taxdmp)                                       [100%] 1 of 1, cached: 1 ✔
[6c/037c1e] process > diamond_makedb (complete__nonredundant_protein.faa)                 [100%] 1 of 1, cached: 1 ✔
[-        ] process > diamond_blastp                                                      -
[fa/792356] process > gunzip_infernal_db (Rfam.cm.gz)                                     [100%] 1 of 1, cached: 1 ✔
[-        ] process > infernal_cmsearch                                                   -
[58/189bc4] process > multiqc                                                             [100%] 1 of 1 ✔
[bd/f711ec] process > output_documentation                                                [100%] 1 of 1, cached: 1 ✔
-[nf-core/predictorthologs] Pipeline completed with errors-
WARN: Tower request field `workflow.errorMessage` exceeds expected size | offending value: `18285345it [1:50:35, 2731.00it/s]
18285640it [1:50:35, 2790.98it/s]
18285920it [1:50:35, 2775.29it/s]
18286199it [1:50:35, 2772.55it/s]
18286488it [1:50:35, 2804.68it/s]
18286769it [1:50:35, 2804.12it/s]
18287050it [1:50:35, 2800.52it/s]
18287331it [1:50:36, 2715.45it/s]
18287604it [1:50:36, 2705.99it/s]
18287876it [1:50:36, 2707.45it/s]
18288155it [1:50:36, 2729.29it/s]
18288431it [1:50:36, 2735.95it/s]
18288705it [1:50:36, 2697.60it/s]
18289014it [1:50:36, 2804.42it/s]
18289319it [1:50:36, 2873.36it/s]
18289608it [1:50:36, 2830.20it/s]
18289893it [1:50:36, 2752.49it/s]
18290170it [1:50:37, 2685.12it/s]
18290440it [1:50:37, 2648.12it/s]
18290733it [1:50:37, 2726.00it/s]
18291014it [1:50:37, 2748.38it/s]
18291198it [1:50:37, 2755.74it/s]
Traceback (most recent call last):
  File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/bin/sencha", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 597, in cli
    translate_obj.set_coding_scores_all_files()
  File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 386, in set_coding_scores_all_files
    df = self.score_reads_per_file(reads_file)
  File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 361, in score_reads_per_file
    ) in self.maybe_score_single_read(description, sequence):
  File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 332, in maybe_score_single_read
    scores = self.check_peptide_content(description, sequence)
  File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 289, in check_peptide_content
    self.file_handles["noncoding_nucleotide"], seqname, sequence
  File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 155, in maybe_write_fasta
    write_fasta(file_handle, description, sequence)
  File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 115, in write_fasta
    file_handle.write(">{}\n{}\n".format(description, sequence))
OSError: [Errno 12] Cannot allocate memory`, size: 3112 (max: 255)
Error executing process > 'translate (bat2-BM_possorted_genome_bam_dedup)'

Caused by:
  Process `translate (bat2-BM_possorted_genome_bam_dedup)` terminated with an error exit status (1)

Command executed:

  sencha translate \
    --molecule protein \
    --peptide-ksize 17 \
    --jaccard-threshold 0.95 \
    --noncoding-nucleotide-fasta bat2-BM_possorted_genome_bam_dedup__noncoding_reads_nucleotides.fasta \
    --coding-nucleotide-fasta bat2-BM_possorted_genome_bam_dedup__coding_reads_nucleotides.fasta \
    --csv bat2-BM_possorted_genome_bam_dedup__coding_scores.csv \
    --json-summary bat2-BM_possorted_genome_bam_dedup__coding_summary.json \
    --peptides-are-bloom-filter \
    metazoa_odb10__molecule-protein_ksize-17.bloomfilter \
    bat2-BM_possorted_genome_bam_dedup_R1_trimmed.fastq.gz > bat2-BM_possorted_genome_bam_dedup__coding_reads_peptides.fasta

Command exit status:
  1

Command output:
  (empty)

Command error:
  18285345it [1:50:35, 2731.00it/s]
  18285640it [1:50:35, 2790.98it/s]
  18285920it [1:50:35, 2775.29it/s]
  18286199it [1:50:35, 2772.55it/s]
  18286488it [1:50:35, 2804.68it/s]
  18286769it [1:50:35, 2804.12it/s]
  18287050it [1:50:35, 2800.52it/s]
  18287331it [1:50:36, 2715.45it/s]
  18287604it [1:50:36, 2705.99it/s]
  18287876it [1:50:36, 2707.45it/s]
  18288155it [1:50:36, 2729.29it/s]
  18288431it [1:50:36, 2735.95it/s]
  18288705it [1:50:36, 2697.60it/s]
  18289014it [1:50:36, 2804.42it/s]
  18289319it [1:50:36, 2873.36it/s]
  18289608it [1:50:36, 2830.20it/s]
  18289893it [1:50:36, 2752.49it/s]
  18290170it [1:50:37, 2685.12it/s]
  18290440it [1:50:37, 2648.12it/s]
  18290733it [1:50:37, 2726.00it/s]
  18291014it [1:50:37, 2748.38it/s]
  18291198it [1:50:37, 2755.74it/s]
  Traceback (most recent call last):
    File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/bin/sencha", line 8, in <module>
      sys.exit(cli())
    File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/click/core.py", line 829, in __call__
      return self.main(*args, **kwargs)
    File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/click/core.py", line 782, in main
      rv = self.invoke(ctx)
    File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
      return ctx.invoke(self.callback, **ctx.params)
    File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/click/core.py", line 610, in invoke
      return callback(*args, **kwargs)
    File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 597, in cli
      translate_obj.set_coding_scores_all_files()
    File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 386, in set_coding_scores_all_files
      df = self.score_reads_per_file(reads_file)
    File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 361, in score_reads_per_file
      ) in self.maybe_score_single_read(description, sequence):
    File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 332, in maybe_score_single_read
      scores = self.check_peptide_content(description, sequence)
    File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 289, in check_peptide_content
      self.file_handles["noncoding_nucleotide"], seqname, sequence
    File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 155, in maybe_write_fasta
      write_fasta(file_handle, description, sequence)
    File "/opt/conda/envs/nf-core-predictorthologs-1.0dev/lib/python3.7/site-packages/sencha/translate.py", line 115, in write_fasta
      file_handle.write(">{}\n{}\n".format(description, sequence))
  OSError: [Errno 12] Cannot allocate memory

Work dir:
  /mnt/data_sm/home/phoenix/pipeline-work/bat/nf-predictorthologs/79/577bfef928dfa73852045aefd36cc4

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
One more CTRL+C to force exit

Makefile:9: recipe for target 'busco_metazoa' failed
make: *** [busco_metazoa] Interrupt
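The traceback ends in sencha's FASTA writer: `check_peptide_content` calls `maybe_write_fasta` with `self.file_handles["noncoding_nucleotide"]`, which calls `write_fasta`, whose body (taken from the traceback) is `file_handle.write(">{}\n{}\n".format(description, sequence))`. The sketch below reproduces that write path so it is clear where the failing `write` sits. The `None` check in `maybe_write_fasta`, the `"coding_nucleotide"` key, and the `route_read` helper are assumptions for illustration, and `io.StringIO` stands in for the real output files. It does not explain why the kernel returned `ENOMEM`.

```python
import io
from typing import Dict, Optional, TextIO


def write_fasta(file_handle: TextIO, description: str, sequence: str) -> None:
    # Same formatting as the line in the traceback: header line, then sequence line.
    file_handle.write(">{}\n{}\n".format(description, sequence))


def maybe_write_fasta(file_handle: Optional[TextIO], description: str, sequence: str) -> None:
    # Assumption: output FASTA files are optional, so skip when none was requested.
    if file_handle is not None:
        write_fasta(file_handle, description, sequence)


def route_read(
    file_handles: Dict[str, Optional[TextIO]],
    description: str,
    sequence: str,
    is_coding: bool,
) -> None:
    # Hypothetical helper: send a read's nucleotides to the coding or noncoding FASTA.
    key = "coding_nucleotide" if is_coding else "noncoding_nucleotide"
    maybe_write_fasta(file_handles.get(key), description, sequence)


handles = {"coding_nucleotide": io.StringIO(), "noncoding_nucleotide": io.StringIO()}
route_read(handles, "read1", "ACGTACGT", is_coding=False)
route_read(handles, "read2", "TTGACCAA", is_coding=True)
print(handles["noncoding_nucleotide"].getvalue(), end="")
# >read1
# ACGTACGT
```

Every noncoding read goes through one such `write` call, so with roughly 18 million reads processed before the crash, the noncoding FASTA is written to continuously for the whole run.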

olgabot commented 4 years ago

Thanks! That Makefile is in a private repo. Can you post the full error and the sencha translate command?

phoenixAja commented 4 years ago

This is the full error, isn't it?

olgabot commented 4 years ago

Seeing the full sencha translate command and Nextflow output would be helpful, too 👍

phoenixAja commented 4 years ago

@olgabot I just updated the issue. Also, @lekhakaranam mentioned that she was running into the same issue on the Botryllus data.

phoenixAja commented 4 years ago

Since this is currently blocking both @lekhakaranam and me from making progress, should we flag this issue as high priority?

pranathivemuri commented 4 years ago

This should be resolved by PR #93.