COMBINE-lab / simpleaf

A rust framework to make using alevin-fry even simpler
BSD 3-Clause "New" or "Revised" License

simpleaf index error piscem index failed to build succesfully ExitStatus(unix_wait_status(6)) #163

Open lichtobergo opened 3 weeks ago

lichtobergo commented 3 weeks ago

Hi @rob-p, I was trying to build the splici index for the GENCODE GRCm39 mouse genome. I followed the procedure described here and used conda to install simpleaf:

mamba create -n af -y -c bioconda -c conda-forge simpleaf

To build the index, I ran simpleaf index with these commands:

export ALEVIN_FRY_HOME='.'
simpleaf set-paths
ulimit -n 2048

simpleaf index \
--output ref/GRCm39_vM36 \
--fasta ref/GRCm39.primary_assembly.genome.fa \
--gtf ref/gencode.vM36.primary_assembly.annotation.gtf \
--rlen 91 \
--threads 16
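
Because ulimit only affects the shell it is run in (and the processes that shell starts), it can help to confirm that the open-file limit is actually in effect in the same session before launching simpleaf index. The following is a minimal sketch, not part of the original report; the variable names and the threshold check are illustrative assumptions, with 2048 taken from the ulimit call above.

```shell
# Read the soft and hard open-file limits of the current shell.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft open-file limit: $soft"
echo "hard open-file limit: $hard"

# 'want' is an illustrative target; 2048 is the value used above.
want=2048
if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$want" ]; then
    echo "soft limit is below $want; the ulimit call may not have taken effect here" >&2
fi
```

If the soft limit printed here is lower than expected, the ulimit call was probably made in a different shell session than the one running the build.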

Unfortunately, I get this error:

2024-11-04T16:47:00.791536Z  INFO simpleaf::simpleaf_commands::indexing: preparing to make reference with roers
2024-11-04T16:47:07.368482Z  INFO grangers::reader::gtf: Finished parsing the input file. Found 5 comments and 2478715 records.
2024-11-04T16:47:10.528725Z  INFO roers: Built the Grangers object for 2478715 records
2024-11-04T16:47:11.330363Z  INFO roers: Proceed 1298756 exon records from 278375 transcripts
2024-11-04T16:47:33.121904Z  INFO roers: Processing 1020381 intronic records
2024-11-04T16:48:29.884495Z  INFO roers: Done!
2024-11-04T16:48:29.983228Z  INFO simpleaf::simpleaf_commands::indexing: piscem build cmd : /home/light/miniforge3/envs/af/bin/piscem build -k 31 -m 19 -o ref/GRCm39_vM36/index/piscem_idx -s ref/GRCm39_vM36/ref/roers_ref.fa --seed 1 --threads 16
2024-11-04T16:48:36.318253Z ERROR simpleaf::utils::prog_utils: command unsuccessful (signal: 6 (SIGABRT)): "/home/light/miniforge3/envs/af/bin/piscem" "build" "-k" "31" "-m" "19" "-o" "ref/GRCm39_vM36/index/piscem_idx" "-s" "ref/GRCm39_vM36/ref/roers_ref.fa" "--seed" "1" "--threads" "16"
2024-11-04T16:48:36.318277Z ERROR simpleaf::utils::prog_utils: stdout :
====

Constructing the compacted reference de Bruijn graph for k = 31.

Enumerating the vertices of the de Bruijn graph.

Structural information for the de Bruijn graph is written to ref/GRCm39_vM36/index/piscem_idx_cfish.json.
["ref_index_builder", "-i", "ref/GRCm39_vM36/index/piscem_idx_cfish", "-k", "31", "-m", "19", "--canonical-parsing", "--build-ec-table", "-o", "ref/GRCm39_vM36/index/piscem_idx", "-d", "./workdir.noindex", "-t", "16", "--seed", "1"]
====
2024-11-04T16:48:36.318287Z ERROR simpleaf::utils::prog_utils: stderr :
====
2024-11-04T16:48:30.021281Z  INFO piscem: starting piscem build
2024-11-04T16:48:30.021334Z  INFO piscem: Computing and recording reference signatures...
trimmed polyA tails from 0 records
2024-11-04T16:48:34.817004Z  INFO piscem: done.
2024-11-04T16:48:34.817301Z  INFO piscem: args = ["cdbg_builder", "--seq", "ref/GRCm39_vM36/ref/roers_ref.fa", "-k", "31", "--track-short-seqs", "--poly-N-stretch", "-o", "ref/GRCm39_vM36/index/piscem_idx_cfish", "-t", "16", "-f", "3", "-w", "./workdir.noindex"]
**********************************************************************************************************************************
Error: Cannot open temporary file ref/GRCm39_vM36/index/kmc_00818.bin

Usage :
Efficiently construct the compacted de Bruijn graph from sequencing reads or reference sequences
Usage:
  cuttlefish build [OPTION...]

 common options:
  -s, --seq arg            input files
  -l, --list arg           input file lists
  -d, --dir arg            input file directories
  -k, --kmer-len arg       k-mer length (default: 27)
  -t, --threads arg        number of threads to use (default: 4)
  -o, --output arg         output file
  -w, --work-dir arg       working directory (default: .)
  -m, --max-memory arg     soft maximum memory limit in GB (default: 3)
      --unrestrict-memory  do not impose memory usage restriction
  -h, --help               print usage

 cuttlefish_1 options:
  -f, --format arg        output format (0: FASTA, 1: GFA 1.0, 2: GFA 2.0, 3:
                          GFA-reduced)
      --track-short-seqs  track existence of sequences shorter than k bases
      --poly-N-stretch    includes information of polyN stretches in the
                          tiling output

 cuttlefish_2 options:
      --read        construct a compacted read de Bruijn graph (for FASTQ
                    input)
      --ref         construct a compacted reference de Bruijn graph (for
                    FASTA input)
  -c, --cutoff arg  frequency cutoff for (k + 1)-mers (default: refs: 1,
                    reads: 2)
      --path-cover  extract a maximal path cover of the de Bruijn graph

 debug options:
      --vertex-set arg  set of vertices, i.e. k-mers (KMC database) prefix
                        (default: "")
      --edge-set arg    set of edges, i.e. (k + 1)-mers (KMC database) prefix
                        (default: "")

 specialized options:
      --save-mph       save the minimal perfect hash (BBHash) over the vertex
                       set
      --save-buckets   save the DFA-states collection of the vertices
      --save-vertices  save the vertex set of the graph

directory already exists
fatal runtime error: Rust cannot catch foreign exceptions
====
Error: piscem index failed to build succesfully ExitStatus(unix_wait_status(6))
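
A "Cannot open temporary file" message like the one in the log can have more than one cause, so a first check is whether the directories named in the log exist and are writable. The sketch below is only a diagnostic aid under that assumption; check_dir is a hypothetical helper, and the two paths are copied from the log above.

```shell
# Hypothetical helper: report whether a directory exists and is writable.
check_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
    elif [ ! -w "$dir" ]; then
        echo "not writable: $dir"
    else
        echo "ok: $dir"
    fi
}

# Paths taken from the piscem log: the index output directory and the work directory.
check_dir ref/GRCm39_vM36/index
check_dir ./workdir.noindex
```

If both directories are reported as ok, the failure is more likely related to something else, such as the open-file limit or available disk space.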

Since I previously had a different, hardware-related error, here are my current system specifications:

Operating System: TUXEDO OS 3
KDE Plasma Version: 6.1.5
KDE Frameworks Version: 6.6.0
Qt Version: 6.7.2
Kernel Version: 6.11.0-103009-tuxedo (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
Memory: 92,1 GiB of RAM
Graphics Processor: AMD Radeon Graphics
Manufacturer: TUXEDO
Product Name: TUXEDO InfinityBook Pro AMD Gen9

I appreciate any help on this issue.

Best,
Michael