GaetanBenoitDev / metaMDBG

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.
MIT License

Segmentation fault with gfa function #22

Open CJREID opened 1 month ago

CJREID commented 1 month ago

Hi there,

I'm running metaMDBG gfa version 1.0, installed via bioconda, as follows:

    metaMDBG gfa --assembly-dir /scratch3/rei219/projects/SBM/outputs/metamdbg/plables_test/B --k 21 --contigpath --threads 64

The job is submitted to an HPC system via Slurm with the following resource request:

#!/bin/bash -l
#SBATCH --job-name=metamdbg_B_gfa_contig
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task=64
#SBATCH --mem=200GB
#SBATCH --time=8:00:00
#SBATCH --output=logs/%x.%j.out
#SBATCH --error=logs/%x.%j.err

The assembly folder has the following structure:

Directory structure ``` . ├── contigs.fasta.gz ├── metaMDBG.log └── tmp ├── contig_data.txt ├── contigs_polished.fasta.gz ├── contigs_uncorrected.fasta.gz ├── data.txt ├── input.txt ├── memoryTrack.txt ├── parameters.gz ├── pass_k10 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k11 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k12 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k13 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k14 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k15 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k16 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k17 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k18 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k19 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k20 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k21 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.tmp │ ├── assembly_graph.gfa.unitigs │ ├── assembly_graph.noseq.gfa.tmp │ ├── parameters.gz │ └── unitigs.fasta.gz ├── pass_k22 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k23 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k24 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k25 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k26 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k27 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k28 │ ├── assembly_graph.gfa │ ├── 
assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k29 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k30 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k31 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k32 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k33 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k34 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k35 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k36 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k37 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k38 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k39 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k40 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k41 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k42 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k43 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k44 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k45 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k46 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k47 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k48 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k49 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k5 │ ├── assembly_graph.gfa │ 
├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k50 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k51 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k52 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k53 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k54 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k55 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k56 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k57 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k58 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k59 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k6 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k60 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k61 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k62 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k63 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k64 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k65 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k66 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k67 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k68 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k69 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k7 │ ├── assembly_graph.gfa │ 
├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k70 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k71 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k72 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k73 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k74 │ ├── assembly_graph.gfa │ └── parameters.gz ├── pass_k8 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── pass_k9 │ ├── assembly_graph.gfa │ ├── assembly_graph.gfa.unitigs │ └── parameters.gz ├── perf.txt ├── read_data_init.txt ├── read_stats.txt └── time.txt 71 directories, 225 files ```

When I run gfa, I get the following printed to stderr:

    Assembly dir: /scratch3/rei219/projects/SBM/outputs/metamdbg/plables_test/B
    Used k: 21
    Homopolymer compression: 1
    Data type: 0

Generating unitig sequences
Loading unitig sequences
Creating assembly graph file
run_B_gfa_contig_v1.sh: line 6: 62151 Segmentation fault      (core dumped) metaMDBG gfa --assembly-dir /scratch3/rei219/projects/SBM/outputs/metamdbg/plables_test/B --k 21 --contigpath --threads 64

The end of the metaMDBG.log is as follows:

Creating basespace contigs: /scratch3/rei219/projects/SBM/outputs/metamdbg/plables_test/B/tmp//pass_k21//assembly_graph.gfa.unitigs
Nb contigs: 414446
Nb bps: 54358
Checksum: 0
Checksum (best support): 0
Loading unitig sequences
Parsing file: /scratch3/rei219/projects/SBM/outputs/metamdbg/plables_test/B/tmp//pass_k21//unitigs.fasta.gz
Parsing file done (nb reads: 28)
Creating assembly graph file
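
The log above reports 414446 contigs for pass_k21 but only 28 sequences parsed from unitigs.fasta.gz. Whether that mismatch has anything to do with the crash is not established in this thread, but the file's contents are easy to check independently. A minimal sketch, assuming a standard gzip-compressed FASTA with one `>` header line per record (`count_fasta_records` is a hypothetical helper name, not part of metaMDBG):

```shell
#!/usr/bin/env bash
# Count the records in a gzip-compressed FASTA file by counting header lines.
# Assumes each record starts with a line beginning with '>'.
count_fasta_records() {
    local file="$1"
    if [ ! -r "$file" ]; then
        echo "cannot read: $file" >&2
        return 1
    fi
    # grep -c exits non-zero when the count is 0, so mask that status.
    gzip -dc -- "$file" | grep -c '^>' || true
}

# Example (path layout taken from the log above):
# count_fasta_records <assembly-dir>/tmp/pass_k21/unitigs.fasta.gz
```

If the printed count matches the "nb reads" value in the log, the file was at least parsed in full.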

I have also tried running with --threads 1, without the --contigpath flag, and with 512 GB of memory, but I get the same error. Unfortunately, I can't find anything more informative about where the error occurs.

Any help appreciated!

Thanks,

Cam
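
For anyone hitting the same crash and wanting more detail than "Segmentation fault", one option is to wrap the command so the job log records which signal killed it, and then to inspect the core file with gdb. This is only a sketch: `run_and_report` is a hypothetical helper, and whether a core file is written, and where, depends on how the cluster is configured.

```shell
#!/usr/bin/env bash
# Run a command and, if it was killed by a signal, report which one.
# The shell reports "killed by signal N" as exit status 128+N
# (SIGSEGV is signal 11 on Linux, hence exit status 139).
run_and_report() {
    local status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        echo "command killed by SIG$(kill -l "$status") (exit status $status)" >&2
    fi
    return "$status"
}

# Possible use inside the Slurm script (left as comments; paths are placeholders):
# ulimit -c unlimited      # allow core files in this shell, if the system permits
# run_and_report metaMDBG gfa --assembly-dir <assembly-dir> --k 21 --threads 1
# gdb -batch -ex bt "$(command -v metaMDBG)" <core-file>   # print a backtrace
```

A backtrace from a binary built with debug symbols is usually the most useful thing to attach to a report like this one.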

GaetanBenoitDev commented 1 month ago

Hi, thanks for your detailed report. There is indeed a bug in the gfa command; I should be able to fix it next week.

CJREID commented 1 month ago

Hi @GaetanBenoitDev,

No worries, thanks for the quick reply. Good to know it was a bug and not operator error!

Cam

GaetanBenoitDev commented 1 month ago

It should be fixed if you compile from source. I will update the bioconda version in a couple of weeks. Thanks.

christinehe commented 4 weeks ago

Hi, I'm running into the same issue after compiling metaMDBG from source. My command:

metaMDBG gfa --assembly-dir /data/che --k 21 --threads 24

Stderr:

Assembly dir: /data/che
Used k: 21 
Homopolymer compression: 0 
Data type: 1 

Generating unitig sequences
Loading unitig sequences
Creating assembly graph file
generate_gfa.sh: line 8: 1014218 Segmentation fault (core dumped) metaMDBG gfa --assembly-dir /data/che --k 21 --threads 24

Would appreciate any insight!

GaetanBenoitDev commented 3 weeks ago

Hi,

If you compiled from source, are you using the compiled binary at ./bin/metaMDBG?

christinehe commented 1 week ago

Hi, yes, I'm using the compiled software.
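
When a bioconda install and a source build coexist, the name `metaMDBG` on `PATH` may still resolve to the older bioconda binary. A quick way to confirm which executable a script actually runs is sketched below; `check_binary` is a hypothetical helper, and it assumes `readlink -f` is available (GNU coreutils).

```shell
#!/usr/bin/env bash
# Report which executable a command name resolves to on PATH, and whether
# it is the same file as the binary you expect (e.g. a fresh source build).
# Usage: check_binary <command-name> <expected-path>
check_binary() {
    local name="$1" expected="$2" resolved
    resolved="$(command -v "$name")" || { echo "not found: $name"; return 2; }
    # Compare fully resolved paths so symlinks do not cause false mismatches.
    if [ "$(readlink -f "$resolved")" = "$(readlink -f "$expected")" ]; then
        echo "match: $resolved"
    else
        echo "mismatch: $name resolves to $resolved, expected $expected"
        return 1
    fi
}

# Example, run from the root of the source checkout:
# check_binary metaMDBG ./bin/metaMDBG
```

On a mismatch, calling the compiled binary by its full path in the job script avoids the ambiguity.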