bioinfologics / satsuma2

FFT cross-correlation based synteny aligner, (re)designed to make full use of parallel computing

satsuma dies with memory error #12

Closed dcopetti closed 5 years ago

dcopetti commented 6 years ago

Hello, I was able to run SatsumaSynteny2 on two 3-Mb scaffolds (with -slaves 2 -threads 5 -sl_mem 50). Now that I am trying to align 640 Mb of scaffolds against a 620 Mb reference (single fasta), I always get errors like this:

TIME_LOG: 1533825967118 -  Loading array
Loading fasta Hvul_chr4.fa into kmer array
TIME_LOG: 1533825967118 -  Loading array
Loading fasta LG4_scf.fa into kmer array
Kmer array with 602141644 elements created
TIME_LOG: 1533825976910 -  Sorting array
Kmer array with 666622256 elements created
TIME_LOG: 1533825982241 -  Sorting array
Kmer array sorted
Kmer array filtered to 272368385 elements
Kmer array sorted
Kmer array filtered to 179908979 elements
Starting to create matching positions
141286 matching positions
Sorting the matches
Dumping matches of 1 kmers with jumps of up to 30kmers to chr4/kmatch_results.k31
23667 matches dumped
Waiting for seed pre-filters...
loading results for k=11
Segmentation fault (core dumped)

This happens even with the lowest resource settings: SatsumaSynteny2 -q LG4_scf.fa -t Hvul_chr4.fa -o chr4 -slaves 1 -threads 1 -sl_mem 20. What are the right settings to align my type of sequences?

My next step would be to visualize these alignments. I see that the tool MizBee could be useful, but I can't find how to prepare its input files from Satsuma's output. Is there a script for that? Something similar for the annotation would also be great. Thanks, Dario

mictadlo commented 5 years ago

Hi, To get the input file for MizBee you have to run BlockDisplaySatsuma -i results-19vs19/satsuma_summary.chained.out -t asm-19g.chr.fasta -q asm-19g.chr.fasta > results-19vs19/mizbee.txt. I would also like to use MizBee, but unfortunately I run into the same problem, because of:

Running seed pre-filter: 
  /satsuma2/bin/KMatch asm-19g.chr.fasta asm-19g.chr.fasta 11 results-19vs19/kmatch_results.k11 11 10 1; touch results-19vs19/kmatch_results.k11.finished
TIME_LOG: TIME_LOG: 1548288220566 -  Loading array 1548288220566 -  Loading array 

Loading fasta asm-19g.chr.fasta into kmer array
Loading fasta asm-19g.chr.fasta into kmer array
Kmer array with 2773860504 elements created
TIME_LOG: 1548288288569 -  Sorting array 
Kmer array with 2773860504 elements created
TIME_LOG: 1548288289717 -  Sorting array 
Kmer array sorted
Kmer array sorted
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_default_append
Aborted

I created the Singularity container below to make sure that the error is not caused by our HPC environment. Running Satsuma inside the container produced the same error message.

BootStrap: docker
From: ubuntu:16.04

%help
  A container with satsuma

%post
  apt-get update && apt-get -y upgrade
  apt-get -y install \
    build-essential \
    cmake \
    git    

  rm -rf /var/lib/apt/lists/*
  apt-get clean

  sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen && \
    dpkg-reconfigure --frontend=noninteractive locales && \
    update-locale LANG=en_US.UTF-8

  # install Satsuma2
  git clone https://github.com/bioinfologics/satsuma2.git
  cd satsuma2
  #sed -i.bak 's|set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -lpthread -std=c++14 -O3 -w")|set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -lpthread -std=c++14 -O0 -w")|' CMakeLists.txt
  sed -i.bak 's|set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -lpthread -std=c++14 -O3 -w")|set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -lpthread -pthread -std=c++14 -O3 -w")|' CMakeLists.txt

  mkdir build && cd build
  cmake .. 
  make
  mv /satsuma2/build/bin/* /satsuma2/bin
  #echo -e '#!/bin/sh \necho "cd $1; $2" | qsub -V  -l select=1:ncpus=${3}:mem=300G -N $5 \n' > /satsuma2/bin/satsuma_run.sh
  chmod +x /satsuma2/bin/satsuma_run.sh

%environment
  export SATSUMA2_PATH=/satsuma2/bin
  export LANG=en_US.UTF-8

By any chance, did you find another way to load an alignment into MizBee?

Thank you in advance,

Michal

jonwright99 commented 5 years ago

Hi Michal, If you can send me the file 'asm-19g.chr.fasta' I'll try to replicate the error. Jon

mictadlo commented 5 years ago

Hi Jon, I have just sent you an email with the assembly.

Thank you for your help.

Michal

jonwright99 commented 5 years ago

To summarise an offline conversation: this problem seems to be caused by the first KMatch step failing, which means SatsumaSynteny2 doesn't generate the satsuma_summary.chained.out output file, which in turn causes BlockDisplaySatsuma to fail. We're looking into it further.
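The failure chain described above can be caught early with a small pre-flight check. The following is a minimal Python sketch and not part of Satsuma2: the helper name block_display_command is hypothetical, and the only details taken from this thread are the output file name satsuma_summary.chained.out and the BlockDisplaySatsuma flags (-i, -t, -q) quoted earlier.

```python
from pathlib import Path


def block_display_command(results_dir, target_fasta, query_fasta):
    """Build the BlockDisplaySatsuma command, or fail early with a clear message.

    Hypothetical helper: it only checks that the chained summary exists and is
    non-empty, which is what goes missing when the KMatch step fails.
    """
    chained = Path(results_dir) / "satsuma_summary.chained.out"
    if not chained.is_file() or chained.stat().st_size == 0:
        raise FileNotFoundError(
            f"{chained} is missing or empty; check the KMatch / "
            "SatsumaSynteny2 logs before running BlockDisplaySatsuma"
        )
    # Flags as used earlier in this thread: -i summary, -t target, -q query.
    return [
        "BlockDisplaySatsuma",
        "-i", str(chained),
        "-t", str(target_fasta),
        "-q", str(query_fasta),
    ]
```

The returned list can be passed to subprocess.run with stdout redirected to a file such as mizbee.txt.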

mictadlo commented 5 years ago

Hi @jonwright99, Were you able to get it running?

Michal

jonwright99 commented 5 years ago

Hi @mictadlo, I've just pushed a fix which should solve your problem. The issue was caused by no unique 11-mers being found in your genome, which caused KMatch to fail, so the k11 output file was not generated. Best, Jon
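To get a feel for the condition described above, here is a rough Python sketch that counts k-mers occurring exactly once in a sequence. It is an illustration only and rests on an assumption about what "unique" means here; KMatch's own filtering (for example how it treats reverse complements or ambiguous bases) may differ. count_unique_kmers is a hypothetical helper, not a Satsuma2 function, and it holds every k-mer in memory, so it only suits small test sequences.

```python
from collections import Counter


def count_unique_kmers(seq: str, k: int = 11) -> int:
    """Return how many distinct k-mers occur exactly once in seq.

    Assumption for this sketch: windows containing 'N' are skipped and only
    the forward strand is considered.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i:i + k]
    )
    return sum(1 for n in counts.values() if n == 1)


# A perfectly periodic toy sequence: every 11-mer repeats, so none is unique.
print(count_unique_kmers("ACGT" * 20, k=11))  # prints 0
```

A result of 0 on real input would correspond to the situation described in this comment, where the k=11 seed step has nothing to work with.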

mictadlo commented 5 years ago

Hi @jonwright99, Thank you for doing it. Everything works. Please find below the updated Singularity recipe:

BootStrap: docker
From: ubuntu:16.04

%help
  A container with satsuma

%post
  apt-get update && apt-get -y upgrade
  apt-get -y install \
    build-essential \
    cmake \
    git    

  rm -rf /var/lib/apt/lists/*
  apt-get clean

  sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen && \
    dpkg-reconfigure --frontend=noninteractive locales && \
    update-locale LANG=en_US.UTF-8

  # install Satsuma2
  git clone https://github.com/bioinfologics/satsuma2.git
  cd satsuma2
  mkdir build && cd build
  cmake .. 
  make
  mv /satsuma2/build/bin/* /satsuma2/bin
  #echo '#!/bin/sh \necho "cd $1; $2" | qsub -V  -l select=1:ncpus=${3}:mem=300G -N $5 \n' > /satsuma2/bin/satsuma_run.sh
  chmod +x /satsuma2/bin/satsuma_run.sh

%environment
  export SATSUMA2_PATH=/satsuma2/bin
  export LANG=en_US.UTF-8

Thank you in advance,

Michal

jonwright99 commented 5 years ago

@dcopetti, does this solve your problem as well, or is this no longer an issue for you? Can I close this ticket?

dcopetti commented 5 years ago

@jonwright99 I have not had a chance to try it yet; I will need to re-run the same analysis within a few weeks. Let's close it for now, and I will get back to you if there are issues. Thanks!