PalMuc / TransPi

TransPi – a comprehensive TRanscriptome ANalysiS PIpeline for de novo transcriptome assembly

Error executing process > 'normalize_reads', RAM required 150GB #13

Closed jcoludar closed 3 years ago

jcoludar commented 3 years ago

Greetings! I tried running the Docker version of TransPi on my transcriptomic data, and after a seemingly successful start it failed during the "normalize_reads" process. It apparently requires 150 GB of RAM (I only have 125, which I had always assumed would be more than enough). Is there a workaround, or am I doing something wrong?

The code I used to run it:

sudo ./nextflow run TransPi.nf --all --maxReadLen 150 --k 25,35,55,75,85 --reads '/media/jcoludar/Daten/Ivan/02_Transcri/Transcriptomes/T5_09_Po_do_VG2/*_R[1,2].fastq.gz' --outdir Results_Polistes -profile docker,TransPiContainer

The Error message

Error executing process > 'normalize_reads (09-Po-du-VG2)'

Caused by: Process requirement exceed available memory -- req: 150 GB; avail: 125.8 GB

Command executed:

echo 09-Po-du-VG2

echo -e "\n-- Starting Normalization --\n"

mem=$( echo 150 GB | cut -f 1 -d " " )

insilico_read_normalization.pl --seqType fq -JM ${mem}G --max_cov 100 --min_cov 1 --left left-09-Po-du-VG2.filter.fq --right right-09-Po-du-VG2.filter.fq --pairs_together --PARALLEL_STATS --CPU 15

echo -e "\n-- DONE with Normalization --\n"

cat .command.out | grep "stats_file" -A 3 | tail -n 3 >09-Po-du-VG2_normStats.txt

cp left.norm.fq left-"09-Po-du-VG2".norm.fq
cp right.norm.fq right-"09-Po-du-VG2".norm.fq

mv left.norm.fq 09-Po-du-VG2_norm.R1.fq
mv right.norm.fq 09-Po-du-VG2_norm.R2.fq

pigz --best --force -p 15 -r 09-Po-du-VG2_norm.R1.fq
pigz --best --force -p 15 -r 09-Po-du-VG2_norm.R2.fq

Command exit status:

Command output: (empty)

Work dir: /media/jcoludar/Daten/Ivan/Software/TransPi/TransPi/work/b3/b323002917226824e319cadb1441f5

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

rivera10 commented 3 years ago

Hello @jcoludar,

In the nextflow.config you need to specify the RAM and CPUs of your system. Currently, that step is set to 150 GB of RAM. See here for more info. After you change the values to match your system, you can use the -resume option of nextflow.
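For reference, the kind of edit described above might look like the sketch below. The process selector and values are illustrative (a 15-core machine with ~125 GB of RAM); check your own nextflow.config for the exact names TransPi uses:

```groovy
// nextflow.config (sketch) -- cap the normalization step below the
// physically available RAM so Nextflow's pre-flight check passes
process {
    withName: normalize_reads {
        cpus   = 15
        memory = '120 GB'   // must be <= the RAM Nextflow reports as available
    }
}
```

With the requirement lowered below the available 125.8 GB, rerunning with -resume continues from the cached upstream steps instead of starting over.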

If you need anything else, let me know.

Best, Ramon

jcoludar commented 3 years ago

Hello @rivera10,

Thanks for the reply, and sorry, that's my fault for not reading the manual attentively. It worked, but sadly only through the normalize_reads step. It then got stuck with `WARN: Access to undefined parameter avg_ins -- Initialise it to a default value eg. params.avg_ins = some_value`. I checked, and nextflow.config does have it defined with a default of 200. Am I missing something again? Sorry if that's the case, but I did not find anything about it in the manual.

Cheers, Ivan

rivera10 commented 3 years ago

Hello,

You can ignore that warning. If you know the insert size of the library, you can set it there. Otherwise, it will print that warning, but the program will continue working.
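If you do want to set it explicitly, a minimal sketch of what that could look like in nextflow.config (the value 350 is only an example; use your library's actual mean insert size):

```groovy
// nextflow.config (sketch)
params.avg_ins = 350   // mean insert size of the sequencing library, in bp
```

Setting the parameter this way should also make the warning disappear, since it is no longer undefined at the point the process references it.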

Best, Ramon

jcoludar commented 3 years ago

Hi Ramon! Thank you for your response! I feel like I am getting closer with each step, but I'm not quite there yet :) The pipeline ran smoothly until Trinotate this time. I really want to make this work, since it seems vastly superior to the pipelines I normally use, but I don't understand what went wrong with Trinotate here (which I have run before without issues).

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  cp: cannot stat '/media/jcoludar/Daten/Ivan/Software/TransPi/TransPi/DBs/sqlite_db/*.sqlite': No such file or directory

Do you think the warning can be solved by memory allocation via nextflow? As for the sqlite database, can that step be skipped? During installation I answered "yes" to creating the sqlite DB.

Just in case, here is the full error output

Something went wrong. Check error message below and/or log files.
Error executing process > 'trinotate (09-Po-du-VG2)'

Caused by:
  Process `trinotate (09-Po-du-VG2)` terminated with an error exit status (1)

Command executed:

  for x in `echo 09-Po-du-VG2.combined.okay.fa 09-Po-du-VG2.rnammer.gff 09-Po-du-VG2.combined.okay.fa.transdecoder.pep 09-Po-du-VG2.tmhmm.out 09-Po-du-VG2.signalp.out 09-Po-du-VG2.diamond_blastx.outfmt6 09-Po-du-VG2.diamond_blastp.outfmt6 09-Po-du-VG2.custom.diamond_blastx.outfmt6 09-Po-du-VG2.custom.diamond_blastp.outfmt6 09-Po-du-VG2.TrinotatePFAM.out`;do
      echo ${x} >>.vars.txt
  done

  assembly=$( cat .vars.txt | grep "09-Po-du-VG2.combined.okay.fa" | grep -v "09-Po-du-VG2.combined.okay.fa.transdecoder.pep" )
  transdecoder=$( cat .vars.txt | grep -E "09-Po-du-VG2.*.transdecoder.pep" )
  diamond_blastx=$( cat .vars.txt | grep "09-Po-du-VG2.diamond_blastx.outfmt6" )
  diamond_blastp=$( cat .vars.txt | grep "09-Po-du-VG2.diamond_blastp.outfmt6" )
  custom_blastx=$( cat .vars.txt | grep "09-Po-du-VG2.custom.diamond_blastx.outfmt6" )
  custom_blastp=$( cat .vars.txt | grep "09-Po-du-VG2.custom.diamond_blastp.outfmt6" )
  pfam=$( cat .vars.txt | grep "09-Po-du-VG2.TrinotatePFAM.out" )
  signalp=$( cat .vars.txt | grep "09-Po-du-VG2.signalp.out" )
  tmhmm=$( cat .vars.txt | grep "09-Po-du-VG2.tmhmm.out" )
  rnammer=$( cat .vars.txt | grep "09-Po-du-VG2.rnammer.gff" )

  #Generate gene_trans_map
  #Not using get_Trinity_gene_to_trans_map.pl since all the names are uniq
  cat ${assembly} | awk '{print $1}' | grep ">" | cut -c 2- >a.txt

  paste a.txt a.txt >${assembly}.gene_trans_map

  #Get Trinotate.sqlite from folder (original)
  cp /media/jcoludar/Daten/Ivan/Software/TransPi/TransPi/DBs/sqlite_db/*.sqlite .
  sqlname=`echo /media/jcoludar/Daten/Ivan/Software/TransPi/TransPi/DBs/sqlite_db/*.sqlite | tr "\/" "\n" | grep "\.sqlite"`

  echo -e "\n-- Running Trinotate --\n"

  Trinotate $sqlname init --gene_trans_map ${assembly}.gene_trans_map --transcript_fasta ${assembly} --transdecoder_pep ${transdecoder}

  echo -e "\n-- Ending run of Trinotate --\n"

  echo -e "\n-- Loading hits and predictions to sqlite database... --\n"

  #Load protein hits
  Trinotate $sqlname LOAD_swissprot_blastp ${diamond_blastp}

  #Load transcript hits
  Trinotate $sqlname LOAD_swissprot_blastx ${diamond_blastx}

  #Load custom protein hits
  Trinotate $sqlname LOAD_custom_blast --outfmt6 ${custom_blastp} --prog blastp --dbtype uniprot_metazoa_33208.fasta

  #Load custom transcript hits
  Trinotate $sqlname LOAD_custom_blast --outfmt6 ${custom_blastx} --prog blastx --dbtype uniprot_metazoa_33208.fasta

  #Load Pfam domain entries
  Trinotate $sqlname LOAD_pfam ${pfam}

  #Load transmembrane domains
  if [ -s ${tmhmm} ];then
      Trinotate $sqlname LOAD_tmhmm ${tmhmm}
  else
      echo "No transmembrane domains (tmhmm)"
  fi

  #Load signal peptide predictions
  if [ -s ${signalp} ];then
      Trinotate $sqlname LOAD_signalp ${signalp}
  else
      echo "No Signal-P"
  fi

  #Load rnammer results
  if [ -s ${rnammer} ];then
      Trinotate $sqlname LOAD_rnammer ${rnammer}
  else
      echo "No rnammer results"
  fi

  echo -e "\n-- Loading finished --\n"

  #Report

  echo -e "\n-- Generating report... --\n"

  Trinotate $sqlname report >09-Po-du-VG2.trinotate_annotation_report.xls

  echo -e "\n-- Report generated --\n"

  #Extract info from XLS file

  echo -e "\n-- Creating GO file from XLS... --\n"

  extract_GO_assignments_from_Trinotate_xls.pl --Trinotate_xls 09-Po-du-VG2.trinotate_annotation_report.xls --trans >09-Po-du-VG2.GO.terms.txt

  echo -e "\n-- Done with the GO --\n"

  echo -e "\n-- Creating KEGG file from XLS... --\n"

  cat 09-Po-du-VG2.trinotate_annotation_report.xls | cut -f 1,14 | grep "KEGG" | tr "\`" ";" | grep "KO:K" | sed 's/\tKEGG/\t#KEGG/g' | sed 's/KO:/KO:#/g' | cut -f 1,3 -d "#" | tr -d "#" >09-Po-du-VG2.KEGG.terms.txt

  echo -e "\n-- Done with the KEGG --\n"

  echo -e "\n-- Creating eggNOG file from XLS... --\n"

  cat 09-Po-du-VG2.trinotate_annotation_report.xls | cut -f 1,13 | grep "OG" | tr "\`" ";" | sed 's/^/#/g' | sed 's/;/\n;/g' | cut -f 1 -d "^" | tr -d "\n" | tr "#" "\n" | grep "OG" >09-Po-du-VG2.eggNOG_COG.terms.txt

  echo -e "\n-- Done with the eggNOG --\n"

  echo -e "\n-- Creating PFAM file from XLS... --\n"

  cat 09-Po-du-VG2.trinotate_annotation_report.xls | cut -f 1,10 | grep "PF" | tr "\`" ";" | sed 's/^/#/g' | sed 's/;PF/\n;PF/g' | cut -f 1 -d "^" | tr -d "\n" | tr "#" "\n" | grep "PF" | tr ";" "," >09-Po-du-VG2.PFAM.terms.txt

  echo -e "\n-- Done with the PFAM --\n"

  echo -e "\n-- DONE with Trinotate --\n"

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  cp: cannot stat '/media/jcoludar/Daten/Ivan/Software/TransPi/TransPi/DBs/sqlite_db/*.sqlite': No such file or directory

Work dir:
  /media/jcoludar/Daten/Ivan/Software/TransPi/TransPi/work/48/9dfe4fc09ae6fb92d63d7942a07dd9

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
rivera10 commented 3 years ago

Hello. I have not seen that warning before. Are you using conda or containers?

If you are using containers, then the creation of the sqlite database can sometimes cause problems if you do not have all the required Perl dependencies. One quick solution is to create a conda environment and build the sqlite database with Trinotate:

  conda create -n trinotate -c bioconda trinotate=3.2.1=pl526_0
  conda activate trinotate
  Build_Trinotate_Boilerplate_SQLite_db.pl Trinotate
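Once built, the database still has to end up where the failing trinotate process looks for it: the cp in the error above globs for DBs/sqlite_db/*.sqlite under the TransPi installation. A minimal sketch of that last step, assuming the build left Trinotate.sqlite in the current directory; TRANSPI_DIR is a placeholder for your TransPi checkout path, not a TransPi option:

```shell
# Placeholder: point this at your TransPi installation directory
TRANSPI_DIR="${TRANSPI_DIR:-$PWD}"
DBDIR="$TRANSPI_DIR/DBs/sqlite_db"

# The trinotate process copies DBs/sqlite_db/*.sqlite, so the DB must live here
mkdir -p "$DBDIR"
if [ -f Trinotate.sqlite ]; then
    cp Trinotate.sqlite "$DBDIR/"
fi
```

After that, rerunning with -resume should let the pipeline pick up at the trinotate step without redoing the assemblies.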
rivera10 commented 3 years ago

Hello @jcoludar,

Did you manage to solve the problem? I will close this issue for now. If you still have problems, feel free to open it again.

I also made several changes to the script to fix other issues. I suggest pulling the repository again so you have all the updates.

If you need anything else, let us know.

Best, Ramon