WarrenLab / hic-scaffolding-nf

Nextflow pipeline for scaffolding genome assemblies with Hi-C reads
MIT License

Error on Juicer Pre, exit status 57 #5

Open carla-hazelf opened 7 months ago

carla-hazelf commented 7 months ago

Hello,

Thank you for developing this pipeline.

I ran the pipeline on a Linux HPC system with the following command, using -profile conda:

nextflow run WarrenLab/hic-scaffolding-nf \
    -profile conda --juicer-tools-jar /path/to/juicer-tools-jar.jar  \
    --extra-yahs-args "-e GATC"    \
    --contigs /path/to/fasta.fasta \
    --r1Reads path/to/hi-c/*_1.fq.gz \
    --r2Reads path/to/hi-c/*_2.fq.gz

And I received the following error:

executor >  local (7)
[b2/63a1db] process > PRINT_VERSIONS     [100%] 1 of 1 ✔
[97/c491ea] process > SAMTOOLS_FAIDX (1) [100%] 1 of 1 ✔
[27/aa6e6b] process > CHROMAP_INDEX (1)  [100%] 1 of 1 ✔
[ca/a11554] process > CHROMAP_ALIGN (1)  [100%] 1 of 1 ✔
[ef/6e00dd] process > YAHS_SCAFFOLD (1)  [100%] 1 of 1 ✔
[fe/5e01fe] process > JUICER_PRE (1)     [100%] 1 of 1, failed: 1 ✘
[0a/392d9f] process > ASSEMBLY_STATS (1) [100%] 1 of 1 ✔
ERROR ~ Error executing process > 'JUICER_PRE (1)'

Caused by:
  Process `JUICER_PRE (1)` terminated with an error exit status (57)

Command executed:

  juicer pre -a -o out_JBAT         yahs.out.bin         yahs.out_scaffolds_final.agp         contigs.fa.fai

  asm_size=$(awk '{s+=$2} END{print s}' contigs.fa.fai)
  java -Xmx36G -jar /nfs/home/finnca/programmes/Juicebox-2.20.00/out/artifacts/juicer_tools_jar/juicer_tools.jar         pre out_JBAT.txt out_JBAT.hic <(echo "assembly ${asm_size}")

Command exit status:
  57

Command output:
  WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
  WARN [2023-11-28T16:55:07,678]  [Globals.java:138] [main]  Development mode is enabled
  Using 1 CPU thread(s) for primary task
  Using 10 CPU thread(s) for secondary task

Command error:
  [I::main_pre] make juicer pre input from BIN file yahs.out.bin
  [I::make_juicer_pre_file_from_bin] 0 read pairs processed
  [I::main_pre] genome size: 648299410
  [I::main_pre] scale factor: 1
  [I::main_pre] chromosome sizes for juicer_tools pre -
  PRE_C_SIZE: assembly 648299410
  [I::main_pre] JUICER_PRE CMD: java -Xmx36G -jar ${juicer_tools} pre out_JBAT.txt out_JBAT.hic <(echo "assembly 648299410")
  [I::main_pre] Version: 1.1
  [I::main_pre] CMD: juicer pre -a -o out_JBAT yahs.out.bin yahs.out_scaffolds_final.agp contigs.fa.fai
  [I::main_pre] Real time: 0.004 sec; CPU: 0.003 sec; Peak RSS: 0.001 GB
  WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
  WARN [2023-11-28T16:55:07,678]  [Globals.java:138] [main]  Development mode is enabled
  Using 1 CPU thread(s) for primary task
  Using 10 CPU thread(s) for secondary task
  out_JBAT.txt does not exist or does not contain any reads.

EDIT: I am new to Hi-C, and I did not prepare this data myself. Am I misunderstanding any preprocessing steps I need to do with the Hi-C Illumina data, or am I not understanding the code? Thank you.

esrice commented 7 months ago

First off, this is unrelated to your issue, but if you're running it on a cluster, you'll need to set some additional options to tell Nextflow to run the jobs on the cluster's compute nodes instead of the head node. See here for more info: https://www.nextflow.io/docs/latest/executor.html
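As a rough illustration of the executor settings mentioned above, the sketch below writes a small Nextflow config file that could then be passed with `-c`. The scheduler (`slurm`), the queue name `general`, and the file name `cluster.config` are assumptions made for this example; substitute whatever your cluster actually uses.

```shell
# Write a minimal Nextflow config that submits each process to a scheduler
# instead of running it locally. 'slurm' and the queue name 'general' are
# placeholders (assumptions), not values taken from this pipeline.
cat > cluster.config <<'EOF'
process {
    executor = 'slurm'
    queue = 'general'
}
EOF

# The config can then be added to the original command with -c, e.g.:
#   nextflow run WarrenLab/hic-scaffolding-nf -profile conda -c cluster.config ...
```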

Other than that, you appear to be running the pipeline correctly. The only step that failed is the one that makes a heatmap you can open in Juicebox. However, the error message says that there were 0 read pairs in the input, so my guess is that the scaffolding didn't work either. So first, take a look at the assembly output and the stats related to it. Does it look like the scaffolding actually worked (e.g., is the N50 bigger after scaffolding than before)? If not, how many reads did you start out with vs. how many got aligned?

You can look at all the intermediate files in the work directory. The first few characters of each step's directory are shown in the Nextflow output; for example, the alignment step's work directory should start with work/ca/a11554, so you can look there for the bam files and any error messages that step generated (in .command.err).
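The checks suggested above can be sketched in shell. The helper names `n50_from_fai` and `count_fastq_reads` are made up for this example, and the sketch assumes `.fai` index files (as written by `samtools faidx`, with the sequence length in column 2) and gzipped FASTQ input.

```shell
# Print the N50 of the sequences listed in a .fai index:
# sort lengths from longest to shortest, then report the length at which
# the running total first reaches half of the total assembly size.
n50_from_fai() {
    sort -k2,2nr "$1" | awk '
        { len[NR] = $2; total += $2 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                cum += len[i]
                if (cum >= half) { print len[i]; exit }
            }
        }'
}

# Count the records in a gzipped FASTQ file (four lines per record).
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Example usage (file names are illustrative):
#   n50_from_fai contigs.fa.fai               # N50 before scaffolding
#   n50_from_fai scaffolds.fa.fai             # N50 after scaffolding
#   count_fastq_reads path/to/reads_1.fq.gz   # number of input reads
#   cat work/ca/a11554*/.command.err          # messages from the alignment step
```

If the N50 is unchanged and the alignment step's log reports very few mapped pairs, the problem is likely upstream of the Juicer step.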

Hope this helps!

gargkritika commented 1 month ago

Hi, I am also getting the same error. Were you able to resolve this issue?

Any help is appreciated.

Best, Kritika

carla-hazelf commented 1 month ago

Hi, sorry for the delayed response. @esrice, thank you for your quick response at the time; it's really appreciated. @gargkritika, personally, I was having issues with Nextflow more generally, so I ended up doing the scaffolding manually. I followed the description here: https://github.com/GenomicsAotearoa/High-quality-genomes/blob/main/Centrostephanus/Urchin_HiCScaffolding_V4.ipynb

So, I mapped my reads to my assembly using bwa (-5SP is the option for Hi-C data). I then marked duplicates in the .sam output using samblaster, converted it to BAM, and filtered out secondary alignments and unmapped reads, then sorted by coordinates or read names for downstream Hi-C analyses. After that, it was a matter of following the yahs protocol: https://github.com/c-zhou/yahs. I'm no expert, but this worked for me. Hope this helps. If you need this Nextflow pipeline, try following the suggestion above and see if it works for you; I don't recall whether I got around to trying it.
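For anyone wanting a starting point, the manual steps described above might look roughly like the sketch below. This is not the exact code from the linked notebook: the function names, the default thread count, and the choice to name-sort the BAM before running yahs are assumptions, and `-e GATC` is simply the enzyme motif from the original command in this issue, so adjust it to your library.

```shell
# SAM flags to drop: 0x4 (read unmapped) and 0x100 (secondary alignment).
FILTER_FLAGS=$((0x4 | 0x100))

# Map Hi-C read pairs to the contigs, mark duplicates, filter, and name-sort.
# Arguments: contigs FASTA, R1 FASTQ, R2 FASTQ, output BAM, threads (default 4).
map_hic_reads() {
    contigs=$1; r1=$2; r2=$3; out_bam=$4; threads=${5:-4}
    bwa index "$contigs"
    bwa mem -5SP -t "$threads" "$contigs" "$r1" "$r2" \
        | samblaster \
        | samtools view -b -F "$FILTER_FLAGS" - \
        | samtools sort -n -@ "$threads" -o "$out_bam" -
}

# Scaffold the contigs with yahs using the filtered alignments.
# Arguments: contigs FASTA, BAM file, output prefix (default yahs.out).
scaffold_with_yahs() {
    contigs=$1; bam=$2; prefix=${3:-yahs.out}
    samtools faidx "$contigs"   # yahs expects the .fai index next to the FASTA
    yahs -e GATC -o "$prefix" "$contigs" "$bam"
}

# Example usage (paths are illustrative):
#   map_hic_reads contigs.fa reads_1.fq.gz reads_2.fq.gz hic_to_contigs.bam 8
#   scaffold_with_yahs contigs.fa hic_to_contigs.bam
```

The block only defines the functions, so nothing runs until they are called; bwa, samblaster, samtools and yahs all need to be on the PATH at that point.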