fmalmeida / ngs-preprocess

A pipeline for preprocessing NGS data from Illumina, Nanopore and PacBio technologies
https://ngs-preprocess.readthedocs.io/
GNU General Public License v3.0

SRA NCBI fetch and preprocessing script only works automatically for Illumina sequences #40

Closed: 0karl0 closed this issue 1 month ago

0karl0 commented 2 months ago

Using fmalmeida/ngs-preprocess:v2.6.

Nextflow should identify the sequencing platform of each accession and route preprocessing to the Nanopore, PacBio, or Illumina steps accordingly. Instead, fetching a PacBio accession fails:
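
The routing described above can be sketched as a simple lookup. This is a minimal sketch, assuming the platform is read from the SRA run metadata and uses SRA's platform names (ILLUMINA, OXFORD_NANOPORE, PACBIO_SMRT); the function route_platform is hypothetical and is not how ngs-preprocess necessarily implements it.

```shell
# Hypothetical helper (not from ngs-preprocess): map an SRA platform name,
# as reported in the run metadata, to the preprocessing branch that should
# handle the run.
route_platform() {
  local platform
  # Upper-case the input so "pacbio_smrt" and "PACBIO_SMRT" are treated alike.
  platform=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$platform" in
    ILLUMINA)        echo "ILLUMINA" ;;
    OXFORD_NANOPORE) echo "NANOPORE" ;;
    PACBIO_SMRT)     echo "PACBIO" ;;
    *)
      echo "unsupported platform: $1" >&2
      return 1
      ;;
  esac
}
```

For example, route_platform PACBIO_SMRT prints PACBIO, while an unknown platform returns a non-zero status so the caller can stop instead of silently sending the reads down the wrong branch.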


ERROR ~ Error executing process > 'SRA_FETCH:GET_FASTQ (SRR9641620)'

Caused by:
  Process `SRA_FETCH:GET_FASTQ (SRR9641620)` terminated with an error exit status (3)

Command executed:

  fasterq-dump \
    --include-technical \
    --split-files \
    --threads 2 \
    --outdir ./SRR9641620_data \
    --progress \
    SRR9641620

Command exit status:
  3

Command output:
  (empty)

Command error:
  2024-05-09T13:37:02 fasterq-dump.3.0.3 err: accession 'SRR9641620' is PACBIO, please use fastq-dump instead
  fasterq-dump quit with error code 3
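
The error message itself points to a workaround: fall back to fastq-dump when fasterq-dump refuses the accession. A minimal sketch, assuming that exit status 3 (the status seen in this log) signals that refusal; the function fetch_fastq and the FASTERQ_DUMP/FASTQ_DUMP override variables are hypothetical, and this is not necessarily how the pipeline fixed it in v2.7.

```shell
# Hypothetical wrapper (not the pipeline's actual fix): try fasterq-dump first
# and, if it exits with status 3 (as it did for the PacBio run in the log
# above), retry the same accession with fastq-dump.
# FASTERQ_DUMP and FASTQ_DUMP may be set to override the commands, e.g. in tests.
fetch_fastq() {
  local acc="$1" outdir="$2" threads="${3:-2}" status
  "${FASTERQ_DUMP:-fasterq-dump}" \
    --include-technical \
    --split-files \
    --threads "$threads" \
    --outdir "$outdir" \
    "$acc"
  status=$?
  if [ "$status" -eq 3 ]; then
    echo "fasterq-dump rejected $acc; retrying with fastq-dump" >&2
    "${FASTQ_DUMP:-fastq-dump}" \
      --split-files \
      --outdir "$outdir" \
      "$acc"
    status=$?
  fi
  return "$status"
}
```

Usage would look like fetch_fastq SRR9641620 ./SRR9641620_data 2. Any other non-zero status from fasterq-dump is returned unchanged, so genuine download failures are not masked by the retry.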

fmalmeida commented 2 months ago

Hi @0karl0, thanks for using the pipeline.

This was a bug in v2.6 that affected PacBio data specifically.

It should be fixed in the newest version, 2.7.

Can you try the latest version of the pipeline and let me know? If it persists, I can look into it further.

nextflow run fmalmeida/ngs-preprocess -r master -latest

Cheers, Felipe.

0karl0 commented 2 months ago

Thank you.

I was using v2.6 because of this warning:

nextflow run fmalmeida/ngs-preprocess -profile docker --sra_ids "./input/sra_ids.txt" --lreads_min_length 750 --output "./preprocessed_data"
N E X T F L O W  ~  version 23.10.1
Project `fmalmeida/ngs-preprocess` is currently stickied on revision: v2.6 -- you need to explicitly specify a revision with the option `-r` in order to use it

I've pulled the latest image with docker pull fmalmeida/ngs-preprocess:latest

Then I added these options to the example command provided in the README:

nextflow run fmalmeida/ngs-preprocess -r master -latest -profile docker --sra_ids "./input/sra_ids.txt" --lreads_min_length 750 --output "./preprocessed_data"

This gets further but crashes at the nanoQC step. The relevant part of the log:

 # Checking Quality
  nanoQC \
      -o NanoQC \
      SRR9641619_15.fastq SRR9641619_10.fastq SRR9641619_11.fastq SRR9641619_16.fastq SRR9641619_21.fastq SRR9641619_5.fastq SRR9641619_1.fastq SRR9641619_12.fastq SRR9641619_18.fastq SRR9641619_7.fastq SRR9641619_22.fastq SRR9641619_25.fastq SRR9641619_17.fastq SRR9641619_6.fastq SRR9641619_28.fastq SRR9641619_19.fastq SRR9641619_4.fastq SRR9641619_20.fastq SRR9641619_29.fastq SRR9641619_24.fastq SRR9641619_13.fastq SRR9641619_3.fastq SRR9641619_9.fastq SRR9641619_31.fastq SRR9641619_23.fastq SRR9641619_14.fastq SRR9641619_26.fastq SRR9641619_2.fastq SRR9641619_8.fastq SRR9641619_30.fastq SRR9641619_27.fastq ;

  # Generate Statistics Summary
  NanoStat \
      --fastq SRR9641619_15.fastq SRR9641619_10.fastq SRR9641619_11.fastq SRR9641619_16.fastq SRR9641619_21.fastq SRR9641619_5.fastq SRR9641619_1.fastq SRR9641619_12.fastq SRR9641619_18.fastq SRR9641619_7.fastq SRR9641619_22.fastq SRR9641619_25.fastq SRR9641619_17.fastq SRR9641619_6.fastq SRR9641619_28.fastq SRR9641619_19.fastq SRR9641619_4.fastq SRR9641619_20.fastq SRR9641619_29.fastq SRR9641619_24.fastq SRR9641619_13.fastq SRR9641619_3.fastq SRR9641619_9.fastq SRR9641619_31.fastq SRR9641619_23.fastq SRR9641619_14.fastq SRR9641619_26.fastq SRR9641619_2.fastq SRR9641619_8.fastq SRR9641619_30.fastq SRR9641619_27.fastq \
      -t 4 \
      -n SRR9641619.txt \
      --outdir NanoStats ;

Command exit status:
  2

Command output:
  WARNING: hex as part of --plots has been deprecated and will be ignored. To get the hex output, rerun with --legacy hex.

Command error:
  WARNING: hex as part of --plots has been deprecated and will be ignored. To get the hex output, rerun with --legacy hex.
  usage: nanoQC [-h] [-v] [-o OUTDIR] [--rna] [-l MINLEN] fastq
  nanoQC: error: unrecognized arguments: SRR9641619_10.fastq SRR9641619_11.fastq SRR9641619_16.fastq SRR9641619_21.fastq SRR9641619_5.fastq SRR9641619_1.fastq SRR9641619_12.fastq SRR9641619_18.fastq SRR9641619_7.fastq SRR9641619_22.fastq SRR9641619_25.fastq SRR9641619_17.fastq SRR9641619_6.fastq SRR9641619_28.fastq SRR9641619_19.fastq SRR9641619_4.fastq SRR9641619_20.fastq SRR9641619_29.fastq SRR9641619_24.fastq SRR9641619_13.fastq SRR9641619_3.fastq SRR9641619_9.fastq SRR9641619_31.fastq SRR9641619_23.fastq SRR9641619_14.fastq SRR9641619_26.fastq SRR9641619_2.fastq SRR9641619_8.fastq SRR9641619_30.fastq SRR9641619_27.fastq

The nanoQC usage message says it expects a single fastq argument, in fastq.gz format:

nanoQC [-h] [-v] [-o OUTDIR] fastq

positional arguments:
  fastq                 Reads data in fastq.gz format.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         Print version and exit.
  -o, --outdir OUTDIR   Specify directory in which output has to be created.
  -l, --minlen int      Minimum length of reads to be included in the plots
                        This also controls the length plotted in the graphs
                        from the beginning and end of reads (length plotted = minlen / 2)
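
Since nanoQC accepts only one positional fastq argument, one way to satisfy it is to merge the per-run files into a single gzipped FASTQ first. A minimal sketch, assuming the split files can simply be concatenated; the function merge_fastqs is hypothetical, and the fix actually applied in the dev branch may work differently.

```shell
# Hypothetical helper (the actual fix in the dev branch may differ): nanoQC takes
# a single fastq.gz positional argument, so the per-run FASTQ files are
# concatenated and gzip-compressed into one file before nanoQC is called.
merge_fastqs() {
  local out="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: merge_fastqs OUT.fastq.gz IN.fastq..." >&2
    return 1
  fi
  # FASTQ records are line-based, so concatenating complete files yields a
  # valid FASTQ stream; pipefail makes a missing input fail the whole call.
  ( set -o pipefail; cat -- "$@" | gzip -c > "$out" )
}
```

In the failing step this would become merge_fastqs SRR9641619.fastq.gz SRR9641619_*.fastq followed by nanoQC -o NanoQC SRR9641619.fastq.gz. The order of the merged files does not matter for the QC plots.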

0karl0 commented 2 months ago

Alternatively, it may just need the argument "fastq" added before the list of fastq files.

fmalmeida commented 2 months ago

Hi @0karl0, hmm, interesting. That is something I can work on.

I will fix that as soon as I can and get back to you when ready. Thanks for reporting.

fmalmeida commented 2 months ago

Hi @0karl0, I have created a proof of concept (PoC) for the fix. Can you test the following branch?

Note that "-r" is what lets you run a specific version (e.g. -r v2.7.0) or a specific branch (e.g. -r master or -r dev).

nextflow \
    run fmalmeida/ngs-preprocess \
    -r dev \
    -latest \
    -profile docker \
    --sra_ids "./input/sra_ids.txt" \
    --lreads_min_length 750 \
    --output "./preprocessed_data"

If that works, please let me know so I can merge it into the pipeline code and make a new release, v2.7.1.
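
Because the only thing that changes between these runs is the value passed to -r, the command can be wrapped so the revision is the sole argument. A minimal sketch, assuming Nextflow's -r accepts a branch, a tag, or a commit SHA; the function run_ngs_preprocess and the NEXTFLOW override variable are hypothetical, and the remaining options are copied from the command above.

```shell
# Hypothetical wrapper around the test command above. Its argument is passed to
# Nextflow's -r option, which selects a branch (e.g. dev), a release tag
# (e.g. v2.7.0) or a commit SHA. Set NEXTFLOW=echo to print the command
# instead of running it.
run_ngs_preprocess() {
  local revision="$1"
  if [ -z "$revision" ]; then
    echo "usage: run_ngs_preprocess REVISION" >&2
    return 1
  fi
  "${NEXTFLOW:-nextflow}" \
    run fmalmeida/ngs-preprocess \
    -r "$revision" \
    -latest \
    -profile docker \
    --sra_ids "./input/sra_ids.txt" \
    --lreads_min_length 750 \
    --output "./preprocessed_data"
}
```

For example, run_ngs_preprocess dev tests the branch with the fix, and run_ngs_preprocess v2.7.0 goes back to a tagged release. Always passing a revision explicitly also avoids the "stickied on revision" warning seen earlier in the thread.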

0karl0 commented 1 month ago

I had to run some other code over the weekend, but I wanted to thank you for the quick fix. It works for me.

Thanks! Karl

nextflow \
    run fmalmeida/ngs-preprocess \
    -r dev \
    -latest \
    -profile docker \
    --sra_ids "./input/sra_ids.txt" \
    --lreads_min_length 750 \
    --output "./preprocessed_data"
N E X T F L O W  ~  version 23.10.1
Pulling fmalmeida/ngs-preprocess ...
 checkout-out at 61b2aab4827693e8af4a8af7e0239d6a33752f81
Launching `https://github.com/fmalmeida/ngs-preprocess` [cheeky_koch] DSL2 - revision: 61b2aab482 [dev]

------------------------------------------------------
  fmalmeida/ngs-preprocess v2.7.0
------------------------------------------------------

Input/output options
  output           : ./preprocessed_data
  sra_ids          : ./input/sra_ids.txt

Long reads parameters
  lreads_min_length: 750

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use fmalmeida/ngs-preprocess for your analysis please cite:

* The pipeline
  https://doi.org/10.12688/f1000research.139488.1

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/fmalmeida/ngs-preprocess#citation
------------------------------------------------------
executor >  local (11)
[53/e5251d] process > SRA_FETCH:GET_FASTQ (SRR9641620)    [100%] 3 of 3 ✔
[2f/ad6a32] process > SRA_FETCH:GET_METADATA (SRR9641620) [100%] 3 of 3 ✔
[-        ] process > NANOPORE:PORECHOP                   -
[-        ] process > NANOPORE:FILTER                     -
[-        ] process > NANOPORE:NANOPACK                   -
[-        ] process > PACBIO:BAM2FASTQ                    -
[54/07888c] process > PACBIO:NANOPACK (SRR9641620)        [100%] 2 of 2 ✔
[c1/959adc] process > PACBIO:FILTER (SRR9641620)          [100%] 2 of 2 ✔
[55/94a3b8] process > ILLUMINA:FASTP (SRR9641621)         [100%] 1 of 1 ✔

Pipeline completed at: 2024-05-13T14:52:32.747676151-04:00
Execution status: OK
Execution duration: 38m 53s
Thank you for using fmalmeida/ngs-preprocess pipeline!
Completed at: 13-May-2024 14:52:33
Duration    : 38m 53s
CPU hours   : 3.4
Succeeded   : 11

fmalmeida commented 1 month ago

Thanks for confirming. I just made a release: https://github.com/fmalmeida/ngs-preprocess/releases/tag/v2.7.1

Cheers.