epi2me-labs / wf-metagenomics

Metagenomic classification of long-read sequencing data

Unable to run pipeline on test or any other data #8

Closed Midnighter closed 1 year ago

Midnighter commented 2 years ago

I cannot successfully run the pipeline on my own or the test data. Both kraken2 and minimap2 fail to classify any sequences.

I downloaded the test data and confirmed that it contains 1000 reads. I then ran the pipeline as follows:

#!/usr/bin/env bash

set -euo pipefail

export NXF_VER='21.10.6'

pipeline='https://github.com/epi2me-labs/wf-metagenomics'

nextflow pull $pipeline

nextflow -c 'local.config' run \
    $pipeline \
    -r 'master' \
    -params-file 'params.yml' \
    -resume

The local config just contains tower.enabled = false.
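For reference, that local.config is just a one-line Nextflow config (shown here as a sketch):

// local.config
tower.enabled = false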

params.yml

fastq: "/.../data/reads.fastq.gz"
minimap2: true
kraken2: true
threads: 30

When I run it like this, kraken2 is the first to fail.

Error executing process > 'pipeline:kraken2 (1)'                                                                                                                                              

Caused by:                                                                                                                                                                                    
  Missing output file(s) `*.classified.fastq` expected by process `pipeline:kraken2 (1)`                                                                                                      

Command executed:                                                                                                                                                                             

  kraken2 --db database_dir --threads 30 --report reads.kraken2_report.txt --classified-out reads.kraken2.classified.fastq --unclassified-out reads.kraken2.unclassified.fastq reads.fastq > reads.kraken2.assignments.tsv
  awk -F '\t' '{print $3}' reads.kraken2.assignments.tsv > taxids.tmp
  taxonkit --data-dir taxonomy_dir lineage -R taxids.tmp | aggregate_lineages.py -p reads.kraken2

Command exit status:                                                                                                                                                                          
  0                                                                                                                                                                                           

Command output:                                                                                                                                                                               
  (empty)                                                                                                                                                                                     

Command error:                                                                                                                                                                                
  Loading database information... done.                                                                                                                                                       
  0 sequences (0.00 Mbp) processed in 0.131s (0.0 Kseq/m, 0.00 Mbp/m).                                                                                                                        
    0 sequences classified (-nan%)                                                                                                                                                            
    0 sequences unclassified (-nan%)
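Note that kraken2 exits 0 but reports 0 sequences processed, so the reads.fastq staged into the task apparently arrived empty or unreadable. A quick sanity check from inside the failing task's work directory (hypothetical path; use the task hash from the Nextflow error):

cd work/<hash>                                  # task hash shown in the error message
ls -lL reads.fastq                              # -L follows the symlink Nextflow stages
awk 'END {print NR/4, "reads"}' reads.fastq     # count FASTQ records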

When I set kraken2 to false, the minimap2 step completes but the report fails.

Error executing process > 'pipeline:makeReport'

Caused by:
  Process `pipeline:makeReport` terminated with an error exit status (1)

Command executed:

  report.py         wf-metagenomics-report.html         --versions versions         --params params.json         --summaries reads.stats         --lineages reads.minimap2.lineages.json         --vistempl report-visualisation.html

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/home/moritz/.nextflow/assets/epi2me-labs/wf-metagenomics/bin/report.py", line 116, in <module>
      main()
    File "/home/moritz/.nextflow/assets/epi2me-labs/wf-metagenomics/bin/report.py", line 99, in main
      section=fastcat.full_report(
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/aplanat/components/fastcat.py", line 140, in full_report
      read_length = read_length_plot(stats, min_len=min_len, max_len=max_len)
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/aplanat/components/fastcat.py", line 29, in read_length_plot
      mean_length = total_bases / len(seq_summary)
  ZeroDivisionError: division by zero
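The traceback means len(seq_summary) was zero, i.e. the per-read stats table handed to the report was empty, consistent with kraken2 seeing 0 sequences above. One way to confirm (assuming reads.stats is a headered TSV in the makeReport work directory):

tail -n +2 reads.stats | wc -l    # 0 data rows would explain the division by zero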

This last error is a bit suspicious, since the pipeline runs with the standard profile and thus inside Docker...

I also tried running it with revision v1.1.4 and Nextflow 22.04.3; same errors.

sarahjeeeze commented 2 years ago

Hi, I just ran it using the same setup and config files as you and could not recreate your error. Do you have the recommended amount of RAM available? You may need to adjust your Docker settings to allow the recommended 8 GB of memory; you can do this in Docker Desktop. Alternatively, try the conda or singularity profile.
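One quick way to see how much memory the Docker daemon actually has available:

docker info 2>/dev/null | grep -i 'total memory'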

Midnighter commented 2 years ago

I have 120 GB of RAM available, yes, that is not the issue. I'll try it some more.

magandBE commented 2 years ago

Hello, was this problem solved? I encountered the same issue while running the test data (but also with any other data). See the attached file for more details about the error:

EPI2ME_error.txt

The error occurs whether EPI2ME is run from the command line or through the GUI. EPI2ME is running on a GridION with 64 GB of memory, and Docker has access to all the resources.

Could you please help to solve this issue? Thank you

sarahjeeeze commented 2 years ago

Thanks for letting us know, I am looking into it.

sarahjeeeze commented 2 years ago

@magandBE Would you be able to try it with the parameter --source ncbi_16s_18s, which has been tested with the test data and is a smaller reference set? I'm trying to rule out an issue with the data.
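A minimal invocation with that parameter might look like this (a sketch; the fastq path is a placeholder):

nextflow run epi2me-labs/wf-metagenomics \
    -r master \
    --fastq /path/to/reads.fastq.gz \
    --source ncbi_16s_18s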

magandBE commented 2 years ago

@sarahjeeeze Thank you for your reply, I tried with the indicated parameter and this reproduced the same error.

sarahjeeeze commented 2 years ago

Hi, I have tried but been unable to recreate your error. I notice your error says "Your local project version looks outdated - a different revision is available in the remote repository", so perhaps try nextflow pull epi2me-labs/wf-metagenomics to update it and run again.

magandBE commented 2 years ago

Hi, I have run nextflow pull epi2me-labs/wf-metagenomics and run the workflow again, but the same error continues to appear. The revision I used is v1.1.4, which is the same one that nextflow pull fetches.

The same error is obtained every time, whatever dataset, database, or config file is used. For a run with the dataset /data/magand/gmstd_pure_Sciensano_HQ10-500.fastq and the database PlusPF-8, we looked into the temporary files to find the cause of the error.

Apparently a file related to my dataset cannot be read:

grid@GXB03465:/data/scratch/magand/tmp/a5/b9c63770483a597f107c9d47fd138d$ cat .command.err
Processing gmstd_pure_Sciensano_HQ10-500/gmstd_pure_Sciensano_HQ10-500.fastq
Warning: file 'gmstd_pure_Sciensano_HQ10-500/gmstd_pure_Sciensano_HQ10-500.fastq' cannot be read.

This is strange, because when I look at my file it is not empty and has read permission for everyone (see attached file). Moreover, I see that many links to my data were created.
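Since the file itself is readable on the host, one thing worth checking is every component of the staged path, including the symlinks Nextflow creates (a hypothetical check using namei from util-linux; a directory without execute permission anywhere along the chain would produce exactly this warning):

namei -l /data/scratch/magand/tmp/a5/b9c63770483a597f107c9d47fd138d/gmstd_pure_Sciensano_HQ10-500/gmstd_pure_Sciensano_HQ10-500.fastq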

EPI2ME_invest_error.txt

sarahjeeeze commented 2 years ago

Hi @magandBE, thanks for the information. It might be an issue with Docker permissions. I would try following these instructions (just the manage-docker-as-a-non-root-user section) and then running the workflow again.
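For reference, the manage-docker-as-a-non-root-user steps from the Docker post-installation docs boil down to:

sudo groupadd docker            # create the docker group if it does not already exist
sudo usermod -aG docker $USER   # add the current user to the group
newgrp docker                   # or log out and back in for the change to take effect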

magandBE commented 2 years ago

Hi @sarahjeeeze Thanks for your reply and sorry for the late answer. I followed the instructions to adapt the Docker permissions (which I think were already correct), but this still led to the same error.

nggvs commented 1 year ago

Hi, thank you for using the workflow. Could you confirm whether this issue has been solved? We'll close this ticket on the assumption that things are now resolved.