epi2me-labs / wf-single-cell

[Bug]: #1

Closed ljmorris closed 1 year ago

ljmorris commented 2 years ago

What happened?

Hi, the epi2me-labs/wf-single-cell workflow is failing for me on a large FASTQ file (130 GB).

See log output from nextflow report.html below.

I tried running seqkit split separately from the command line with this FASTQ file but it hangs for a very long time. I saw in the documentation that seqkit split2 is generally used for FASTQ files and seqkit split is mainly for FASTA so I tried split2 and it generated the output in 10 minutes. Can the issue be resolved by using seqkit split2 instead?
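To illustrate why a streaming splitter can cope with a file this size, here is a minimal Python sketch that deals records out round-robin and never holds more than one record in memory. It is not seqkit's implementation: the function name `split_fastq_round_robin` and the `part_NNN.fastq` naming are made up for the example, and it assumes plain, uncompressed FASTQ with exactly four lines per record.

```python
import itertools
from pathlib import Path


def split_fastq_round_robin(src, out_dir, parts):
    """Stream 4-line FASTQ records from src into `parts` files, round-robin.

    Hypothetical helper for illustration only; assumes uncompressed FASTQ
    with exactly four lines per record.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part_{i + 1:03d}.fastq" for i in range(parts)]
    handles = [p.open("w") for p in paths]
    try:
        with open(src) as fh:
            for n in itertools.count():
                # Pull the next four lines; only this record is held in memory.
                record = list(itertools.islice(fh, 4))
                if not record:
                    break
                if len(record) != 4:
                    raise ValueError("truncated FASTQ record")
                handles[n % parts].writelines(record)
    finally:
        for handle in handles:
            handle.close()
    return paths
```

Memory use here depends on the record length rather than the file size, which is the property that matters for a 130 GB input.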

Operating System

ubuntu 18.04

Workflow Execution

Command line

Workflow Execution - EPI2ME Labs Versions

No response

Workflow Execution - Execution Profile

Conda

Workflow Version

N E X T F L O W ~ version 20.10.0
Launching epi2me-labs/wf-single-cell [golden_mestorf] - revision: 7a5a2d3782 [master]

Relevant log output

Workflow execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 137.

The full error message was:

Error executing process > 'pipeline:stranding:chunk_files (1)'

Caused by:
  Process `pipeline:stranding:chunk_files (1)` terminated with an error exit status (137)

Command executed:

  seqkit split merged.fastq -p "4" -O chunks

Command exit status:
  137

Command output:
  (empty)

Command error:
  [INFO] split into 4 parts
  [INFO] read sequences ...
  .command.sh: line 2: 19005 Killed                  seqkit split merged.fastq -p "4" -O chunks
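
For reference, an exit status above 128 conventionally means the process was terminated by a signal, with the signal number being the status minus 128. The small Python sketch below decodes that on a POSIX system; `describe_exit_status` is a name invented for this example. Status 137 maps to SIGKILL, which matches the "Killed" line in the log. SIGKILL on a large input is often the kernel's out-of-memory killer, but the log alone does not confirm that.

```python
import signal


def describe_exit_status(status):
    """Decode a shell exit status (hypothetical helper, POSIX only).

    By shell convention, a status above 128 means the process was
    terminated by signal number (status - 128).
    """
    if status > 128:
        return f"killed by {signal.Signals(status - 128).name}"
    return f"exited with code {status}"


print(describe_exit_status(137))  # killed by SIGKILL
```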

nrhorner commented 2 years ago

Hi @ljmorris,

Sorry that the workflow is not working for you at the moment. Your suggestion of using split2 sounds good. I'm taking a look at that now. I'll get back to you shortly.

nrhorner commented 2 years ago

@ljmorris I've put a fix in to use split2 as you suggested. If you could try it out and see if it sorts out your issue that would be great.

The prerelease can be run like this: nextflow run epi2me-labs/wf-single-cell -r prerelease

ljmorris commented 2 years ago

Hi @nrhorner, Thanks very much for your quick response. I'm trying it out now and will let you know how it goes.

ljmorris commented 2 years ago

Hi @nrhorner, I tried the workflow using -r prerelease, but I ran into the same issue (Caused by: Process pipeline:stranding:chunk_files (1) terminated with an error exit status (137); Command executed: seqkit split merged.fastq -p "4" -O chunks). I do have a local clone of the repository, but it's in a different directory, and judging by the .nextflow.log file I'm pretty sure this run is using the remote prerelease:

Launching epi2me-labs/wf-single-cell [gigantic_swirles] - revision: c2e641b127 [prerelease]
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [aaa98b544c]

I'm a little confused as I don't see seqkit used in stranding.nf anymore in the prerelease branch, but maybe I don't yet understand how nextflow works with GitHub.

nrhorner commented 2 years ago

Hi @ljmorris, I removed the stranding.chunk_files process and moved the concatenation of the read files and the chunking into main.summariseCatChunkReads(). So it looks like your run is not using the updated prerelease.

Could you try a nextflow pull epi2me-labs/wf-single-cell please?

nrhorner commented 2 years ago

Could you try a nextflow pull epi2me-labs/wf-single-cell please?

And it should return - revision: aaa98b544c [prerelease]

ljmorris commented 2 years ago

Hi @nrhorner, I tried nextflow pull epi2me-labs/wf-single-cell but it's giving an error:

Checking epi2me-labs/wf-single-cell ...
Project config file is malformed -- Cause: Compile failed for sources FixedSetSources[name='/groovy/script/Script0BE2D8A16E788BF54B56B6751CD9B385/_nf_config_7bffd562']. Cause: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
/groovy/script/Script0BE2D8A16E788BF54B56B6751CD9B385/_nf_config_7bffd562: 13: Unexpected input: '{' @ line 13, column 8.
params {
       ^

1 error

nrhorner commented 2 years ago

@ljmorris Could you try deleting ~/.nextflow/assets/epi2me-labs/wf-single-cell and then running the workflow?

ljmorris commented 2 years ago

Hi @nrhorner, I don't have a .nextflow/assets directory. There is only .nextflow/cache and a history file. I tried also pulling another workflow just in case there is something odd about my setup (I'm using conda).

If I do nextflow pull epi2me-labs/wf-isoforms it works fine, revision: 484af65e2f is downloaded. But pull epi2me-labs/wf-single-cell still gives the - Project config file is malformed error.

nrhorner commented 2 years ago

@ljmorris Could you try deleting the .nextflow/cache please?

ljmorris commented 2 years ago

@nrhorner I deleted the .nextflow/cache then did the nextflow pull epi2me-labs/wf-single-cell, but still get the same error.

nrhorner commented 2 years ago

@ljmorris Are you running this from the same directory as the cloned GitHub repo? If so, try running from another directory.

ljmorris commented 2 years ago

@nrhorner I'm already running it from another directory.

nrhorner commented 2 years ago

@ljmorris Could you post the output of nextflow info epi2me-labs/wf-single-cell

ljmorris commented 2 years ago

@nrhorner

nextflow info epi2me-labs/wf-single-cell
Project config file is malformed -- Cause: Compile failed for sources FixedSetSources[name='/groovy/script/Script0BE2D8A16E788BF54B56B6751CD9B385/_nf_config_7bffd562']. Cause: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
/groovy/script/Script0BE2D8A16E788BF54B56B6751CD9B385/_nf_config_7bffd562: 13: Unexpected input: '{' @ line 13, column 8.
params {
       ^

1 error

nrhorner commented 2 years ago

@ljmorris If you can't run info on wf-single-cell because it's broken, try nextflow info epi2me-labs/wf-isoforms and post the output of that.

nrhorner commented 2 years ago

@ljmorris also please try nextflow drop epi2me-labs/wf-single-cell

ljmorris commented 2 years ago

@nrhorner

nextflow info epi2me-labs/wf-isoforms
project name: epi2me-labs/wf-isoforms
repository : https://github.com/epi2me-labs/wf-isoforms
local path : /mnt/DataRAID/morrisl/.nextflow/assets/epi2me-labs/wf-isoforms
main script : main.nf
description : RNA/cDNA isoform analysis workflow
author : Oxford Nanopore Technologies
revisions :

ljmorris commented 2 years ago

But I still get the same error with nextflow drop:

nextflow drop epi2me-labs/wf-single-cell
Project config file is malformed -- Cause: Compile failed for sources FixedSetSources[name='/groovy/script/Script0BE2D8A16E788BF54B56B6751CD9B385/_nf_config_7bffd562']. Cause: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
/groovy/script/Script0BE2D8A16E788BF54B56B6751CD9B385/_nf_config_7bffd562: 13: Unexpected input: '{' @ line 13, column 8.
params {
       ^

1 error

nrhorner commented 2 years ago

@ljmorris thanks for that. Could you delete this folder: /mnt/DataRAID/morrisl/.nextflow/assets/epi2me-labs/ and try the pull again please?

ljmorris commented 2 years ago

@nrhorner Yes, that's fixed it! Thank you. I will try running the workflow now.

nextflow pull epi2me-labs/wf-single-cell
Checking epi2me-labs/wf-single-cell ...
downloaded from https://github.com/epi2me-labs/wf-single-cell.git - revision: aaa98b544c [prerelease]
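
For anyone else hunting for the cached copy: Nextflow keeps pulled projects under `$NXF_ASSETS` if that variable is set, otherwise under `$NXF_HOME/assets`, with `NXF_HOME` defaulting to `$HOME/.nextflow`. That would explain why the assets directory here was under /mnt/DataRAID/morrisl/.nextflow rather than where we first looked, although the thread does not show which variable (or home directory) was responsible. A minimal Python sketch of that lookup, where `nextflow_asset_dir` is a hypothetical helper and not part of Nextflow:

```python
import os
from pathlib import Path


def nextflow_asset_dir(project, env=None):
    """Return where Nextflow would cache `project` (hypothetical helper).

    Mirrors the documented defaults: NXF_ASSETS if set, otherwise
    NXF_HOME/assets, with NXF_HOME defaulting to ~/.nextflow.
    """
    env = os.environ if env is None else env
    if env.get("NXF_ASSETS"):
        assets = Path(env["NXF_ASSETS"])
    else:
        nxf_home = env.get("NXF_HOME") or str(Path.home() / ".nextflow")
        assets = Path(nxf_home) / "assets"
    return assets / project


print(nextflow_asset_dir("epi2me-labs/wf-single-cell"))
```

The `local path` line in the output of nextflow info for any project that does load (wf-isoforms above) gives the same answer without guessing.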

ljmorris commented 2 years ago

Hi @nrhorner, the workflow completes successfully now, taking 35 minutes for pipeline:summariseCatChunkReads on my large FASTQ file, using: nextflow run epi2me-labs/wf-single-cell -r prerelease. Thanks for the fix. How do I run the entire workflow now?

nrhorner commented 2 years ago

This should run the entire workflow. Are the other processes not running?

ljmorris commented 2 years ago

No, although it completes successfully, the entire workflow isn't run. There are 8 tasks: task 7 is pipeline:summariseCatChunkReads and the final task is the output report, so the rest of the pipeline (alignment, processing the BAM files) isn't run.

nrhorner commented 2 years ago

@ljmorris Could you post your .nextflow.log please?

ljmorris commented 2 years ago

@nrhorner Here it is nextflow.log

ljmorris commented 2 years ago

I have 2 subdirectories in the output directory: execution and workspace. The execution directory contains report.html, timeline.html and trace.txt. The report.html says "Workflow execution completed successfully". Then there are links to subdirectories in the workspace directory, and one of these contains the chunked FASTQ files. In the report.html there are only links to those 8 tasks and nothing later in the workflow. I could run it again with another FASTQ file that definitely worked with the snakemake version of the workflow.

On Fri, Sep 2, 2022 at 7:33 PM Neil Horner @.***> wrote:

It looks like all processes were running. Please note that report generation will happen before the workflow finishes, as it's just a placeholder at the moment until I get round to making one. What's in your output folder?


nrhorner commented 2 years ago

I think I know what the problem might be. Please make sure that the sample_id in your single_cell_sample_sheet is merged.

As you use a single file for your fastq input,

/mnt/wde12tblaksa/agata/singlecell_10x/c8_1/merged.fq.gz

the sample_id merged is taken from that. That's not clear in the README. I'll fix that and put a check in to enforce sample_ids in the sample_sheet and data match.
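
As a rough illustration of the rule described above, the Python sketch below derives a sample_id from a FASTQ file name and checks it against the IDs in a sample sheet. This is not the workflow's code: `sample_id_from_fastq`, `check_sample_sheet` and the suffix list are assumptions made for the example, and the workflow's actual derivation may differ.

```python
from pathlib import PurePosixPath

# Assumed set of FASTQ suffixes; the workflow may recognise others.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def sample_id_from_fastq(path):
    """Strip the directory and FASTQ suffix from a path (hypothetical helper)."""
    name = PurePosixPath(path).name
    for suffix in FASTQ_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def check_sample_sheet(sample_ids, fastq_path):
    """Raise if the ID derived from the file is missing from the sheet."""
    expected = sample_id_from_fastq(fastq_path)
    if expected not in sample_ids:
        raise ValueError(
            f"sample_id {expected!r} from {fastq_path} not in sample sheet: "
            f"{sorted(sample_ids)}"
        )
    return expected


path = "/mnt/wde12tblaksa/agata/singlecell_10x/c8_1/merged.fq.gz"
print(sample_id_from_fastq(path))  # merged
```

A check like this fails loudly at start-up, instead of letting the run finish "successfully" with the downstream steps silently skipped.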

ljmorris commented 2 years ago

Thanks @nrhorner. This time the workflow continues to the adapter scanning, but the processes are aborted, so the pipeline completed unsuccessfully. I checked the nextflow log and found:

~> Sep-05 21:42:34.893 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 4 -- submitted tasks are shown below
TaskHandler[id: 9; name: pipeline:stranding:call_adapter_scan (1); status: RUNNING; exit: -; error: -; workDir: ...
...
Sep-05 21:46:49.207 [SIGHUP handler] DEBUG nextflow.Session - Session aborted -- Cause: SIGHUP

Have you seen this before? And do you have any ideas how to fix it? The adapter scan was running for quite a long time (about 8 hours) before it was aborted so I can try again using more threads.

nrhorner commented 2 years ago

Hi @ljmorris. Maybe you lost connection with the server? It looks like you are using only 4 threads as well. Can you confirm what max_threads is set to in the config? Also see the global resource settings in the executor section:

```
executor {
    $local {
        cpus = 4
        memory = "8 GB"
    }
}
```

ljmorris commented 2 years ago

Hi @nrhorner, I did get disconnected from the server but when I ssh'ed again I think it was still running. I only set the resources_mm2_max_threads to 24 threads, so for the adapter scan I was just using the default of 4 threads. Where do I change the global resource settings?

nrhorner commented 2 years ago

@ljmorris This can be found in the config, which will be nextflow.config unless you choose another with -c:

    executor {
        $local {
            cpus = 4
            memory = "8 GB"
        }
    }

ljmorris commented 2 years ago

Thanks @nrhorner. I've tried to run it again increasing the cpus and the max_threads. But I'm not able to run the prerelease at the moment. I'm getting:

Project epi2me-labs/wf-single-cell contains uncommitted changes -- Cannot switch to revision: prerelease

ljmorris commented 2 years ago

@nrhorner it's running now. I had to remove the assets directory - still getting used to nextflow and the way it works with GitHub.

ljmorris commented 2 years ago

Hi @nrhorner, the adapter scan completes now with the extra threads. I'm now back to the original error I saw with the snakemake version of the workflow which fails (for this genome) in the assign_barcodes task. I think I know what the problem is now as the nextflow error log is quite helpful. I'll post it as a separate issue.