epi2me-labs / wf-transcriptomes


Error in deAnalysis when doing 'reference guided' analysis #88

Open KatrinMoller opened 2 months ago

KatrinMoller commented 2 months ago

Operating System

Windows 10

Other Linux

No response

Workflow Version

v1.1.1

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

./nextflow run epi2me-labs/wf-transcriptomes \
    -profile singularity \
    --fastq /hpcdata/Mimir/shared/km100/all_libs \
    --de_analysis \
    --ref_genome Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz \
    --ref_annotation Homo_sapiens.GRCh38.110.gtf \
    --sample_sheet sample_sheet_short.csv \
    --cdna_kit "SQK-PCS111" \
    --isoform_table_nrows 10000 \
    --out_dir output_short -w workspace_short \
    --threads 64

Workflow Execution - CLI Execution Profile

singularity

What happened?

I have run this analysis successfully with a ref_transcriptome, but I wanted to try the reference-guided version, as I am searching for a poorly annotated isoform. The run goes well until differential_expression:map_transcriptome, where it starts taking a very long time (up to 19 hours per sample). That may be due to the reference-guided part, but it might be worth checking whether all 64 CPUs are actually being used. After that it runs smoothly again until the deAnalysis step, where it fails with the error below, which looks like the transcript strand direction is missing. Could you help me solve this?

Relevant log output

The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'pipeline:differential_expression:deAnalysis'

Caused by:
  Process `pipeline:differential_expression:deAnalysis` terminated with an error exit status (1)

Command executed:

  mkdir merged
  mkdir de_analysis
  de_analysis.R annotation.gtf 3 1 10 3

Command exit status:
  1

Command output:
  Loading counts, conditions and parameters.
  Checking annotation file type.
  Annotation file type is gtf.
  Checking annotation file for presence of transcript_id versions.
  Annotation file transcript_ids include versions.
  Loading annotation database.

Command error:
  Loading counts, conditions and parameters.
  Checking annotation file type.
  Annotation file type is gtf.
  Checking annotation file for presence of transcript_id versions.
  Annotation file transcript_ids include versions.
  Loading annotation database.
  Import genomic features from the file as a GRanges object ... OK
  Prepare the 'metadata' data frame ... OK
  Make the TxDb object ... Error in .makeTxDb_normarg_transcripts(transcripts) : 
    values in 'transcripts$tx_strand' must be "+" or "-"
  Calls: makeTxDbFromGFF ... makeTxDbFromGRanges -> makeTxDb -> .makeTxDb_normarg_transcripts
  In addition: Warning messages:
  1: In for (i in seq_along(defined)) { :
    closing unused connection 4 (annotation.gtf)
  2: In for (i in seq_along(defined)) { :
    closing unused connection 3 (annotation.gtf)
  Execution halted

Work dir:
  /hpcdata/Mimir/shared/km100/workspace_short/29/0042d7fa61343c1a1b1184c96ee1eb

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
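The error comes from makeTxDbFromGFF, which requires every transcript's strand to be "+" or "-". As a hedged diagnostic, the minimal Python sketch below scans annotation.gtf (for example, the copy in the failing process's work dir) for records whose strand column (column 7) holds anything else, such as ".", and can write a copy without those transcripts. It assumes, without confirmation in this thread, that unstranded transcripts in the GTF produced during the reference-guided run are the cause; transcript assemblers such as StringTie can report "." for single-exon transcripts whose orientation cannot be determined. The script and function names are hypothetical, and dropping transcripts removes them from the DE analysis, so this is a workaround sketch, not the maintainers' fix.

```python
"""Sketch: find (and optionally drop) GTF transcripts whose strand is not '+' or '-'.

Assumption: the failing annotation.gtf has records with '.' (or another value)
in the strand column; this does not change anything in the workflow itself.
"""
import sys

VALID_STRANDS = {"+", "-"}


def get_attribute(attributes, key):
    """Return the value of `key` from a GTF attribute column, or None if absent."""
    for item in attributes.split(";"):
        item = item.strip()
        if item.startswith(key + " "):
            return item[len(key) + 1:].strip().strip('"')
    return None


def unstranded_transcript_ids(lines):
    """Collect transcript_ids that have at least one record with an invalid strand."""
    bad = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header/comment or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        if fields[6] not in VALID_STRANDS:
            tx_id = get_attribute(fields[8], "transcript_id")
            if tx_id is not None:
                bad.add(tx_id)
    return bad


def drop_transcripts(lines, bad_ids):
    """Yield lines, skipping records with an invalid strand or a flagged transcript_id."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            yield line
            continue
        if fields[6] not in VALID_STRANDS:
            continue
        if get_attribute(fields[8], "transcript_id") in bad_ids:
            continue
        yield line


def main(path):
    with open(path) as handle:
        records = handle.readlines()
    flagged = unstranded_transcript_ids(records)
    print(f"{len(flagged)} transcript(s) without a +/- strand", file=sys.stderr)
    sys.stdout.writelines(drop_transcripts(records, flagged))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python check_gtf_strand.py annotation.gtf > filtered.gtf", file=sys.stderr)
```

A non-zero count printed to stderr would support the missing-strand reading of the error; a count of zero would point elsewhere.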

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

sarahjeeeze commented 2 months ago

Hi, how large are your input samples? We have a few efficiency improvements on the way for the map_transcriptome step, which may help a bit. Also, 64 threads seems high: this setting applies per process, not to the whole programme. How many does your system have in total? Perhaps try 8 or 16.

We are investigating the second issue, thanks for reporting.

KatrinMoller commented 2 months ago

Hi @sarahjeeeze, thanks for looking into this. My samples (6 in total) are between 30 and 50 GB each. My system has 64 threads, so that's what I put in the initial command. Is it also possible to change this for individual steps?

sarahjeeeze commented 1 month ago

Hi, yes, the threads parameter is set per process: it applies to any step where adjusting the number of threads should improve performance. But if you give one process all 64 threads, it will slow the workflow, as there won't be any left for other processes, and it could also take up all the memory. So I recommend 8 or 16 at most.
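To make the trade-off concrete, here is a small Python sketch of the CPU arithmetic. It assumes, as a simplification, that each multi-threaded task reserves exactly `--threads` CPUs and that memory and single-threaded tasks can be ignored; the function name is hypothetical, and this is not how Nextflow or the workflow actually schedules tasks.

```python
def max_parallel_tasks(total_cpus: int, threads_per_task: int) -> int:
    """How many tasks fit on the machine at once if each one reserves
    `threads_per_task` CPUs (assumption: memory is not the limiting factor)."""
    if threads_per_task < 1:
        raise ValueError("threads_per_task must be at least 1")
    if threads_per_task > total_cpus:
        raise ValueError("a single task cannot reserve more CPUs than the machine has")
    # Integer division: leftover CPUs are too few to start another task.
    return total_cpus // threads_per_task


if __name__ == "__main__":
    # 64 CPUs, as on the reporter's system.
    for threads in (64, 16, 8):
        print(f"--threads {threads}: up to {max_parallel_tasks(64, threads)} task(s) at once")
```

Under these assumptions, `--threads 64` lets only one such task run at a time, so the remaining samples queue behind it, whereas `--threads 16` allows up to four to run side by side.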

sarahjeeeze commented 1 month ago

Still looking into the other issue.

shelgueta commented 1 month ago

Hi, I am getting the same error (the one referred to here as the second issue). Any news on that?

sarahjeeeze commented 3 weeks ago

Hi, sorry, yes: I've got an MR fix incoming for it and will let you know once it's in pre-release. Sorry for the delay.