epi2me-labs / wf-transcriptomes

Non-zero exit status at DE analysis stage #33

Closed eiwai81 closed 9 months ago

eiwai81 commented 9 months ago

Operating System

Other Linux (please specify below)

Other Linux

No response

Workflow Version

0.4.0

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

nextflow run ${WF_TRANSCRIPTOME} \
  -profile singularity --threads 4 \
  --fastq ${FASTQ} \
  --transcriptome_source precomputed \
  --de_analysis \
  --ref_genome ${REF} \
  --ref_annotation ${ANNOTATION} \
  --minimap2_index_opts '-k 15' --pychopper_opts '-k PCS111' \
  --ref_transcriptome ${TRANSCRIPTOME} \
  --sample_sheet ${SAMPLE_SHEET} \
  --out_dir ${OUTDIR} -w ${OUTDIR}/workspace_dir

Workflow Execution - CLI Execution Profile

singularity

What happened?

I am trying to run this workflow on real cDNA sequencing data, but it always fails at the differential expression analysis step. I have tried both the precomputed and reference-aided options, and both the GTF and GFF annotation files for my organism, but none of these changes has any effect.

Relevant log output

ERROR ~ Error executing process > 'pipeline:differential_expression:deAnalysis'

Caused by:
  Process `pipeline:differential_expression:deAnalysis` terminated with an error exit status (1)

Command executed:

  mkdir merged
  mkdir de_analysis
  mv de_transcript_counts.tsv merged/all_counts.tsv
  mv sample_sheet.csv de_analysis/coldata.tsv
  de_analysis.R annotation.gtf 3 1 10 3 gtf true

Command exit status:
  1

Command output:
  Loading counts, conditions and parameters.
  Loading annotation database.

Command error:
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  Loading counts, conditions and parameters.
  Loading annotation database.
  Import genomic features from the file as a GRanges object ... OK
  Prepare the 'metadata' data frame ... OK
  Make the TxDb object ... Error in .extract_genes_from_gff3_GRanges(gene_IDX, tx_IDX, mcols0$ID,  : 
    some genes have no "ID" attribute
  Calls: makeTxDbFromGFF ... makeTxDbFromGRanges -> .extract_genes_from_gff3_GRanges
  In addition: Warning messages:
  1: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
    some transcripts have no "Name" attribute ==> their name ("tx_name"
    column in the TxDb object) was set to NA
  2: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
    the transcript names ("tx_name" column in the TxDb object) imported
    from the "Name" attribute are not unique
  Execution halted

Application activity log entry

No response

nrhorner commented 9 months ago

Hi @eiwai81

Sorry that you're having issues with the workflow. Which reference annotation are you using? Is it publicly available so that we can have a look at it?

eiwai81 commented 9 months ago

Hello @nrhorner

Many thanks for responding. The original annotation file that accompanied the genome assembly was in GFF format. I used gffread to convert it into GTF. I am working with a plant genome which is publicly available but it could be easier for me to just send you the original GFF file. I could also point you in the direction of the genome itself if that would be useful.
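
For reference, the log above stops at `some genes have no "ID" attribute`, raised from `.extract_genes_from_gff3_GRanges`, which suggests the annotation was being read as GFF3 at that point. A minimal sketch of how the original GFF file could be checked for such records is shown here. The function name `genes_missing_id` and the example records are hypothetical and not part of the workflow; the sketch assumes a standard nine-column, tab-separated GFF3 file with `key=value` attributes in the last column.

```python
def genes_missing_id(lines, feature_types=("gene",)):
    """Return (line_number, raw_line) for GFF3 features of the given
    types whose attribute column has no non-empty ID entry.

    Hypothetical helper for illustration; assumes GFF3 `key=value`
    attributes separated by semicolons.
    """
    missing = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines, directives and comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # A GFF3 feature row has exactly nine tab-separated columns.
        if len(cols) != 9:
            continue
        if cols[2] not in feature_types:
            continue
        attrs = {}
        for field in cols[8].split(";"):
            field = field.strip()
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key.strip()] = value.strip()
        if not attrs.get("ID"):
            missing.append((lineno, line))
    return missing


# Made-up records: the second gene has a Name but no ID.
example = "\n".join([
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=gene1;Name=g1",
    "chr1\tsrc\tgene\t200\t300\t.\t+\t.\tName=g2",
    "chr1\tsrc\tmRNA\t200\t300\t.\t+\t.\tID=tx2;Parent=gene2",
])

for lineno, record in genes_missing_id(example.splitlines()):
    print(lineno, record)
```

On a real file, the same function could be called with an open file handle, for example `genes_missing_id(open(path))`, where `path` points to the GFF file. Note that GTF attributes use the `key "value"` form rather than `key=value`, so this check is only meaningful for GFF3 input.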

eiwai81 commented 9 months ago

I will be closing this due to lack of response.

eiwai81 commented 9 months ago

Closed