epi2me-labs / wf-transcriptomes

More than 2 conditions in the sample sheet #94

Open CWYuan08 opened 3 months ago

CWYuan08 commented 3 months ago

Ask away!

Hi, if I want to do DE analysis with all my samples where there are more than 2 conditions in the condition column of the sample sheet, how could this be set up?
Many thanks.

sarahjeeeze commented 3 months ago

Hi, the workflow is currently not set up for more than 2 conditions. If you add more than 2 conditions to the sample sheet, it will error unless you edit the check_sample_sheet_condition.py Python script in the bin folder. If you just want the counts files for downstream analysis, you can edit that script and it might output the counts in the de_analysis process, but I haven't tested this, and anything downstream of that will likely error.
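Before editing anything, it can help to confirm how many distinct conditions the sample sheet actually contains. The following is a minimal sketch, not the workflow's own check: it assumes the sample sheet is a CSV with a `condition` column (as described above), and the function name `count_conditions`, the other column names, and the example rows are all hypothetical.

```python
import csv
import io
from collections import Counter


def count_conditions(handle, column="condition"):
    """Return a Counter mapping each condition label to its number of samples."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"sample sheet has no '{column}' column")
    return Counter(row[column].strip() for row in reader)


# Hypothetical sample sheet; only the 'condition' column comes from the thread.
example_sheet = io.StringIO(
    "barcode,alias,condition\n"
    "barcode01,sample1,control\n"
    "barcode02,sample2,control\n"
    "barcode03,sample3,treated\n"
    "barcode04,sample4,treated\n"
    "barcode05,sample5,knockout\n"
    "barcode06,sample6,knockout\n"
)

counts = count_conditions(example_sheet)
for condition, n_samples in counts.items():
    print(f"{condition}: {n_samples} sample(s)")

if len(counts) > 2:
    print(f"{len(counts)} conditions found; the workflow expects 2.")
```

To run it on a real file, replace the in-memory example with `open("sample_sheet.csv", newline="")`, where the file name is a placeholder.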

CWYuan08 commented 3 months ago

Hi @sarahjeeeze, many thanks for your suggestion. Yes, having the counts would be good, as I can do the downstream analysis elsewhere. What exactly should I change in check_sample_sheet_condition.py for this? Best, CW

sarahjeeeze commented 3 months ago

Hey, you could try commenting out the line checkSampleSheetCondition(sample_sheet) in subworkflows/differential_expression.nf

CWYuan08 commented 3 months ago

Hi @sarahjeeeze, thank you. I have a new error:

ERROR ~ Error executing process > 'pipeline:differential_expression:deAnalysis (1)'

Caused by:
  Process pipeline:differential_expression:deAnalysis (1) terminated with an error exit status (1)

Command executed:
  mkdir merged
  mkdir de_analysis
  de_analysis.R annotation.gtf 3 1 10 3

Command exit status:
  1

Command output:
  Loading counts, conditions and parameters.
  Checking annotation file type.
  Annotation file type is gtf.
  Checking annotation file for presence of transcript_id versions.
  Annotation file transcript_ids include versions.
  Loading annotation database.

Command error:
  Loading counts, conditions and parameters.
  Checking annotation file type.
  Annotation file type is gtf.
  Checking annotation file for presence of transcript_id versions.
  Annotation file transcript_ids include versions.
  Loading annotation database.
  Import genomic features from the file as a GRanges object ... OK
  Prepare the 'metadata' data frame ... OK
  Make the TxDb object ...
  Error in .makeTxDb_normarg_transcripts(transcripts) :
    values in 'transcripts$tx_strand' must be "+" or "-"
  Calls: makeTxDbFromGFF ... makeTxDbFromGRanges -> makeTxDb -> .makeTxDb_normarg_transcripts
  In addition: Warning messages:
  1: In for (i in seq_along(defined)) { :
    closing unused connection 4 (annotation.gtf)
  2: In for (i in seq_along(defined)) { :
    closing unused connection 3 (annotation.gtf)
  Execution halted

But my reference annotation is a GTF from Ensembl. I thought the standard Ensembl annotation GTF has strands as + or -. How could I resolve this?

Many thanks! Best, CW
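Since the error complains that strand values must be "+" or "-", one quick way to investigate is to tally the strand column of the annotation file. This is a minimal sketch that relies only on the GTF convention that the strand is the seventh tab-separated field. The function names and the example records are hypothetical, and the check covers all feature rows, not just transcripts, so it is a first look, not a diagnosis of what the workflow does internally.

```python
import io
from collections import Counter


def strand_counts(handle):
    """Tally the values found in the strand column (7th field) of a GTF."""
    counts = Counter()
    for line in handle:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A well-formed GTF record has 9 tab-separated fields.
        if len(fields) < 9:
            continue
        counts[fields[6]] += 1
    return counts


def unexpected_strands(counts):
    """Return only the strand values that are neither '+' nor '-'."""
    return {strand: n for strand, n in counts.items() if strand not in ("+", "-")}


# Hypothetical records for illustration; the third one has an unknown strand ('.').
example_gtf = io.StringIO(
    "#!genome-build example\n"
    '1\tsrc\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1.1";\n'
    '1\tsrc\ttranscript\t2000\t2900\t.\t-\t.\tgene_id "G2"; transcript_id "T2.1";\n'
    '1\tsrc\ttranscript\t5000\t5900\t.\t.\t.\tgene_id "G3"; transcript_id "T3.1";\n'
)

counts = strand_counts(example_gtf)
bad = unexpected_strands(counts)
print("strand counts:", dict(counts))
print("unexpected strand values:", bad)
```

To run it on a real annotation, replace the in-memory example with `open("annotation.gtf")`. If the unexpected set is empty, the strand problem may be introduced by a different file or step than the reference annotation itself.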

sarahjeeeze commented 2 months ago

What version of the workflow are you using? Please update to the latest version, which should have resolved this issue.