epi2me-labs / wf-transcriptomes

TPM table for all transcripts from all samples please! #29

Closed: Mememe231 closed this issue 7 months ago

Mememe231 commented 10 months ago

Is your feature related to a problem?

When running DE, wf-transcriptomes-report.html produces a nifty table under "Transcripts Per Million".

[Screenshot, 2023-09-01 at 12:40:01]

This table is not readily available: it cannot be exported from the HTML report. One could pool the TPMs from individual samples, which are in individual csv files (output/bXX_gffcompare folders/str_merged.transcripts_bXX.gff.tmap files), but sorting those into a useful table rapidly becomes complicated with many samples/barcodes, and the result is impossible to use when comparing between different datasets.

Describe the solution you'd like

Output a .csv file that pools the TPMs of each transcript from each sample/barcode, putting a 0 value where a transcript is not expressed in a given sample.

Row one should contain all transcript names from the reference annotation (GTF) or the reference-guided annotation, even those that are not expressed. Users will then be able to pool their own multi-dataset tables, sort and filter them, and perform whatever math they require.

Describe alternatives you've considered

Using a small dataset (6 samples), I have copied all the TPMs into an Excel spreadsheet.

Unfortunately, sorting/filtering is not possible unless each sample contains a row for each ref_gene_id. Individual csv files only contain TPMs for expressed ref_gene_id. Having a 0 value for every unexpressed ref_gene_id would help.

I have tried to come up with a function to parse each TPM table so that they each contain all the ref_gene_id, adding 0s for the ones that are not expressed, but that is complicated, and would be even more complicated to do with multiple datasets since they might not have the same # of samples.
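For illustration, the zero-filling described above could be sketched in Python roughly as follows. This is only a sketch under assumptions, not part of wf-transcriptomes: the function names (`read_tpms`, `pool_tpms`, `write_table`) are hypothetical, and it assumes each per-sample file is tab-delimited with a header row containing `ref_gene_id` and `TPM` columns. Rows sharing the same ID are summed, which is a design choice rather than anything the workflow prescribes.

```python
import csv
from collections import defaultdict


def read_tpms(path, id_col="ref_gene_id", tpm_col="TPM", delimiter="\t"):
    """Read one per-sample table into {id: TPM}.

    Assumption: the file has a header row with `id_col` and `tpm_col`.
    Rows that share an ID are summed.
    """
    tpms = defaultdict(float)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            tpms[row[id_col]] += float(row[tpm_col])
    return dict(tpms)


def pool_tpms(per_sample, all_ids=None):
    """Combine {sample: {id: TPM}} into one table, filling gaps with 0.0.

    `all_ids` is an optional list of every ID in the annotation, so that
    IDs not expressed in any sample still get a row.
    """
    samples = sorted(per_sample)
    observed = set()
    for table in per_sample.values():
        observed.update(table)
    ids = sorted(observed | set(all_ids or ()))
    header = ["id"] + samples
    rows = [[i] + [per_sample[s].get(i, 0.0) for s in samples] for i in ids]
    return header, rows


def write_table(path, header, rows):
    """Write the pooled table as a .csv file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
```

Because `pool_tpms` only needs a mapping of sample name to TPMs, samples from several datasets can be passed in together even when the datasets have different numbers of samples, provided the sample names are unique.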

Additional context

wf-transcriptomes can only compare 2 conditions, with a minimum of 3 replicates.

Ideally we could have the option to use 3 or more conditions (e.g. a sample for each of a series of timepoints: 0, 1h, 2h, 3h, etc.), as well as set the number of "replicates" to 1 or more.

Since the math is done by another tool, this might not be possible.

A workaround would be to use the TPMs from individual samples that are already generated by wf-transcriptomes, but pooling those in a useful table is complex.

nrhorner commented 10 months ago

Hi @Mememe231

Thanks for your request. We will consider adding your request into a future release.

nrhorner commented 8 months ago

Hi @Mememe231

Actually, the table you're after should already exist in out_dir/de_analysis/de_tpm_transcript_counts.tsv. Please let me know whether you can find it there.

Mememe231 commented 8 months ago

Is this a new feature?

My "de_analysis" folder only contains:

results_dtu.pdf  
dtu_plots.pdf
results_dexseq.tsv
results_dge.pdf 
results_dge.tsv
results_dtu_gene.tsv
results_dtu_stageR.tsv
results_dtu_transcript.tsv

That was run using epi2melabs 5.0.2 and wf-transcriptomes v0.2.1.

Thanks.

nrhorner commented 8 months ago

You should get the required output if you use the latest version, 0.4.1.

Mememe231 commented 7 months ago

Yup, I see them now. Thanks.