Mememe231 closed 7 months ago
Hi @Mememe231
Thanks for your request. We will consider adding it in a future release.
Hi @Mememe231
Actually, the table you're after should already exist in out_dir/de_analysis/de_tpm_transcript_counts.tsv
Please let me know whether you can find it there.
Is this a new feature?
My "de_analysis" folder only contains:
results_dtu.pdf
dtu_plots.pdf
results_dexseq.tsv
results_dge.pdf
results_dge.tsv
results_dtu_gene.tsv
results_dtu_stageR.tsv
results_dtu_transcript.tsv
That was run using epi2melabs 5.0.2 and wf-transcriptomes v0.2.1.
Thanks.
You should get the required output if you use the latest version, 0.4.1.
Yup, I see them now. Thanks.
Is your feature related to a problem?
When running DE, wf-transcriptomes-report.html produces a nifty table under "Transcripts Per Million".
This table is not readily available: it cannot be exported from the HTML report. One could pool the TPMs from the individual samples, which are stored in separate files (the str_merged.transcripts_bXX.gff.tmap files in the output/bXX_gffcompare folders), but sorting them into a useful table rapidly becomes complicated with many samples/barcodes, and is impossible when comparing between different datasets.
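A minimal Python sketch of the per-sample step, using only the standard library. It assumes each .tmap file is tab-separated with a header row containing `ref_gene_id`, `ref_id` and `TPM` columns (the column layout gffcompare uses for .tmap files, as far as I know; check it against your own files). The function name `read_tmap_tpm`, skipping rows whose ID is "-", and summing TPMs when several query transcripts map to the same ID are illustrative choices, not wf-transcriptomes behaviour. You may also want to filter on `class_code` before summing.

```python
import csv
import io


def read_tmap_tpm(handle, id_column="ref_id"):
    """Return {id: TPM} from one gffcompare .tmap table (hypothetical helper).

    Assumes a tab-separated file with a header row that includes
    `id_column` and `TPM`. Rows whose ID is "-" (query transcripts with no
    reference match) are skipped; TPMs of several query transcripts
    assigned to the same ID are summed.
    """
    tpm = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = row[id_column]
        if key == "-":
            continue
        tpm[key] = tpm.get(key, 0.0) + float(row["TPM"])
    return tpm


# Tiny made-up .tmap excerpt; the values are illustrative, not real output.
HEADER = ("ref_gene_id\tref_id\tclass_code\tqry_gene_id\tqry_id\tnum_exons\t"
          "FPKM\tTPM\tcov\tlen\tmajor_iso_id\tref_match_len\n")
EXAMPLE = HEADER + (
    "G1\tT1\t=\tq1\tq1.1\t2\t0\t10.5\t3.0\t900\tq1.1\t900\n"
    "G1\tT2\tj\tq1\tq1.2\t3\t0\t4.5\t1.0\t1200\tq1.1\t1100\n"
    "-\t-\tu\tq2\tq2.1\t1\t0\t2.0\t0.5\t400\tq2.1\t-\n"
)

by_transcript = read_tmap_tpm(io.StringIO(EXAMPLE))
# Passing id_column="ref_gene_id" sums transcript TPMs up to gene level.
by_gene = read_tmap_tpm(io.StringIO(EXAMPLE), id_column="ref_gene_id")
print(by_transcript)  # {'T1': 10.5, 'T2': 4.5}
print(by_gene)        # {'G1': 15.0}
```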
Describe the solution you'd like
Output a .csv file that pools the TPMs of each transcript from every sample/barcode, with a value of 0 where a transcript is not expressed in a given sample.
The table should have a row for every transcript in the reference annotation (GTF) or the reference-guided annotation, including those that are not expressed. Users would then be able to pool their own multi-dataset tables, sort and filter them, and perform whatever calculations they need.
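A rough sketch of the requested table, in Python with only the standard library. It assumes the per-sample TPMs have already been collected into dictionaries keyed by transcript ID, and it takes the full transcript list from the `transcript_id "..."` attributes in the reference GTF. `transcript_ids_from_gtf`, `pool_tpms` and `to_csv` are hypothetical helpers for illustration, not wf-transcriptomes outputs or options.

```python
import csv
import io
import re

# Matches a GTF attribute such as: transcript_id "T1";
TRANSCRIPT_ID = re.compile(r'(?:^|;)\s*transcript_id "([^"]+)"')


def transcript_ids_from_gtf(handle):
    """Collect every transcript_id in a GTF, in first-seen order."""
    seen = {}
    for line in handle:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a feature line
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)


def pool_tpms(per_sample, all_ids):
    """One row per ID in all_ids, one column per sample, 0.0 where absent."""
    samples = sorted(per_sample)
    header = ["transcript_id"] + samples
    rows = [[tid] + [per_sample[s].get(tid, 0.0) for s in samples]
            for tid in all_ids]
    return header, rows


def to_csv(header, rows):
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


GTF = (
    'chr1\tref\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\tref\texon\t100\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\tref\ttranscript\t100\t1300\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n'
    'chr2\tref\ttranscript\t50\t800\t.\t-\t.\tgene_id "G2"; transcript_id "T3";\n'
)
ids = transcript_ids_from_gtf(io.StringIO(GTF))
per_sample = {"barcode01": {"T1": 10.5, "T2": 4.5}, "barcode02": {"T1": 3.0}}
header, rows = pool_tpms(per_sample, ids)
print(to_csv(header, rows))
```

Transcripts that appear in a sample but not in the GTF (novel isoforms) are dropped by this sketch; append their IDs to `ids` if you want to keep them.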
Describe alternatives you've considered
Using a small dataset (6 samples), I copied all the TPMs into an Excel spreadsheet.
Unfortunately, sorting/filtering is not possible unless each sample contains a row for each ref_gene_id. The individual csv files only contain TPMs for expressed ref_gene_ids. Having a 0 value for every ref_gene_id would help.
I have tried to write a function that parses each TPM table so that it contains every ref_gene_id, adding 0s for those that are not expressed, but that is complicated, and it would be even more complicated with multiple datasets, since they might not have the same number of samples.
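One way to avoid padding every table beforehand is to take the union of IDs across all datasets (optionally plus the reference IDs) and fill the gaps with 0 at merge time. A sketch, assuming each dataset is already a `{sample: {id: TPM}}` dictionary; `merge_datasets` and the "dataset:sample" column naming are illustrative choices, not anything the workflow produces.

```python
def merge_datasets(datasets, reference_ids=()):
    """Merge {dataset: {sample: {id: TPM}}} into one wide table.

    Columns are named "dataset:sample" so identical barcodes from different
    runs stay separate, and datasets may have different numbers of samples.
    Rows cover every ID seen in any sample plus any reference_ids, sorted;
    a missing value is filled with 0.0.
    """
    columns = [(ds, s) for ds, samples in datasets.items() for s in samples]
    all_ids = set(reference_ids)
    for ds, s in columns:
        all_ids.update(datasets[ds][s])
    header = ["id"] + [f"{ds}:{s}" for ds, s in columns]
    rows = [[tid] + [datasets[ds][s].get(tid, 0.0) for ds, s in columns]
            for tid in sorted(all_ids)]
    return header, rows


# Two made-up runs with different numbers of barcodes.
datasets = {
    "run1": {"barcode01": {"T1": 5.0}, "barcode02": {"T2": 1.0}},
    "run2": {"barcode01": {"T1": 2.0, "T3": 4.0}},
}
header, rows = merge_datasets(datasets, reference_ids=["T4"])
for row in [header] + rows:
    print(row)
```

Because columns are built per dataset, runs with different numbers of samples merge without any extra handling, and passing the reference transcript IDs guarantees rows for unexpressed transcripts too.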
Additional context
wf-transcriptomes can only compare 2 conditions, with a minimum of 3 replicates.
Ideally, we would have the option to use 3 or more conditions (e.g. one sample per time point in a series: 0, 1 h, 2 h, 3 h, etc.), as well as to set the number of "replicates" to 1 or more.
Since the math is done by another tool, this might not be possible.
A workaround would be to use the TPMs from individual samples that are already generated by wf-transcriptomes, but pooling those in a useful table is complex.