dcjones / isolator

Rapid and robust analysis of RNA-Seq experiments.
MIT License

condition-transcript-expression question #6

Open steffenheyne opened 7 years ago

steffenheyne commented 7 years ago

Hi, thanks for isolator! I started playing around and it seems very useful!

I am trying to understand the different summarize functions.

My samples.yaml looks like this:

KO_young:
 RE_8wks_KO_01:  bam/RE_8wks_KO_01.bam
 RE_8wks_KO_04:  bam/RE_8wks_KO_04.bam
 RE_8wks_KO_05:  bam/RE_8wks_KO_05.bam

KO_old:
 RE_25wks_KO_02:  bam/RE_25wks_KO_02.bam
 RE_25wks_KO_03:  bam/RE_25wks_KO_03.bam

ctrl_young:
 RE_8wks_HET_01: bam/RE_8wks_HET_01.bam
 RE_8wks_HET_02: bam/RE_8wks_HET_02.bam
 RE_8wks_HET_03: bam/RE_8wks_HET_03.bam
 RE_8wks_WT_01: bam/RE_8wks_WT_01.bam
 RE_8wks_WT_03: bam/RE_8wks_WT_03.bam
 RE_8wks_WT_04: bam/RE_8wks_WT_04.bam

ctrl_old:
 RE_25wks_HET_01:  bam/RE_25wks_HET_01.bam
 RE_25wks_HET_02:  bam/RE_25wks_HET_02.bam
 RE_25wks_WT_02:  bam/RE_25wks_WT_02.bam
 RE_25wks_WT_03:  bam/RE_25wks_WT_03.bam
 RE_25wks_WT_04:  bam/RE_25wks_WT_04.bam

Now with isolator summarize condition-transcript-expression isolator-output.4_cond.h5

I get a file "condition-transcript-expression" starting with:

gene_name   gene_id transcript_id   KO_young_adjusted_tpm   KO_young_adjusted_tpm   KO_young_adjusted_tpm   KO_old_adjusted_tpm
mt-Tf   ENSMUSG00000064336.1    ENSMUST00000082387.1    3.459316e-02    5.277997e-02    3.047849e-02    2.762534e-02
mt-Rnr1 ENSMUSG00000064337.1    ENSMUST00000082388.1    7.140579e+01    1.003876e+02    7.329236e+01    6.638102e+01
mt-Tv   ENSMUSG00000064338.1    ENSMUST00000082389.1    5.484299e-03    7.384523e-03    6.614153e-03    4.906841e-03
...

What are columns 4-7? Why does the same column name appear three times? I would have expected my 4 different conditions in the header.

Is each column the "mean" expression value of one condition?
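To make the header problem concrete, here is a minimal Python sketch that counts how often each column name occurs. It only uses the header line as pasted above, and it assumes the real file is tab-separated; the function name `count_column_names` is mine, not part of isolator.

```python
from collections import Counter

# Header line copied from the excerpt above (tab separator is an assumption).
header = (
    "gene_name\tgene_id\ttranscript_id\tKO_young_adjusted_tpm\t"
    "KO_young_adjusted_tpm\tKO_young_adjusted_tpm\tKO_old_adjusted_tpm"
)


def count_column_names(header_line):
    """Count how often each column name occurs in a tab-separated header."""
    return Counter(header_line.rstrip("\n").split("\t"))


counts = count_column_names(header)
for name, n in counts.items():
    print(name, n)
```

For the excerpt above this reports `KO_young_adjusted_tpm` three times and `KO_old_adjusted_tpm` once, which is what confuses me.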

What is the best way to get a "mean" expression per condition that matches (or closely approximates, with some simple calculation) the expression values used to compute "median_log2_fold_change" in a "differential-transcript-expression.tsv" file?
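For reference, this is roughly what I would try myself: average all columns that share a name, per transcript. It is only a sketch under a big assumption, namely that columns with the same name are separate values belonging to one condition, which is exactly what I am unsure about. It also assumes a tab-separated file with three leading ID columns; `mean_per_column_name` is a hypothetical helper, not an isolator function, and I do not know whether a plain mean like this is close to what isolator uses for "median_log2_fold_change".

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# First rows of the output as pasted above (tab separator is an assumption).
sample = (
    "gene_name\tgene_id\ttranscript_id\tKO_young_adjusted_tpm\t"
    "KO_young_adjusted_tpm\tKO_young_adjusted_tpm\tKO_old_adjusted_tpm\n"
    "mt-Tf\tENSMUSG00000064336.1\tENSMUST00000082387.1\t"
    "3.459316e-02\t5.277997e-02\t3.047849e-02\t2.762534e-02\n"
    "mt-Rnr1\tENSMUSG00000064337.1\tENSMUST00000082388.1\t"
    "7.140579e+01\t1.003876e+02\t7.329236e+01\t6.638102e+01\n"
)


def mean_per_column_name(text, n_id_cols=3):
    """Return {transcript_id: {column_name: mean of all columns with that name}}."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(rows)
    result = {}
    for row in rows:
        groups = defaultdict(list)
        # Collect the values of identically named columns into one list.
        for name, value in zip(header[n_id_cols:], row[n_id_cols:]):
            groups[name].append(float(value))
        # The third column is assumed to hold the transcript ID.
        result[row[2]] = {name: mean(vals) for name, vals in groups.items()}
    return result


means = mean_per_column_name(sample)
for transcript, per_condition in means.items():
    print(transcript, per_condition)
```

If the repeated columns mean something else, this grouping would of course be wrong.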

Thanks!