BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

DRIMSeq output has same value for all replicates #234

Open mnsmar opened 1 year ago

mnsmar commented 1 year ago

Copy and paste the exact command you tried to run

flair diffexp --counts_matrix reads.flair.quantify --threads 40 --out_dir reads.flair.diffexp --out_dir_force

How did you install Flair? v1.7.0 from bioconda (conda create -n flair -c conda-forge -c bioconda flair/1.7.0)

What happened? The DRIMSeq output file isoforms_drimseq_young_v_old.tsv has exactly the same value for all replicates within each condition.

feature_id      gene_id YF1_young_batch1_0      YF2_young_batch1_1      YM1_young_batch1_2      YM2_young_batch1_3      OF1_old_batch1_4        OF2_old_batch1_5        OM1_old_batch1_6        OM2_old_batch1_7        lr      adj_pvalue
ENSMUST00000165430_ENSMUSG00000000326   ENSMUSG00000000326      0.292   0.292   0.292   0.292   0.055   0.055   0.055   0.055   63.86   6.46e-12
b8552661-2178-4d9a-8afa-942bf878e01f_ENSMUSG00000058207 ENSMUSG00000058207      0.002   0.002   0.002   0.002   0.007   0.007   0.007   0.007   28.7    0.000205
e28e81da-ae07-4bc8-bc82-9fc75280d3d3_ENSMUSG00000066154 ENSMUSG00000066154      0.023   0.023   0.023   0.023   0.039   0.039   0.039   0.039   27.46   0.000258
ENSMUST00000087225_ENSMUSG00000026179   ENSMUSG00000026179      0.396   0.396   0.396   0.396   0.151   0.151   0.151   0.151   25.2    0.000499
ENSMUST00000113805_ENSMUSG00000026179   ENSMUSG00000026179      0.604   0.604   0.604   0.604   0.849   0.849   0.849   0.849   25.2    0.000499
ENSMUST00000000335_ENSMUSG00000000326   ENSMUSG00000000326      0.199   0.199   0.199   0.199   0.381   0.381   0.381   0.381   22.39   0.00179
0c934051-7688-4a51-b8a2-a7ff03fd6ca8_ENSMUSG00000056973 ENSMUSG00000056973      0.002   0.002   0.002   0.002   0.016   0.016   0.016   0.016   16.45   0.0346
ENSMUST00000068004_ENSMUSG00000024892   ENSMUSG00000024892      0.782   0.782   0.782   0.782   0.934   0.934   0.934   0.934   15.7    0.0399
ENSMUST00000113825_ENSMUSG00000024892   ENSMUSG00000024892      0.218   0.218   0.218   0.218   0.066   0.066   0.066   0.066   15.7    0.0399
a6d60f37-9dd2-40ef-aeef-0d6ddbbd12fd_ENSMUSG00000078675 ENSMUSG00000078675      0.383   0.383   0.383   0.383   0.283   0.283   0.283   0.283   15.36   0.0429

Here is the formula matrix used:

sample_id       condition
YF1_young_batch1_0      young
YF2_young_batch1_1      young
YM1_young_batch1_2      young
YM2_young_batch1_3      young
OF1_old_batch1_4        old
OF2_old_batch1_5        old
OM1_old_batch1_6        old
OM2_old_batch1_7        old

The intermediate input file to DRIMSeq, workdir/filtered_iso_counts_drim.tsv, appears to have different values for the replicates.

        gene_id feature_id      YF1_young_batch1_0      YF2_young_batch1_1      YM1_young_batch1_2      YM2_young_batch1_3      OF1_old_batch1_4        OF2_old_batch1_5        OM1_old_batch1_6        OM2_old_batch1_7
0       4:108339000     00000877-1787-4562-a00d-0dab79773b4f_4:108339000        1.0     0.0     1.0     0.0     1.0     0.0     1.0     0.0
1       7:16455000      0000cf99-b729-4ba3-be27-aaa8a80b5142_7:16455000 1.0     1.0     0.0     0.0     2.0     4.0     2.0     2.0
2       ENSMUSG00000039246      0002dc35-4356-4406-82bb-db5755c9c34f_ENSMUSG00000039246 0.0     0.0     3.0     0.0     0.0     3.0     2.0     0.0
3       ENSMUSG00000073988      00056748-d587-4b99-b7d1-9ed29904bd1e_ENSMUSG00000073988 6.0     3.0     4.0     3.0     1.0     5.0     13.0    2.0
4       ENSMUSG00000042165      0006d6d4-b725-4c3d-88e6-fc175799fe43_ENSMUSG00000042165 1.0     2.0     1.0     1.0     1.0     2.0     4.0     0.0
5       ENSMUSG00000058921      00098c70-bc3c-4925-b4d4-c2237d87303d_ENSMUSG00000058921 12.0    4.0     13.0    7.0     11.0    7.0     13.0    4.0
6       14:66044000     0009e9fa-fb24-4e92-a0e7-c2d30d583c36_14:66044000        4.0     4.0     7.0     1.0     2.0     2.0     5.0     6.0
...

Is that expected? -Thanks

Jeltje commented 1 year ago

Also #235

This is normal DRIMSeq behavior. DRIMSeq estimates transcript proportions per condition from the per-sample input values, so every replicate of a condition is reported with the same fitted proportion. Why it then outputs a column for every sample instead of one per condition is a question best answered by the DRIMSeq authors...
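
To illustrate why the replicate columns end up identical, here is a rough sketch in plain Python. It is not DRIMSeq's code: DRIMSeq fits a Dirichlet-multinomial model, and the simple pooling of counts below only stands in for that fit. The counts and the function names (`condition_proportions`, `per_sample_table`) are made up for the example.

```python
# Hypothetical counts for one gene with two isoforms (NOT taken from the issue).
# Each condition has four replicates with different per-sample counts.
counts = {
    "iso_A": {"young": [10, 5, 8, 7], "old": [2, 3, 1, 4]},
    "iso_B": {"young": [5, 5, 2, 8], "old": [10, 7, 13, 10]},
}


def condition_proportions(counts):
    """Pool counts per condition and return each isoform's share of the gene total.

    Assumption: pooling is a simplified stand-in for a model-based estimate.
    """
    conditions = next(iter(counts.values())).keys()
    props = {}
    for cond in conditions:
        pooled = {iso: sum(by_cond[cond]) for iso, by_cond in counts.items()}
        total = sum(pooled.values())
        props[cond] = {iso: n / total for iso, n in pooled.items()}
    return props


def per_sample_table(counts, props):
    """Repeat the condition-level proportion once per replicate column."""
    table = {}
    for iso, by_cond in counts.items():
        table[iso] = {
            cond: [round(props[cond][iso], 3)] * len(reps)
            for cond, reps in by_cond.items()
        }
    return table


props = condition_proportions(counts)
table = per_sample_table(counts, props)
for iso, row in table.items():
    print(iso, row)
```

The inputs differ per replicate, but only one proportion per condition is estimated, so writing it out once per sample gives repeated values like the ones in the reported table.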

I'll add the feature request label to this ticket; maybe we can reconfigure the output to make it less confusing.
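
In the meantime, the per-sample columns can be collapsed to one column per condition after the fact. A minimal sketch, assuming the column layout shown in the report above; the two data rows are copied from that output, the sample-to-condition mapping comes from the formula matrix, and `collapse_by_condition` is a hypothetical helper, not part of flair or DRIMSeq.

```python
import csv
import io

# Sample-to-condition mapping, as in the formula matrix from the report.
SAMPLES = {
    "YF1_young_batch1_0": "young",
    "YF2_young_batch1_1": "young",
    "YM1_young_batch1_2": "young",
    "YM2_young_batch1_3": "young",
    "OF1_old_batch1_4": "old",
    "OF2_old_batch1_5": "old",
    "OM1_old_batch1_6": "old",
    "OM2_old_batch1_7": "old",
}

HEADER = ["feature_id", "gene_id", *SAMPLES, "lr", "adj_pvalue"]
ROWS = [
    ["ENSMUST00000165430_ENSMUSG00000000326", "ENSMUSG00000000326",
     *["0.292"] * 4, *["0.055"] * 4, "63.86", "6.46e-12"],
    ["ENSMUST00000087225_ENSMUSG00000026179", "ENSMUSG00000026179",
     *["0.396"] * 4, *["0.151"] * 4, "25.2", "0.000499"],
]
TSV_TEXT = "\n".join("\t".join(r) for r in [HEADER, *ROWS])


def collapse_by_condition(tsv_text, sample_to_condition):
    """Return one dict per isoform with a single column per condition.

    Raises ValueError if replicates of the same condition disagree, so the
    collapse never silently hides real per-sample differences.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    collapsed = []
    for row in reader:
        out = {"feature_id": row["feature_id"], "gene_id": row["gene_id"]}
        for sample, cond in sample_to_condition.items():
            value = row[sample]
            # setdefault stores the first replicate's value, then returns it
            # for comparison against every later replicate.
            if out.setdefault(cond, value) != value:
                raise ValueError(f"{row['feature_id']}: replicates differ in {cond}")
        out["lr"] = row["lr"]
        out["adj_pvalue"] = row["adj_pvalue"]
        collapsed.append(out)
    return collapsed


collapsed = collapse_by_condition(TSV_TEXT, SAMPLES)
for row in collapsed:
    print(row)
```

For a real run, the text of isoforms_drimseq_young_v_old.tsv would be read from disk and passed in place of `TSV_TEXT`.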