model contrasts - Githubissues

abrozzi commented 4 years ago

Hi Constantin! And thank you for proDA!

I am extehnsively using it to find differentially expressed proteins in two different genetic backgrounds wt and ko cultured in two different media X and Y. I have different biological preparations (A-O) and technical replicates (single, 2 or 3).

Thesample_info_df looks like this:

prep    geno    culture  replicate
A    wt   X    1
A    wt   X    2
B    wt   X    1
C    ko   X    1
C    ko   X    2
D    wt   Y    1
D    wt   Y    2
E    wt   Y    1
E    wt   Y    2
F    ko   Y    1
F    ko   Y    2
G    wt   Y    1
G    wt   Y    2
G    wt   Y    3
H    wt   Y    1
H    wt   Y    2
H    wt   Y    3
I    ko   Y    1
I    ko   Y    2
I    ko   Y    3
L    ko   Y    1
L    ko   Y    2
L    ko   Y    3
M    ko   X    1
N    wt   X    1
O    ko   X    1
O    ko   X    2

To find differentially expressed proteins between wt and ko, accounting for the effect of the culturing medium, I do:

design_mat <- model.matrix(~ geno + culture, data=sample_info_df)

fit <- proDA(normalized_abundance_matrix, design = design_mat, col_data = sample_info_df, reference_level = "wt")

test_diff(fit, contrast = "genowt")

Am I correctly interpreting the model implemented in proDA? Are the replicates in this case modelled as random effects?

I thank you very much for any help you can give me, Alex

const-ae commented 4 years ago

Hi Alex,

in general your model looks good. You could actually simplify it, by skipping the model.matrix() call:

 fit <-proDA(normalized_abundance_matrix, design = ~ geno + culture, 
             reference_level = "wt", col_data = sample_info_df)
result_names(fit)
test_diff(fit, contrast = "genoko")

Are the replicates in this case modelled as random effects?

Good question, the problem is that I never really know what is exactly mean with random effect (see this blog post by Andrew Gelman on the conflicting definitions).

I would say geno is your coefficient of interest, you condition on the coefficient culture and the replicates (for example sample 1 and 2 in sample_info_df) are just considered that: independent draws from presumably the same distribution.

Best, Constantin

abrozzi commented 4 years ago

Brilliant!

And if I add an interaction term (geno x culture):

> result_names(fit)
[1] "Intercept"              "genoko"           "cultureY"              "genoko. cultureY"

can I get from proDA the p-values for the interaction term?

Bests and thank again for this package, -A

const-ae commented 4 years ago

There are two ways to test the significance of the interaction, which give equivalent results: First would be via a reduced model

test_diff(fit, reduced_model =  ~ geno + culture)

The second is via a contrast like this:

test_diff(fit, contrast = `genoko:cultureY`)

Note, that there is a bug right now in the result_names(fit) call. The function trips over the : and unfortunately turns it into a . which is misleading. I will try and fix this asap. But you can just use the two calls above, both of which are fine :)

const-ae commented 4 years ago

Okay I fixed the issue. If you now were to install the development version of proDA, result_names() would correctly return

> result_names(fit)
[1] "Intercept"         "genoko"            "cultureY"          "`genoko:cultureY`"

abrozzi commented 4 years ago

Great! It works.

Thank you Constantin!

const-ae commented 4 years ago

It works.

Great. If it is alright for you, I will close the issue. But feel free to reopen if anything else comes up :)

const-ae / proDA

model contrasts #4