Closed sformel closed 3 years ago
The formula var1 * var2
will give you var1 + var2 + var1:var2
will give you independent contrasts for var1
and var2
in addition to a single interaction term var1 : var2
. Below is a quick dummy example to visualize this
import pandas as pd
from patsy import dmatrix
df = pd.DataFrame({'var1' : ['top', 'bottom'] * 10, 'var2' : ['left'] * 10 + ['right'] * 10})
dmatrix('var1 * var2', df, return_type='dataframe')
Intercept var1[T.top] var2[T.right] var1[T.top]:var2[T.right]
0 1.0 1.0 0.0 0.0
1 1.0 0.0 0.0 0.0
2 1.0 1.0 0.0 0.0
3 1.0 0.0 0.0 0.0
4 1.0 1.0 0.0 0.0
5 1.0 0.0 0.0 0.0
6 1.0 1.0 0.0 0.0
7 1.0 0.0 0.0 0.0
8 1.0 1.0 0.0 0.0
9 1.0 0.0 0.0 0.0
10 1.0 1.0 1.0 1.0
11 1.0 0.0 1.0 0.0
12 1.0 1.0 1.0 1.0
13 1.0 0.0 1.0 0.0
14 1.0 1.0 1.0 1.0
15 1.0 0.0 1.0 0.0
16 1.0 1.0 1.0 1.0
17 1.0 0.0 1.0 0.0
18 1.0 1.0 1.0 1.0
19 1.0 0.0 1.0 0.0
If you notice, var1[T.top]:var2[T.right]
only equals to 1 if both var1[T.top]
and var2[T.right]
are equal to one.
The regression that this encodes is given by y = b0 + var1 b1 + var2 b2 + (var1 * var2) b3
. The interaction effect should allow you to prove the differences (b3) due to the combination of var1 and var2.
I would think that this should encode for the interaction terms that you are looking for ( I think your formula is over-parameterized). Worst case scenario, you can run multiple models with different interactions. But I have a feeling that if you run var1 * var2
, you should be able to derive all of the pairwise interactions from that single interaction term (it may take some thought though).
Thanks @mortonjt, that's a very clear explanation. The matrix really helps me visualize it, and I agree that is what I was looking for, I just didn't recognize it as such. You're absolutely right that all the pairwise comparisons can be made through the three columns. I'll see if I can come up with a quick way to sort them out.
I'm new to multinomial regression and using patsy. Although I've been reading through all the excellent documentation I'm having a hard time wrapping my head around the correct way to code a typical 2-way ANOVA type model. I was able to get songbird to give me results that look like what I want, but I wanted to verify I wasn't doing anything foolish in regard to the assumptions of songbird or patsy. The main reason I'm suspicious that I shouldn't be doing it is because I can't seem to find a clear example of it on the internet despite it being such a common experimental setup.
Imagine I have two categorical factors with two levels (in the classical stats sense, not in the patsy sense of factor):
I expected the formula
"var1*var2"
to return a differentials file with 5 columns following the k-1 rule mentioned in the songbird docs. After delving into the patsy docs I realized that was incorrect, though I don't fully comprehend it. I eventually arrived at this formula to get songbird to spit out the expected 5 columns:
"var1 + var2 + C(var1, Treatment('top')):var2 + C(var1, Treatment('bottom')):var2"
So I have three specific questions:
C(var1, Treatment('top'))[T.bottom]:var2[left]
mean that these are the ranked differentials between "top,left" and "bottom,right"?
Thanks so much for all the hard work you've put into creating and maintaining this tool! I sincerely appreciate any thoughts you might have.