Formatting the arrays for ANOVA2rm in Python

ThiagoBorges81 commented 1 year ago

Hello, Todd.

First of all, thanks for the effort to put this great tool together. It is quite impressive.

I am facing an issue that I believe is related to how I am formatting the arrays for the analysis.

I have a dataset with two factors: age group (three groups) and boat type (six boats). My repeated measure is the velocity recorded at four different distances in a race for each subject. The data are balanced and there are no missing values. However, no matter how I set up my arrays, the code keeps returning the following error: "ValueError: Design must be balanced."

Here's a code example of how I have set up the data:

import numpy as np

Y=np.array([[-0.0307716586667539 , 0.11497388405845 , -0.157441960911331 , -0.145320600378987] , [ -0.194052690852882 , -0.0712384103917412 , -0.261814662914772 , -0.164502273869382] , [ -0.0691454914188847, -0.156992791046363 , 0.214304383897029 , 0.246288812563233 ] , [ 0.105360515657827 , 0.131915448292274 , -0.105485507845978 , 0.13243309487546 ] , [ 0.130302111023807 , -0.0872371397915789, 0.202159899013619 , -0.0871601754424007] , [ 0.0832750502517533 , 0.133531392624523 , -0.19907236721877 , 0.106220424057168 ] , [ 0.162518929497775 , -0.243788326473946 , -0.21979741871097 , -0.121029791755254 ] , [ -0.171580314962542 , 0.186775907143335 , 0.124052648669979 , 0.156768482369337 ] , [ -0.0855221734381622, 0.0937613798144748 , 0.107935845435559 , 0.156004248476581 ] , [ -0.0307716586667539, 0.11497388405845 , -0.157441960911331 , -0.145320600378987 ] , [ -0.194052690852882 , -0.0712384103917412, -0.261814662914772 , -0.164502273869382 ] , [ -0.0691454914188847, -0.156992791046363 , 0.214304383897029 , 0.246288812563233 ] , [ 0.105360515657827 , 0.131915448292274 , -0.105485507845978 , 0.13243309487546 ] , [ 0.130302111023807 , -0.0872371397915789, 0.202159899013619 , -0.0871601754424007] , [ 0.0832750502517533 , 0.133531392624523 , -0.19907236721877 , 0.106220424057168 ] , [ 0.162518929497775 , -0.243788326473946 , -0.21979741871097 , -0.121029791755254 ] , [ -0.171580314962542 , 0.186775907143335 , 0.124052648669979 , 0.156768482369337 ] , [ -0.0855221734381622, 0.0937613798144748 , 0.107935845435559 , 0.156004248476581 ] , [ -0.0307716586667539, 0.11497388405845 , -0.157441960911331 , -0.145320600378987 ] , [ -0.194052690852882 , -0.0712384103917412, -0.261814662914772 , -0.164502273869382 ] , [ -0.0691454914188847, -0.156992791046363 , 0.214304383897029 , 0.246288812563233 ] , [ 0.105360515657827 , 0.131915448292274 , -0.105485507845978 , 0.13243309487546 ] , [ 0.130302111023807 , -0.0872371397915789, 0.202159899013619 , -0.0871601754424007] , [ 
0.0832750502517533 , 0.133531392624523 , -0.19907236721877 , 0.106220424057168 ] , [ 0.162518929497775 , -0.243788326473946 , -0.21979741871097 , -0.121029791755254 ] , [ -0.171580314962542 , 0.186775907143335 , 0.124052648669979 , 0.156768482369337 ] , [ -0.0855221734381622, 0.0937613798144748 , 0.107935845435559 , 0.156004248476581 ] , [ -0.0307716586667539, 0.11497388405845 , -0.157441960911331 , -0.145320600378987 ] , [ -0.194052690852882 , -0.0712384103917412, -0.261814662914772 , -0.164502273869382 ] , [ -0.0691454914188847, -0.156992791046363 , 0.214304383897029 , 0.246288812563233 ] , [ 0.105360515657827 , 0.131915448292274 , -0.105485507845978 , 0.13243309487546 ] , [ 0.130302111023807 , -0.0872371397915789, 0.202159899013619 , -0.0871601754424007] , [ 0.0832750502517533 , 0.133531392624523 , -0.19907236721877 , 0.106220424057168 ] , [ 0.162518929497775 , -0.243788326473946 , -0.21979741871097 , -0.121029791755254 ] , [ -0.171580314962542 , 0.186775907143335 , 0.124052648669979 , 0.156768482369337 ] , [ -0.0855221734381622, 0.0937613798144748 , 0.107935845435559 , 0.156004248476581]])

A = np.array([0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2])
B = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5])
SUBJ = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35])

Here, Y is log-transformed because the original data is heteroscedastic. A is the age group and B is the boat type. This is a subset of my data, so I kept 36 subjects for the example.

I have read the documentation and tried to use the examples of similar designs as templates for setting up my data, but all attempts have been unsuccessful.

Would you be able to give me some guidance on how I should set up the data for this kind of design (repeated-measures two-way ANOVA) in Python? Thank you very much in advance.

Kind regards

Thiago

0todd0000 commented 1 year ago

Thank you for the feedback!

From your description it is clear that there are three levels of A and six levels of B. However, the following points are unclear:

How many subjects are there in total?
Did each subject perform in all six levels of B?
(Just to confirm) Do the four columns in variable Y represent the four measurement locations?
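
The first two questions can also be checked directly from the label arrays. The snippet below is a minimal sketch using only NumPy; it rebuilds the A, B and SUBJ arrays from the example above and counts observations per subject and per (A, B) cell. It does not reproduce spm1d's internal balance check; it only illustrates the general requirement that, in a repeated-measures design, each subject contributes an observation to every level of the within-subject factor.

```python
import numpy as np

# Labels equivalent to the arrays posted above (36 observations)
A = np.tile([0, 0, 1, 1, 2, 2], 6)      # age group: 3 levels
B = np.repeat(np.arange(6), 6)          # boat type: 6 levels
SUBJ = np.arange(36)                    # one unique label per row

# How many subjects are there, and how many rows does each one contribute?
n_subjects = np.unique(SUBJ).size
obs_per_subject = np.bincount(SUBJ)

# How many levels of B does each subject appear in?
levels_B_per_subject = np.array(
    [np.unique(B[SUBJ == s]).size for s in np.unique(SUBJ)]
)

# How many observations fall in each (A, B) cell?
cells, cell_counts = np.unique(
    np.column_stack([A, B]), axis=0, return_counts=True
)

print("subjects:", n_subjects)
print("max rows per subject:", obs_per_subject.max())
print("max B levels per subject:", levels_B_per_subject.max())
print("cells:", len(cells), "observations per cell:", np.unique(cell_counts))
```

If every subject appears in exactly one row and one level of B, as here, then no factor is actually repeated within subjects, even though the (A, B) cells themselves all contain the same number of observations.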

ThiagoBorges81 commented 1 year ago

Hello, Todd. Thank you very much for replying!

Please find my responses below:

How many subjects are there in total?

There are 36 subjects (for this example). In fact, the original dataset has 30 subjects per age group A (with 90 subjects for each boat type B).

Did each subject perform in all six levels of B?

No. They are different individuals. For example, A represents the age group (JR/U23/SR) and B represents the boat type (1x/2x/4x/2-/4-/8+). So, I have JR in boat 1x, JR in boat 2x, U23 in boat 2-, SR in boat 4-, and so forth.

(Just to confirm) Do the four columns in variable Y represent the four measurement locations?

Yes

Please, here's a print of what the data looks like:

key:
age group: 0=JR; 1=U23; 2=SR
boat type: 0=1x; 1=2x; 2=4x; 3=2-; 4=4-; 5=8+

Again, the print is a subset of the original dataset. I found it easier to format a sample before I apply it to the original dataset.

Thank you very much.

Kind regards,

Thiago

0todd0000 commented 1 year ago

Thanks for that, almost everything is clear. One more question:

Are there any connections between the boat pairs? (If the boats are labeled G1-B1, G1-B2, G2-B1, G2-B2, G3-B1, and G3-B2, where G represents Group and B represents Boat, is there some experimental connection between -B1 and -B2 across the groups?)

ThiagoBorges81 commented 1 year ago

Thanks, Todd!

Are there any connections between the boat pairs? (If the boats are labeled G1-B1, G1-B2, G2-B1, G2-B2, G3-B1, and G3-B2, where G represents Group and B represents Boat, is there some experimental connection between -B1 and -B2 across the groups?)

No. There are no connections across the groups. Here, I am comparing the velocity between the different age groups and boat types.

Thx again.

Regards

Thiago

0todd0000 commented 1 year ago

If there are no similarities amongst the boats then it sounds like this may be a two-way nested ANOVA (spm1d.stats.anova2nested).

Note that the dependent variable (DV) array Y is (J,Q), where J is the number of observations and Q is the number of domain points. From the perspective of SPM, the Q values are not repeated measurements. They are instead domain nodes that are used to approximate a continuous process. In this case it sounds like the one-dimensional domain can be represented by a variable q, where q is the distance-from-start, with Q=4 nodes used to approximate the continuous process from q=0 through q=200. Equivalently, each row of Y is regarded as a single measurement --- or single discrete approximation --- of a 1D continuum.
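
To make the (J,Q) layout concrete, here is a small sketch using the first three rows of the Y array posted above. The domain range 0 to 200 is taken from the paragraph above; the even spacing of the four nodes via np.linspace is only an assumption for illustration, since the actual measurement distances are not stated in this thread.

```python
import numpy as np

# First three observations from the posted Y array (log-transformed velocity)
Y = np.array([
    [-0.0307716586667539,  0.11497388405845,   -0.157441960911331, -0.145320600378987],
    [-0.194052690852882,  -0.0712384103917412, -0.261814662914772, -0.164502273869382],
    [-0.0691454914188847, -0.156992791046363,   0.214304383897029,  0.246288812563233],
])

J, Q = Y.shape  # J observations (rows), Q domain nodes (columns)

# Assumed node positions: evenly spaced from q=0 to q=200
q = np.linspace(0.0, 200.0, Q)

# Each row is one observation of a 1D continuum sampled at the Q nodes,
# not Q separate repeated measurements.
for j in range(J):
    samples = ", ".join(f"q={qi:.1f}: {yi:+.3f}" for qi, yi in zip(q, Y[j]))
    print(f"observation {j}: {samples}")
```

Under this view the label arrays (A, B, and SUBJ where applicable) each have length J, one entry per row of Y, and the columns of Y need no labels of their own.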

Since there do not appear to be any other repeated measures factors, and since BOAT appears to be nested within GROUP, I suggest trying spm1d.stats.anova2nested which does not require a SUBJ input.
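
A rough sketch of what that call could look like is below. The helper names check_inputs and run_nested_anova are hypothetical and not part of spm1d. The equal_var=True argument and the .inference(alpha) step follow the pattern used in spm1d's ANOVA examples as I recall them, so please treat them as assumptions and check them against the documentation for your installed version.

```python
import numpy as np


def check_inputs(Y, A, B):
    """Validate array shapes before calling spm1d (hypothetical helper)."""
    Y = np.asarray(Y, dtype=float)
    A = np.asarray(A)
    B = np.asarray(B)
    if Y.ndim != 2:
        raise ValueError("Y must be a (J, Q) array")
    if not (A.shape == B.shape == (Y.shape[0],)):
        raise ValueError("A and B must each have one label per row of Y")
    return Y, A, B


def run_nested_anova(Y, A, B, alpha=0.05):
    """Run a two-way nested ANOVA with spm1d (hypothetical wrapper)."""
    import spm1d  # imported here so the shape check works without spm1d installed

    Y, A, B = check_inputs(Y, A, B)
    # No SUBJ argument is passed: the nested design has no repeated-measures factor.
    FF = spm1d.stats.anova2nested(Y, A, B, equal_var=True)  # assumed keyword
    return FF.inference(alpha)                              # assumed inference step
```

With the arrays from the first post, this would be called as run_nested_anova(Y, A, B), leaving SUBJ out entirely.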

ThiagoBorges81 commented 1 year ago

Thank you very much, Todd. I appreciate your help.

I have run the code for two-way nested ANOVA, and I got a few results from the subset sample data. Now I am going to expand the analysis for the full dataset.

I will get back to you when I finish all the analysis.

Again, thank you very much for the guidance!

Kind regards

Thiago

0todd0000 commented 1 year ago

OK, no problem. I'll close this issue for now. Please feel free to re-open if a related problem arises.

0todd0000 / spm1d

Formatting the arrays for ANOVA2rm in Python #259