Number of trials - Githubissues

saurabhr commented 11 months ago

I was working through the tutorial from the paper that introduces the package. Looking at the data used, I noticed that all participants had the same number of trials (64). Will a different total number of trials per participant affect the reliability of the MPT estimates, or will the model work fine even when the total number of trials differs between subjects?

danheck commented 11 months ago

Often, MPTs are used for experimental data where the number of trials is identical for each person by design.

MPT models can also be applied if the number of trials differs per person. The individual estimates for the parameters will be more reliable (having smaller standard error/posterior standard deviation) for those individuals for whom more observations are available.
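As a rough illustration of why more observations give more reliable estimates, the sketch below computes the standard error of a simple binomial proportion for two trial counts. This is only an analogy and an assumption of this example: MPT parameters are not plain proportions, and the function name `binomial_se` is hypothetical, not part of TreeBUGS.

```python
import math


def binomial_se(p: float, n_trials: int) -> float:
    """Standard error of a proportion estimated from n_trials Bernoulli trials."""
    return math.sqrt(p * (1.0 - p) / n_trials)


# Same underlying probability, different numbers of trials per person.
se_few = binomial_se(0.5, 16)
se_many = binomial_se(0.5, 64)

print(f"16 trials: SE = {se_few:.4f}")
print(f"64 trials: SE = {se_many:.4f}")
```

Quadrupling the number of trials halves the standard error in this simplified case, which matches the intuition that persons with more observations get tighter individual estimates.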

The only requirement is that we have at least one observation for each person in each of the "trees" (e.g., for each within-subject condition). Otherwise, the model might not be identifiable, which means that it is impossible to uniquely estimate certain parameters at the person-level.

saurabhr commented 11 months ago

What is a good strategy when a participant has a 0 for some response in the data? Should I add 0.5 (i.e., a small number) to that participant's counts or to all the counts from all participants in the count table?

For example, take a toy dataset in which participants do not have the same number of trials because some trials were excluded based on some criteria: Participant 1 counts = [1,2,3]; Participant 2 counts = [0,2,1]. One way is:
Participant 1 counts remain [1,2,3] and Participant 2 counts become [0.5,2.5,1.5]. The second way is: Participant 1 counts become [1.5,2.5,3.5] and Participant 2 counts become [0.5,2.5,1.5].

Thank you in advance for your help!

danheck commented 11 months ago

I think adding arbitrary numbers is problematic. Even if these numbers are small, they might bias the results. For instance, if you have N=1000 participants and add 0.5 for each person, this would count as 500 "illusory observations".
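The arithmetic behind the "illusory observations" can be sketched as follows. The function name `pseudo_observations` and the three-cell case are assumptions for illustration; the three cells mirror the toy counts given earlier in the thread.

```python
def pseudo_observations(n_participants: int, n_cells_padded: int, constant: float = 0.5) -> float:
    """Total count added to the data when `constant` is added to
    `n_cells_padded` cells for each of `n_participants` persons."""
    return n_participants * n_cells_padded * constant


# Adding 0.5 once per person for 1000 participants.
one_cell = pseudo_observations(1000, 1)

# Adding 0.5 to every cell of a three-category count vector per person.
three_cells = pseudo_observations(1000, 3)

print(one_cell, three_cells)
```

Padding every cell rather than a single one inflates the total further, so the added counts grow with both the number of participants and the number of response categories.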

Note that zero cells are not a problem at all as long as there is at least one observation in each tree. Usually, each "tree" corresponds to a within-subject condition for which the experimenter has full control over how many trials a person completes.

For instance, suppose each individual goes through two within-subject conditions A and B, with the possible response categories (A1,A2,A3, B1,B2,B3). Then it is okay to include participants with a data vector such as (0,1,0, 4,0,1), because there is at least one observation in each of the conditions A and B. However, it is problematic if you observe (0,0,0, 34,21,28), because there is not a single observation in condition A.
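A minimal sketch of this check, assuming each participant's counts are stored as one flat vector with the trees laid out one after another. The helper `empty_trees` and the `tree_sizes` argument are hypothetical names for illustration, not TreeBUGS functions.

```python
from typing import List, Sequence


def empty_trees(counts: Sequence[int], tree_sizes: Sequence[int]) -> List[int]:
    """Return the indices of trees whose response counts sum to zero.

    counts     -- flat vector of response frequencies for one participant
    tree_sizes -- number of response categories in each tree, in order
    """
    if sum(tree_sizes) != len(counts):
        raise ValueError("tree_sizes must add up to the length of counts")
    empty = []
    start = 0
    for tree_index, size in enumerate(tree_sizes):
        if sum(counts[start:start + size]) == 0:
            empty.append(tree_index)
        start += size
    return empty


# Two trees (conditions A and B) with three categories each.
ok = empty_trees([0, 1, 0, 4, 0, 1], [3, 3])
bad = empty_trees([0, 0, 0, 34, 21, 28], [3, 3])

print(ok)   # no empty tree: zero cells inside a tree are fine
print(bad)  # tree 0 (condition A) has no observations
```

Running such a check per participant before fitting makes it easy to see how many persons are affected and whether the issue is limited to a few individuals.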

In MPT modeling, such cases usually do not occur at all due to the study design. If they do occur, this often applies to only a few individuals, who can be excluded. If the issue affects many participants, I would consider changing the MPT model, because trees that are regularly missing might be important for identifying the parameters.

saurabhr commented 11 months ago

Thanks for the reply, this was really helpful.

danheck / TreeBUGS

Number of trials #14