error while running the generate_beta_trend_test_long function

yejunbin commented 5 months ago

Describe the Bug

I encountered an error while running the generate_beta_trend_test_long function with my dataset. The error message is as follows:

error: number of observations (=108) <= number of random effects (=108) for term (1 + visitWeek | subjectID); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable

Data Background

My dataset consists of two groups, each with approximately 30 samples and data collected at 3 different time points.

I used the function with the following parameters:

res = generate_beta_trend_test_long(
  data.obj = ms.obj,
  subject.var = "subjectID",
  time.var = "visitWeek",
  group.var = "group1",
  dist.name = "BC",
  adj.vars = NULL
)

Additional Information

Validation checks passed without issues. The time.var variable (visitWeek) is coded as numeric. Data components (data.obj, meta.dat, feature.tab, feature.ann) conform to base R data structures as required.

Problem Explanation

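The error indicates that the number of observations (108) equals the number of random effects (108) for the term (1 + visitWeek | subjectID). Because that term fits a random intercept and a random slope for every subject, 108 random effects corresponds to 54 subjects, so on average only two observations per subject enter the model. With that few observations, the random-effects parameters and the residual variance cannot be identified separately.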
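The count behind this error can be checked by hand. The following is a minimal sketch in base R, not the package's own code. It assumes hypothetical metadata with 54 subjects (inferred from 108 random effects at 2 per subject) and assumes that the baseline visit acts as the reference and contributes no row to the model. Neither assumption is confirmed in this thread.

```r
# Hypothetical metadata: 54 subjects, 3 visits each (values are illustrative).
meta <- expand.grid(
  subjectID = sprintf("S%02d", 1:54),
  visitWeek = c(0, 4, 8),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the baseline visit is the reference and drops out of the
# response, leaving two rows per subject in the fitted model.
model_rows <- meta[meta$visitWeek != min(meta$visitWeek), ]

n_obs <- nrow(model_rows)
n_subjects <- length(unique(model_rows$subjectID))

# (1 + visitWeek | subjectID) has 2 random effects per subject:
# one intercept and one slope.
n_random_effects <- 2 * n_subjects

# The mixed model needs strictly more observations than random effects.
identifiable <- n_obs > n_random_effects

cat("observations:", n_obs,
    "| random effects:", n_random_effects,
    "| identifiable:", identifiable, "\n")
```

Under these assumptions both counts come out to 108, which matches the numbers in the error message.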

Request

I would appreciate guidance on resolving this issue or recommendations on adjusting the model specification to ensure proper identification of random effects.

cafferychen777 commented 4 months ago

Dear @yejunbin,

Thank you for reporting this issue with the generate_beta_trend_test_long function in MicrobiomeStat. I appreciate the detailed information you've provided about the error you're encountering.

Based on the error message and the details of your dataset, it appears that the problem may be related to how the subject ID is set in your meta.dat. This is a common cause of this type of error, in which the number of observations equals the number of random effects.

To help diagnose and resolve this issue, could you please:

Share your meta.dat, or at least a sample of it that includes the relevant columns (subjectID, visitWeek, and group1)?
Run the mStat_summarize_data function on your dataset and share the results?

This information will allow me to better understand the structure of your data and potentially identify any issues with the subject ID setup.

In the meantime, you might want to double-check that:

Each subject has a unique identifier across all time points.
The visitWeek variable is correctly coded as numeric and represents the actual time points.
There are no duplicate entries for subject-timepoint combinations.
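The three checks above can be run directly on the metadata table. This is a minimal base R sketch that uses a small hypothetical data frame in place of the real meta.dat; the column names follow the ones in this thread, and the values are made up for illustration.

```r
# Hypothetical metadata; replace with the meta.dat component of your data object.
meta <- data.frame(
  subjectID = c("A", "A", "A", "B", "B", "B", "B"),
  visitWeek = c(0, 4, 8, 0, 4, 8, 8),
  group1    = c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt", "trt"),
  stringsAsFactors = FALSE
)

# Check 1: the time variable is numeric.
time_is_numeric <- is.numeric(meta$visitWeek)

# Check 2: rows that repeat an earlier subject-timepoint combination.
dup_rows <- meta[duplicated(meta[, c("subjectID", "visitWeek")]), ]

# Check 3: number of rows per subject ID. A count of 1 for every ID would
# suggest sample IDs were used where subject IDs were expected.
obs_per_subject <- table(meta$subjectID)

# Extra check: an ID that appears in more than one group may mean the same
# ID was reused for different subjects.
groups_per_subject <- tapply(meta$group1, meta$subjectID,
                             function(g) length(unique(g)))
ambiguous_ids <- names(groups_per_subject)[groups_per_subject > 1]

print(time_is_numeric)
print(dup_rows)
print(obs_per_subject)
print(ambiguous_ids)
```

In this toy table, subject B has a duplicated week 8 row, which is the kind of entry the third check in the list is meant to catch.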

Once I have more information about your data structure, I'll be able to provide more specific guidance on resolving this issue or adjusting the model specification.

Thank you for your patience, and I look forward to helping you resolve this problem.

Best regards, Chen YANG

cafferychen777 commented 1 month ago

Dear @yejunbin,

I have carefully analyzed the error message and data background you shared, and have further investigated and tested this issue.

Upon closer examination, I've found that this is not actually a bug, but rather an issue caused by the structure of your data. When there are too few data points for the random-effects structure, the model runs into identifiability problems, which produces the error you encountered about the random-effects parameters and residual variance being unidentifiable.

To address this problem, I have modified the default behavior of the function. When it detects a low number of data points, it will automatically take steps to simplify the model. Specifically, it will:

Attempt to simplify the random intercept effect to a fixed effect
If the above approach is not sufficient, it will include only the linear effect of the time variable in the model, excluding the quadratic term

This model simplification helps ensure that the model parameters can still be well-identified and estimated, even with a smaller data size.
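For readers who fit such models by hand, one common way to simplify is to pick the random-effects term from the observation count. The sketch below illustrates that general idea only. It is not the logic implemented in MicrobiomeStat, and the helper name choose_random_term is hypothetical.

```r
# Hypothetical helper: choose the richest random-effects term that the
# data can support, based on the rule that the number of observations
# must exceed the number of random effects.
choose_random_term <- function(n_obs, n_subjects,
                               time_var = "visitWeek",
                               subject_var = "subjectID") {
  if (n_obs > 2 * n_subjects) {
    # Enough data for a random intercept and a random slope per subject.
    sprintf("(1 + %s | %s)", time_var, subject_var)
  } else if (n_obs > n_subjects) {
    # Only a random intercept per subject is supportable.
    sprintf("(1 | %s)", subject_var)
  } else {
    # No random-effects term is supportable by this rule.
    NA_character_
  }
}

# Counts from the error message: 108 observations, 54 subjects.
print(choose_random_term(n_obs = 108, n_subjects = 54))
```

With the counts from this issue, the helper falls back to a random intercept only, because 108 observations do not exceed the 108 random effects that a per-subject intercept and slope would require.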

I have tested these changes on sample datasets, and the updates successfully avoid the error you encountered.

Please try out the updated function and let me know if this resolves your issue. If you have any other questions, feel free to reach out to me.

Thank you again for the feedback - it helps us continually improve the MicrobiomeStat toolkit.

Best regards, Chen Yang

cafferychen777 / MicrobiomeStat

error while running the generate_beta_trend_test_long function #58