cafferychen777 / MicrobiomeStat

Track, Analyze, Visualize: Unravel Your Microbiome's Temporal Pattern with MicrobiomeStat
https://www.microbiomestat.wiki/
31 stars 4 forks source link

generate_taxa_test_pair() issue with time.var = NULL in random effect #33

Open 16svale opened 7 months ago

16svale commented 7 months ago

Dear developer, I am encountering an issue when using the generate_taxa_test_pair() function to test for random effect.

When I use the basic linda function as below it works:

linda.obj.all <- linda(asv_table[, ind], meta[ind, ], formula = '~SampleType1 + (1|SSF)', feature.dat.type = 'count', prev.filter = 0.1, mean.abund.filter = 0.004 , is.winsor = TRUE, outlier.pct = 0.03, p.adj.method = "BH", alpha = 0.05)

However, the generate_taxa_test_pair() that should work the same but used as below gives me an error:

create test.list

test.list <- generate_taxa_test_pair( data.obj = data.obj, group.var = "sampletype1", adj.vars = NULL, subject.var = "SSF", time.var = NULL, feature.dat.type = "count", feature.level = "Genus", prev.filter = 0.1, abund.filter = 0.004)

Output: Rule 1 passed: data.obj is a list. Rule 2 passed: meta.dat is a data.frame. Rule 3 passed: The row names of feature.tab match the row names of feature.ann. Rule 4 passed: The column names of feature.tab match the row names of meta.dat. Rule 5 passed: feature.tab is a matrix. Rule 6 passed: feature.ann is a matrix. Please note: The data components should follow base R data.frame and matrix structures, not phyloseq's formal class. Validation passed. Note: Passing validation does not guarantee the absence of all data issues. Further data exploration may be needed. Your data is in raw format ('Raw'). Normalization is crucial for further analyses. Now, 'mStat_normalize_data' function is automatically applying 'TSS' transformation. Data has been successfully normalized using TSS method. 0 features are filtered!

The filtered data has 98 samples and 49 features that will be tested!

Fit linear mixed effects models ... Completed. Error in grepl(pattern = time.var, x = df_name) : invalid 'pattern' argument

The generate_taxa_test_pair() function seems designed to optionally handle longitudinal data, given it accepts a time.var parameter, which I set to NULL. This design choice suggests it should be capable of running analyses without a time component, but the implementation does not gracefully handle a NULL time.var.

I write here a piece of the core code of the generate_taxa_test_pair() function:

Case where time.var is NULL

  if (is.null(time.var)) {
    if (is.null(group.var)) {
      fixed_effects <- adj.vars_str
      if (is.null(fixed_effects)) {
        fixed_effects <- "1"  # Intercept-only model
      }
    } else {
      if (!is.null(adj.vars_str)) {
        fixed_effects <- paste(adj.vars_str, "+", group.var)
      } else {
        fixed_effects <- group.var
      }
    }
    random_effects <- paste("(1 |", subject.var, ")")

Do you have any solution for this issue?

Thank you in advance I hope I can improve to implement the use of this package. Best Regards, Valentina

16svale commented 7 months ago

Is it possible that you should modify the part of the generate_taxa_test_pair function that handles time.var to ensure it only attempts to use time.var in grepl if time.var is not NULL?

A suggestion foe the core code but not sure if it works:

Before using time.var in grepl, check it's not NULL

if (!is.null(time.var) && !is.null(df_name)) { if (grepl(pattern = time.var, x = df_name)) {

rest of the code

}

}

cafferychen777 commented 7 months ago

Dear Valentina,

Thank you for your detailed description of the issue you encountered while using the generate_taxa_test_pair() function.

Indeed, the generate_taxa_test_pair() function is designed primarily for paired samples, which inherently involves a time component, such as "pre" and "post" measurements. Therefore, the parameter time.var cannot be set to NULL as the function expects this information to appropriately handle the paired nature of the data.

While I understand your suggestion regarding modifying the function to handle NULL time.var gracefully, it contradicts the fundamental design principle of the function, which requires a time component for paired sample analysis.

Thank you for your understanding, and I hope this clarifies the situation. If you have any further questions or need assistance with alternative analyses, please don't hesitate to ask.

Best Regards,

Chen YANG

16svale commented 7 months ago

Dear @cafferychen777 , I understand your answer. I am wondering if there is another function within the package that you developed that can be used for this purpose. I mean to get the same results as if I use the linda function and take into consideration the random effect of only one variable:

linda.obj.all <- linda(asv_table[, ind], meta[ind, ], formula = '~SampleType1 + (1|SSF)', feature.dat.type = 'count', prev.filter = 0.1, mean.abund.filter = 0.004 , is.winsor = TRUE, outlier.pct = 0.03, p.adj.method = "BH", alpha = 0.05)

or as the LinDa documentation example report: linda.obj <- linda(otu.tab, meta, formula = '~Smoke + (1|SubjectID)', alpha = 0.1, prev.cut = 0.1, lib.cut = 1000, winsor.quan = 0.97)

I believe it must be possible to implement a random effect variable even if is not a timepoint.

Best, Valentina

cafferychen777 commented 7 months ago

Dear Valentina,

Thank you for your follow-up question.

I understand your desire to replicate the functionality of the linda function while considering a random effect that is not explicitly a timepoint. However, as the linda function is maintained by other members of our team, any modifications to its functionality would require coordination and discussion among team members.

I apologize for any inconvenience this may cause. I will communicate your request to the relevant team members and discuss the possibility of implementing such functionality in future updates of the package.

Thank you for your understanding and patience in this matter. If you have any further questions or concerns, please feel free to reach out.

Best Regards,

Chen YANG

16svale commented 7 months ago

Dear Chen Yang,

I understand. However, the linda function already allows the inclusion of a random effect variable that is not explicitly a timepoint (as shown in the documentation of linda() function). Is it correct?

Best Regards.