cafferychen777 / MicrobiomeStat

Track, Analyze, Visualize: Unravel Your Microbiome's Temporal Pattern with MicrobiomeStat
https://www.microbiomestat.wiki/

Format time.var in generate_alpha_trend_test_long() function #35

Open QGaelle opened 4 months ago

QGaelle commented 4 months ago

Dear Chen Yang,

I have a question regarding the use of the generate_alpha_trend_test_long() function. I am working on the microbiome communities of three coral species. Individual colonies of each species were monitored over a three year period. I would like to test for differences in alpha diversity across timepoints.

Here is the structure of my data object: [screenshot 2024-02-29 at 11:05:29]

I have precalculated alpha diversity indices and passed them to the alpha.obj parameter: [screenshot 2024-02-29 at 11:12:09]

I would like to use my column “Date_format” as time.var since it includes information on both year and month but I am not sure which format I should use. I tried to make it numeric using:

data.obj$meta.dat$Date_format <- as.numeric(data.obj$meta.dat$Date_format)
Warning message: NAs introduced by coercion

All rows were replaced by NAs

I did the same with my column “m_format”, just as a test, and got the following output: [screenshot 2024-02-29 at 11:04:05]

My question is thus: How should I write my year + month factor so it can be used as time.var in the generate_alpha_trend_test_long() function?

Thanks for your help! Gaëlle

cafferychen777 commented 4 months ago

Dear Gaëlle,

Thank you for reaching out with your question. To use your "Date_format" column as the time.var in the generate_alpha_trend_test_long() function, you should convert it to a factor instead of a numeric value. You can do this with the following code:

data.obj$meta.dat$Date_format <- as.factor(data.obj$meta.dat$Date_format)

After converting it to a factor, you should be able to use it as the time.var in the function without any issues. Please let me know if you have any further questions or if there is anything else I can assist you with.
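A minimal base-R sketch of why the numeric conversion failed while the factor conversion works. The values are hypothetical (the real Date_format entries are only visible in the screenshots) and assume year-month strings such as "2021-03".

```r
# Hypothetical year-month strings standing in for Date_format
date_format <- c("2021-03", "2020-11", "2021-03", "2022-01")

# as.numeric() cannot parse "2021-03" as a number, so every entry becomes NA
# (the warning is suppressed here only to keep the example quiet)
as_num <- suppressWarnings(as.numeric(date_format))

# as.factor() keeps each distinct string as a level, sorted as strings
as_fac <- as.factor(date_format)
fac_levels <- levels(as_fac)

# Each observation is stored as the integer code of its level
fac_codes <- as.integer(as_fac)

print(as_num)
print(fac_levels)
print(fac_codes)
```

Because the levels are sorted as strings, their order is chronological only for formats like YYYY-MM; other date formats need their levels set explicitly.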

Best regards, Chen YANG

QGaelle commented 4 months ago

Dear Chen Yang,

Thanks for your prompt reply and help. I converted my Date_format column to a factor following your advice and it worked. Thank you. May I ask another question as a follow up?

Here is the output I got:

[screenshot 2024-03-01 at 17:19:02]

The third species, M_exa, does not appear in the output table, and I am not sure why, as it is definitely present in the data.obj:

[screenshot 2024-03-01 at 17:19:11]

I converted Sp_field to a factor to see if it helped, and here is the output I got:

[screenshot 2024-03-01 at 17:26:45]

I did the same with the Colony_ID column:

[screenshot 2024-03-01 at 17:26:57]

The third species, M_exa, never appears in the results. That would be my first question: any idea what the problem might be?

My second question: I am running the generate_alpha_trend_test_long() function to test for differences in alpha diversity across time points between colonies of three different species. The colonies were tagged and monitored at each time point when possible. The scientific question I would like to answer, for each species separately, is whether alpha diversity varied over time during the three-year survey. If the alpha measure is my response variable and Date_format the factor, should Colony_ID and Species also be factors? This is the first time I have had to analyse time series data with repeated measures. I am trying to use your package the proper way, but I am new to it and not sure I am using all the parameters correctly. Sorry if all of this sounds trivial. I looked for the answer on my own but did not find a solution. I hope this will also help others.

Many thanks in advance, Gaëlle

cafferychen777 commented 4 months ago

Dear @QGaelle,

Thank you for your follow-up questions regarding the missing level of Sp_field in the output and whether Sp_field and Colony_ID should be converted to factors.

Regarding the missing level of Sp_field (M_exa) in the output, this is because M_exa is being used as the reference level for the comparisons. In the output, M_platy and M_ten are being compared to M_exa. M_exa serves as the baseline for these comparisons, which is why it does not appear in the output table.
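The baseline behaviour can be seen with base R alone. The sketch below uses made-up species labels matching the thread and shows how R's default treatment coding absorbs the first factor level into the intercept. Whether generate_alpha_trend_test_long() keeps a custom reference level set with relevel() is an assumption not confirmed in this thread.

```r
# Made-up group labels matching the species names in the thread
sp_field <- factor(c("M_ten", "M_exa", "M_platy", "M_exa", "M_ten", "M_platy"))

# Levels are sorted alphabetically, so M_exa comes first
default_levels <- levels(sp_field)

# With treatment coding the first level is the baseline and gets no column
default_cols <- colnames(model.matrix(~ sp_field))

# relevel() moves another level to the front, making it the baseline instead
sp_releveled <- relevel(sp_field, ref = "M_ten")
releveled_cols <- colnames(model.matrix(~ sp_releveled))

print(default_levels)
print(default_cols)
print(releveled_cols)
```

In the default coding there is a column for M_platy and one for M_ten but none for M_exa, which mirrors the output table described above.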

As for converting Sp_field and Colony_ID to factors, I generally do not convert them. However, it is strange that you obtained different results after converting them to factors. In theory, converting these variables to factors should not affect the results. If possible, could you please share the output you obtained after converting Sp_field and Colony_ID to factors? This will help me investigate the issue further.

Given your experience with using Phyloseq for microbiome data analysis, I would greatly appreciate any suggestions or feedback you may have for improving MicrobiomeStat. As a user of both packages, your insights could be valuable in helping us enhance the functionality and user experience of MicrobiomeStat. If you have any specific features, workflows, or improvements in mind that you believe would benefit the microbiome research community, please don't hesitate to share them with us. We are always looking for ways to make MicrobiomeStat more comprehensive and user-friendly.

Thank you for your interest in using MicrobiomeStat for analyzing microbiome dynamics across time series. I appreciate your patience and understanding. Please let me know if you have any further questions or if there's anything else I can assist you with.

Best regards, Chen Yang

QGaelle commented 4 months ago

Dear Chen Yang,

Thank you for your response. I understand now why M_exa does not appear in the output.

Here are the outputs after I converted Sp_field and Colony_ID to factors.

With Sp_field as factor:

[screenshot 2024-03-01 at 17:26:45]

With Colony_ID also as factor:

[screenshot 2024-03-01 at 17:26:57]

To be honest, I have a hard time interpreting the results in the output. But we can discuss that after we find out why the outputs are different when I convert Sp_field and Colony_ID to factors!

I will of course share with you if I have any comment or feedback in mind that would help improve MicrobiomeStat.

Thanks for your help. Gaëlle

cafferychen777 commented 4 months ago

Dear @QGaelle,

Thank you for your continued use of MicrobiomeStat and for sharing your questions.

Regarding your concern about converting Colony_ID to a factor, it's not necessary to do so for the analysis. In the examples provided with the MicrobiomeStat package, the subject.var (which is analogous to your Colony_ID) is not treated as a factor. You can refer to these examples to see how the data is structured and used in the function.

I hope this helps! If you have any more questions or need further clarification, please don't hesitate to ask.

Best regards, Chen YANG

QGaelle commented 4 months ago

Dear Chen Yang,

Following issues #35 and #36, I have run the generate_alpha_trend_test_long() function after converting my “Date_format” column to a factor and got the following output: [screenshot 2024-03-12 at 10:46:04]

I would like to make sure that I am interpreting the results correctly.

Q1: In the output table, rows 6, 7 and 8 (“Sp_field:Date_format”) show, for each species, whether there is a difference in alpha diversity across time points. For the fourth species, which is used as the reference, I should look at the p-value of the fifth row, "Date_format". Is this correct? If so, it looks like only the M_exa species shows differences over time.

Q2: Does the test keep the levels of time.var in a chronological order? How do I make sure it does?

Q3: In order to go further and know between which time points alpha diversity shows significant differences, should I use the generate_alpha_test_long function as some sort of post-hoc test?

Thanks again for your time and for this great package, Gaëlle

cafferychen777 commented 4 months ago

Dear @QGaelle,

Thank you for your questions. Here are my responses:

  1. Q1: Interpretation of "Sp_field:Date_format" in the output table
    The "Sp_field:Date_format" interaction term in the output table shows whether there is a difference in the trend of alpha diversity across time points for each species. If the p-value for this term is greater than 0.05, it suggests that the trend in alpha diversity over time is not significantly different among the species. In your case, since the p-value is greater than 0.05, it indicates that the trends in alpha diversity over time are similar across all species, and there is no statistically significant difference. Therefore, your interpretation that only the M_exa species shows differences in time is not correct. Instead, the result suggests that all species have similar trends in alpha diversity over time.

  2. Q2: Keeping levels of time.var in chronological order
    To ensure that the levels of the time.var are in chronological order, you can convert the time.var column to a factor and then specify the levels in the desired order. For example:

    MicrobiomeData$meta.dat$Timepoint <- factor(
      MicrobiomeData$meta.dat$Timepoint,
      levels = c("T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9", "T10")
    )

    By doing this, you can control the order of the levels, ensuring that they are in chronological order.

  3. Q3: Using generate_alpha_test_long function for post-hoc analysis
    Yes, you can use the generate_alpha_test_long function as a form of post-hoc analysis to identify between which time points alpha diversity shows significant differences. This function will allow you to perform pairwise comparisons between time points for each species and determine where the significant differences lie.
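For a year + month column, the chronological levels in point 2 need not be typed by hand. The sketch below derives them by parsing the labels as dates. The "MM/YYYY" format and the values are hypothetical, chosen because alphabetical sorting gets their order wrong; adapt the format string to the real column.

```r
# Hypothetical month/year labels; alphabetical order is not chronological here
date_format <- c("03/2021", "11/2020", "01/2022", "11/2020")

# Default factor levels are sorted as strings
default_levels <- levels(factor(date_format))

# Parse each label as the first day of its month to get a sortable Date
parsed <- as.Date(paste0("01/", date_format), format = "%d/%m/%Y")

# Unique labels, ordered by the parsed date, become the factor levels
chrono_levels <- unique(date_format[order(parsed)])
date_factor <- factor(date_format, levels = chrono_levels)

# Integer codes now increase with time
codes <- as.integer(date_factor)

print(default_levels)
print(chrono_levels)
print(codes)
```

The same pattern works for any label format that as.Date() can parse once a day is added.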

I hope these answers help clarify your questions. If you have any further questions or need additional assistance, please feel free to ask.

Best regards,
Chen YANG

QGaelle commented 4 months ago

Hi Chen Yang,

Thank you so much for this quick and clear response and for confirming that all species have similar trends in alpha diversity over time as shown by the last row of the output table.

What about the fifth row of the output table, the "Date_format" one? What does it stand for? I thought it showed whether there is difference in alpha diversity over time for M_exa only and I thought that the 6, 7 and 8 rows showed whether there is a difference in alpha diversity over time for each of the three other species individually. But looks like I am interpreting this wrong.

Best, Gaëlle

cafferychen777 commented 4 months ago

Dear Gaëlle,

Thank you for your follow-up question.

I apologize for any confusion caused. In the context of the generate_alpha_trend_test_long() function, the "Date_format" row in the output table actually does not have any specific significance. It is a residual effect from the model and should not be interpreted in the analysis of alpha diversity over time.
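For orientation, here is a sketch of where row labels of this kind come from in a generic R model with a group-by-time interaction. It uses base lm() on made-up numbers. The column name Sp_field mirrors the thread, but the data, the numeric time column and the formula are assumptions for illustration, not the package's internal code.

```r
# Made-up data: three groups, each measured at four time points
toy <- expand.grid(
  Sp_field = c("M_exa", "M_platy", "M_ten"),
  time = 1:4,
  stringsAsFactors = TRUE
)
toy$alpha <- c(2.1, 3.0, 1.8, 2.4, 3.1, 2.2, 2.9, 3.3, 2.0, 3.2, 3.2, 2.5)

# Group main effect, time main effect, and their interaction
fit <- lm(alpha ~ Sp_field * time, data = toy)

# One row per model term: the time term appears once, and each
# non-reference group gets its own group:time interaction row
term_rows <- names(coef(fit))
print(term_rows)
```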

Please feel free to reach out if you have any more questions or need further clarification.

Best regards, Chen YANG

QGaelle commented 4 months ago

Perfect. All clear now. Sorry again for the trivial questions and thank you so much for your time and help. I hope this will help future users. Best regards, Gaëlle