cafferychen777 / MicrobiomeStat

Track, Analyze, Visualize: Unravel Your Microbiome's Temporal Pattern with MicrobiomeStat
https://www.microbiomestat.wiki/

Datatype of feature.tab not matrix after normalization #45

Open pkirti33 opened 4 months ago

pkirti33 commented 4 months ago

Describe the Bug

When I use mStat_normalize_data() with the Rarefy-TSS method, the mStat_validate_data() function no longer passes because it doesn't recognize feature.tab as a matrix (Rule 5). When I don't use mStat_normalize_data(), all the tests pass.

Example

The following code fails at step 5 (Rule 5 failed: feature.tab should be a matrix.)

MicrobiomeData <- list(feature.tab = otu_table_matrix, 
                       meta.dat = metadata_df, 
                       feature.ann = taxonomy_matrix)

#normalize the data using rarefaction and total sum scaling
MicrobiomeData <- mStat_normalize_data(data.obj = MicrobiomeData, method = "Rarefy-TSS")
MicrobiomeData$data.obj.norm$feature.tab <- as.matrix(MicrobiomeData$data.obj.norm$feature.tab)

mStat_validate_data(MicrobiomeData)

However, the following code passes all validations. Furthermore, when I rarefy the data with mStat_rarefy_data(data.obj = MicrobiomeData) prior to validation, all validations pass.

MicrobiomeData <- list(feature.tab = otu_table_matrix, 
                       meta.dat = metadata_df, 
                       feature.ann = taxonomy_matrix)

#normalize the data using rarefaction and total sum scaling
#MicrobiomeData <- mStat_normalize_data(data.obj = MicrobiomeData, method = "Rarefy-TSS")
#MicrobiomeData$data.obj.norm$feature.tab <- as.matrix(MicrobiomeData$data.obj.norm$feature.tab)

mStat_validate_data(MicrobiomeData)

Environment Information:

cafferychen777 commented 4 months ago

Hi @pkirti33,

Thank you for bringing this issue to my attention. Indeed, it was a peculiar error where the feature.tab was not recognized as a matrix after applying the mStat_normalize_data() function with the Rarefy-TSS method. Although I couldn't pinpoint the exact cause of this anomaly, I've implemented a fix by adding a forceful conversion to matrix at the end of the normalization process.

I've already pushed the update to the GitHub repository. It should be available in a few hours. Please update the MicrobiomeStat package then, and let me know if the problem persists or if there's anything else I can help you with.

Best regards, Chen YANG

pkirti33 commented 4 months ago

Hello, Thank you for your prompt reply and help! I tried re-running my code, but the issue has not resolved itself. My steps are below:

Detach and re-install MicrobiomeStat

detach("package:MicrobiomeStat", unload = TRUE)
devtools::install_github("cafferychen777/MicrobiomeStat")
library(MicrobiomeStat)

Make the microbiomeData object:

MicrobiomeData <- list(feature.tab = otu_table_matrix, 
                       meta.dat = metadata_df, 
                       feature.ann = taxonomy_matrix)
MicrobiomeData <- mStat_normalize_data(data.obj = MicrobiomeData, method = "Rarefy-TSS")
MicrobiomeData$data.obj.norm$feature.tab <- as.matrix(MicrobiomeData$data.obj.norm$feature.tab)
mStat_validate_data(MicrobiomeData)

The error is as follows:

Rule 1 passed: data.obj is a list.
Rule 2 passed: meta.dat has been converted to a data.frame.
Rule 3 passed: The row names of feature.tab match the row names of feature.ann.
Rule 4 passed: The order of rows in meta.dat has been adjusted to match feature.tab.
Error in mStat_validate_data(MicrobiomeData) : Rule 5 failed: feature.tab should be a matrix.

cafferychen777 commented 4 months ago

Hi pkirti33,

Thanks for following up and providing more details. I apologize that the issue is still not resolved. Based on the error message, it seems the root cause is that the feature.tab object is not being recognized as a matrix after the mStat_normalize_data() step, even when converting it explicitly using as.matrix().
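Since as.matrix() itself reliably returns a matrix, it can help to check which element is actually being converted and which one is being validated. The following is a minimal base R sketch of such a check; `describe_elements` and the `toy` list are hypothetical names used only for illustration and are not part of MicrobiomeStat.

```r
# Hypothetical helper: report the first class of every top-level element of a list.
describe_elements <- function(obj) {
  vapply(obj, function(el) class(el)[1], character(1))
}

# Toy object standing in for a data.obj (contents are made up for illustration).
toy <- list(
  feature.tab = matrix(1:4, nrow = 2),
  meta.dat    = data.frame(group = c("a", "b"))
)

element_classes <- describe_elements(toy)
print(element_classes)

# as.matrix() on a data frame does yield a matrix, so the conversion step is sound.
converted <- as.matrix(data.frame(s1 = c(1, 2), s2 = c(3, 4)))
print(is.matrix(converted))
```

Running the same kind of check on the real object would show whether feature.tab sits at the top level or somewhere deeper in the list.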

One potential workaround is to skip the explicit normalization step. In the current version of MicrobiomeStat, almost all the functions perform "Rarefy-TSS" normalization by default under the hood. So you may be able to get the expected results without needing to call mStat_normalize_data() directly.

Try this simplified workflow and see if it resolves the validation error:

MicrobiomeData <- list(feature.tab = otu_table_matrix, 
                       meta.dat = metadata_df, 
                       feature.ann = taxonomy_matrix)

mStat_validate_data(MicrobiomeData)

If the issue persists, please let me know. I'll do some further testing on my end to identify the underlying problem with mStat_normalize_data() converting the data type. In the meantime, hopefully skipping that step provides a temporary solution.

Best regards, Caffery

pkirti33 commented 4 months ago

Thank you for your help! I'll use your recommended solution for now.

ctmlab4 commented 3 months ago

Hi all, I am new to MicrobiomeStat. I am having the same problem as @pkirti33.

"Error in mStat_validate_data(MicrobiomeData_rare) : Rule 5 failed: feature.tab should be a matrix"

Is there any update or some alternative for Rarefy-TSS?

Thank you so much! Carla.

cafferychen777 commented 3 months ago

Hi @ctmlab4,

Thanks for reaching out regarding the issue you encountered with the mStat_validate_data() function after using mStat_normalize_data() with the "Rarefy-TSS" method.

As a workaround for now, you have two options:

  1. You can directly run other functions without any additional conversions.

  2. Alternatively, after running the mStat_normalize_data() function, you can convert the feature.tab element of the returned object to a matrix using as.matrix(). Here's an example:

MicrobiomeData_rare <- mStat_normalize_data(data.obj = MicrobiomeData, method = "Rarefy-TSS")
MicrobiomeData_rare$feature.tab <- as.matrix(MicrobiomeData_rare$feature.tab)
mStat_validate_data(MicrobiomeData_rare)

Either of these approaches should resolve the issue and allow the mStat_validate_data() function to pass all the validation rules.

We appreciate your patience and understanding. We are actively working on a more permanent solution to address this issue in a future update of the MicrobiomeStat package.

If you have any further questions or concerns, please don't hesitate to reach out.

Best regards, Caffery

bark9299 commented 3 months ago

Hi @cafferychen777,

I believe I am having a similar problem to the others above. I converted my phyloseq object to a data.obj:

data.obj <- mStat_convert_phyloseq_to_data_obj(physeq_final_100k)

Then I wanted to use mStat_rarefy_data to rarefy to a read depth of 100,000:

rarefied_data <- mStat_rarefy_data(data.obj = data.obj, depth = 100000)

Then I converted the feature.tab of rarefied_data to a matrix, which passed all the rules with mStat_validate_data(rarefied_data):

rarefied_data$feature.tab <- as.matrix(rarefied_data$feature.tab)
mStat_validate_data(rarefied_data)

Then I wanted to use mStat_calculate_alpha_diversity:

alpha_rarefied <- mStat_calculate_alpha_diversity(x = rarefied_data, alpha.name = c("shannon", "simpson", "observed_species"))

But I get the following error:

Error in colSums(x) : 'x' must be an array of at least two dimensions

So then I try:

alpha_rarefied <- mStat_calculate_alpha_diversity(x = rarefied_data$feature.tab, alpha.name = c("shannon", "simpson", "observed_species"))

which looks like it runs properly, but when I run:

mStat_validate_data(alpha_rarefied)

it throws an error:

Rule 1 passed: data.obj is a list.
Rule 2 passed: meta.dat has been converted to a data.frame.
Rule 3 passed: The row names of feature.tab match the row names of feature.ann.
Rule 4 passed: The order of rows in meta.dat has been adjusted to match feature.tab.
Error in mStat_validate_data(alpha_rarefied) : Rule 5 failed: feature.tab should be a matrix.

I also see this problem being addressed in #7, however reading that issue did not help me understand my issue.

When I try another normalization method like "TSS":

TSS_data <- mStat_normalize_data(data.obj = data.obj, method = "TSS")

and try to make it a matrix (note: to access feature.tab I first have to go through $data.obj.norm, then $feature.tab):

TSS_data$data.obj.norm$feature.tab <- as.matrix(TSS_data$data.obj.norm$feature.tab)
mStat_validate_data(TSS_data)

mStat_validate_data(TSS_data) throws an error:

Rule 1 passed: data.obj is a list.
Rule 2 passed: meta.dat has been converted to a data.frame.
Rule 3 passed: The row names of feature.tab match the row names of feature.ann.
Rule 4 passed: The order of rows in meta.dat has been adjusted to match feature.tab.
Error in mStat_validate_data(TSS_data) : Rule 5 failed: feature.tab should be a matrix.

How do I tweak my code to be able to use different normalization methods with mStat_calculate_alpha_diversity? Should I use one of the other alpha diversity commands? Thank you for your help.

MicrobiomeStat version 1.2.0
R version 4.3.2

cafferychen777 commented 3 months ago

Hi @bark9299 @pkirti33 @ctmlab4 ,

I think I may have found the cause of the error. After normalizing the data using mStat_normalize_data(), you should use the $data.obj.norm element of the returned object instead of the original data.obj. For example:

norm.data.obj <- mStat_normalize_data(data.obj, "TSS")$data.obj.norm

Then, in subsequent function calls, use norm.data.obj instead of data.obj.

The reason for this is that during the normalization process, a new data.obj.norm (in the form of a list) is generated and stored within the original data.obj. Therefore, you need to replace the usage of the original data.obj with the newly generated data.obj.norm, rather than only using the new feature.tab.
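To make the nesting concrete, here is a small base R sketch. `toy_normalize` is a hypothetical stand-in written for this illustration: it assumes the result is wrapped under a `data.obj.norm` element as described above, and its scaling arithmetic is an assumption, not the package's actual code.

```r
# Hypothetical stand-in for a normalization function. The wrapper layout
# (result nested under `data.obj.norm`) follows the description in this thread;
# the arithmetic is an illustrative assumption, not MicrobiomeStat's code.
toy_normalize <- function(data.obj) {
  counts <- data.obj$feature.tab
  norm.obj <- data.obj
  # Total sum scaling: divide each sample (column) by its column total.
  norm.obj$feature.tab <- sweep(counts, 2, colSums(counts), "/")
  list(data.obj.norm = norm.obj)
}

data.obj <- list(
  feature.tab = matrix(c(10, 30, 20, 20), nrow = 2,
                       dimnames = list(c("otu1", "otu2"), c("s1", "s2")))
)

wrapped <- toy_normalize(data.obj)

# The wrapper has no top-level feature.tab: `$` returns NULL for a missing
# element, so a matrix check on it evaluates to FALSE.
top_level_missing <- is.null(wrapped$feature.tab)
top_level_is_matrix <- is.matrix(wrapped$feature.tab)

# Extracting the nested object restores the expected layout.
norm.data.obj <- wrapped$data.obj.norm
nested_is_matrix <- is.matrix(norm.data.obj$feature.tab)

print(c(top_level_missing = top_level_missing,
        top_level_is_matrix = top_level_is_matrix,
        nested_is_matrix = nested_is_matrix))
```

In this toy setup, converting `wrapped$data.obj.norm$feature.tab` with as.matrix() changes nothing at the top level, which would be consistent with a validator that looks at `feature.tab` directly on the object it is given.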

So your workflow should look something like this:

data.obj <- mStat_convert_phyloseq_to_data_obj(physeq_final_100k)
norm.data.obj <- mStat_normalize_data(data.obj, "TSS")$data.obj.norm
mStat_validate_data(norm.data.obj)
alpha_diversity <- mStat_calculate_alpha_diversity(x = norm.data.obj$feature.tab, alpha.name = c("shannon", "simpson", "observed_species"))

By using norm.data.obj consistently after the normalization step, the mStat_validate_data() function should pass all validation rules, and the mStat_calculate_alpha_diversity() function should work as expected.
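The colSums error reported earlier in this thread can be reproduced in base R alone, which suggests why `x` should be the feature.tab matrix and not the whole list. This is only a sketch with made-up counts; it does not show how mStat_calculate_alpha_diversity() is implemented.

```r
# Made-up counts for illustration only.
counts <- matrix(c(10, 30, 20, 20), nrow = 2,
                 dimnames = list(c("otu1", "otu2"), c("s1", "s2")))
obj <- list(feature.tab = counts)

# Passing the whole list: colSums() rejects input that lacks two dimensions.
msg <- tryCatch(colSums(obj), error = function(e) conditionMessage(e))
print(msg)

# Passing the matrix element works and gives the per-sample totals.
depth <- colSums(obj$feature.tab)
print(depth)
```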

Please give this a try and let me know if it resolves the issues you were encountering. If you have any further questions or need additional assistance, don't hesitate to ask.

Best regards, Caffery

ctmlab4 commented 3 months ago

Hi Caffery,

I tried it and I could do it without any problems! Thank you very much for your help!

Kind regards, Carla.

bark9299 commented 3 months ago

Hi @cafferychen777 ,

That worked for me as well. Thank you for your help and speedy reply!

Best,

E