cafferychen777 / ggpicrust2

Make Picrust2 Output Analysis and Visualization Easier
https://cafferychen777.github.io/ggpicrust2/
MIT License
102 stars, 11 forks

pathway_daa(): object 'p_values_df' not found #40

Closed: carmennns2 closed this issue 11 months ago

carmennns2 commented 1 year ago

Hello,

I am new to picrust2 and ggpicrust2.

I have obtained the output files from picrust2 and wanted to analyse them with ggpicrust2. However, I keep receiving errors from the ggpicrust2() function.

Using this command obtained from the tutorial:

results_file_input <- ggpicrust2(file = abundance_file, metadata = metadata, group = "Disease", pathway = "KO", daa_method = "Maaslin2", reference = "Healthy", ko_to_kegg = TRUE, p.adjust = "BH", order = "pathway_class", p_values_bar = FALSE, x_lab = "pathway_name")

I receive some sort of error with every daa_method I have tried.

For LinDA, "Error in ggpicrust2(file = abundance_file, metadata = metadata, group = "Disease", : There are no statistically significant biomarkers", which I know is not a real error but rather the statistical result that there are no significant biomarkers.

For Maaslin2, "Error in p.adjust(p_values_df$p_values, method = "none") : object 'p_values_df' not found". Even when I set p.adjust = "none", I still receive this error.

For DESeq2 and metagenomeSeq, "Error in if (sum(as.numeric(daa_results_df$p_adjust <= 0.05)) == 0) { : missing value where TRUE/FALSE needed".
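For reference, "missing value where TRUE/FALSE needed" is a general R error rather than a ggpicrust2-specific message: `if()` stops whenever its condition is `NA`. A minimal base-R sketch, using a hypothetical `daa_results_df` with made-up values, shows how a single `NA` in `p_adjust` (DESeq2, for example, can report `NA` adjusted p-values for features it filters out) turns the significance count itself into `NA`, and how `na.rm = TRUE` avoids that. It illustrates the mechanism only and is not the package's internal code.

```r
# Hypothetical DAA results: the third feature has an NA adjusted p-value
daa_results_df <- data.frame(
  feature  = c("K00001", "K00002", "K00003"),
  p_adjust = c(0.20, 0.07, NA)
)

# The comparison gives NA for the NA row, so the sum is NA as well
n_sig_naive <- sum(as.numeric(daa_results_df$p_adjust <= 0.05))

# if (n_sig_naive == 0) would stop with "missing value where TRUE/FALSE needed";
# tryCatch() captures that message instead of halting the script
naive_check <- tryCatch(
  if (n_sig_naive == 0) "none significant" else "some significant",
  error = function(e) conditionMessage(e)
)
print(naive_check)

# Dropping NAs before counting gives a usable answer
n_sig <- sum(daa_results_df$p_adjust <= 0.05, na.rm = TRUE)
if (n_sig == 0) {
  message("No features with p_adjust <= 0.05 (NA values ignored)")
}
```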

Ironically, the only daa_method that works is "limma voom", but I am not using RNA-seq data.

Are there any suggestions you can offer?

Thank you, Carmen

cafferychen777 commented 1 year ago

Dear Carmen,

Thank you for reaching out and expressing your concerns regarding the errors you encountered while using ggpicrust2. It appears that you have come across a known issue in the previous version of ggpicrust2, which has been addressed in the latest version.

To resolve these errors, I recommend upgrading to the newest version of ggpicrust2. You can do this by following the instructions below:

  1. Install the devtools package if you haven't done so already:

    install.packages("devtools")
  2. Install ggpicrust2 from GitHub using the devtools package:

    devtools::install_github("cafferychen777/ggpicrust2")
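After reinstalling, it may be worth confirming which version R actually loads, because a session that already had ggpicrust2 loaded keeps using the old copy until R is restarted. A small base-R helper for that check (the name `installed_version` is hypothetical and not part of ggpicrust2):

```r
# Hypothetical helper: installed version of a package as a string, or NA
# if the package cannot be found
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Usage: run in a fresh R session after devtools::install_github(...)
installed_version("ggpicrust2")
```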

By updating to the latest version, you should be able to overcome the issues you encountered with different daa_methods. However, please note that the errors you mentioned are typically encountered when no statistically significant biomarkers are detected in your data. This situation can happen regardless of the package version.

Therefore, if you still encounter the same errors after updating, it is possible that your data does not contain any features with statistically significant differences. In such cases, the error messages are indicating that no biomarkers meeting the specified criteria were found.

If you require further assistance, I recommend referring to the ggpicrust2 tutorial, specifically the section on the issue you encountered. You can find it here. The tutorial provides additional insights and guidance on how to handle the situation when no statistically significant biomarkers are found.

If you have any further questions or continue to experience difficulties, please don't hesitate to reach out. I'll be more than happy to assist you.

Best regards,
Chen

carmennns2 commented 1 year ago

Hi Chen,

Thank you so much for your quick response!

Even after upgrading, I still receive the same errors.

In response to this comment: " However, please note that the errors you mentioned are typically encountered when no statistically significant biomarkers are detected in your data. This situation can happen regardless of the package version. Therefore, if you still encounter the same errors after updating, it is possible that your data does not contain any features with statistically significant differences. In such cases, the error messages are indicating that no biomarkers meeting the specified criteria were found."

Can I just check that an output of "There are no statistically significant biomarkers" means there are no significant biomarkers? However, does an error output of -

Thank you so much and I wish you a great day.

All the best, Carmen


cafferychen777 commented 1 year ago

Hi Carmen,

Thank you for your prompt response!

I apologize for the confusion. You are correct in your understanding. The error messages you mentioned:

do indicate that there were no statistically significant biomarkers detected in your data. These errors are not related to any issues with your command, but rather they occur when the package attempts to perform statistical calculations on the data and finds no significant results.

In such cases, the error messages serve as an indication that no biomarkers meeting the specified criteria (e.g., significance threshold) were found. Therefore, it is expected to encounter these errors when no significant biomarkers are detected, regardless of the package version.

If you have further questions or need assistance with any other aspect, please feel free to let me know. I'm here to help!

Wishing you a wonderful day!

Best regards,
Chen

carmennns2 commented 1 year ago

Thank you so much for your clear explanation!

All the best, Carmen


carmennns2 commented 1 year ago

Hi Chen,

Can I ask another question? I am having an issue with pathway_errorbar.

The commands I used:

daa_results_df <- pathway_daa(abundance = abundance, metadata = metadata, group = "Disease" , daa_method = "limma voom", select = NULL, p.adjust = "none", reference = "Salmonella")

feature_with_p_0.05 <- daa_results_df %>% filter(p_adjust < 0.05) # 56 KOs statistically significant

pathway_errorbar(abundance = abundance, daa_results_df = daa_results_df, Group = metadata$Disease, ko_to_kegg = TRUE, p_values_threshold = 0.05, order = "pathway_class", select = NULL, p_value_bar = TRUE, colors = NULL, x_lab = NULL)

I received this error when using pathway_errorbar, "Error in pathway_errorbar(abundance = abundance, daa_results_df = daa_results_df, : The feature with statistically significance is zero, pathway_errorbar can't do the visualization."

In issue #39, you explained that this is because pathway_daa did not yield any statistically significant results. However, there are actually 56 KOs which have a p_adjust value less than 0.05 in the output obtained from daa_results_df.

Additionally, while I keep receiving an error for pathway_errorbar, pathway_heatmap and pathway_pca works just fine.

Am I misunderstanding something?

Thank you so much Chen! Carmen

cafferychen777 commented 1 year ago

Hi Carmen,

Thank you for reaching out. I'm sorry to hear that you're experiencing an issue with the pathway_errorbar function in ggpicrust2. Based on the error message you provided, it seems that the function is unable to visualize the pathway because it did not find any statistically significant features.

You mentioned that there are 56 KOs with a p_adjust value less than 0.05 in the daa_results_df output. To better understand the problem, it would be helpful if you could provide the corresponding dataset, including the abundance and metadata. Having access to the data will allow me to investigate the issue more effectively and provide you with a more accurate solution.

Additionally, you mentioned that the pathway_heatmap and pathway_pca functions are working fine. This suggests that the issue might be specific to the pathway_errorbar function. By examining the data, I hope to gain further insights into the problem and assist you accordingly.

Thank you for your cooperation. I look forward to your response.

Best regards, Chen

carmennns2 commented 1 year ago

Hi Chen,

I am always so thankful for your quick response.

Please find attached the abundance data (https://github.com/cafferychen777/ggpicrust2/files/11967725/pred_metagenome_unstrat.csv) and the metadata (meta_species.csv).

Thank you for your guidance and support (:

Carmen

cafferychen777 commented 1 year ago

Hi Carmen,

Thank you for reaching out and providing the abundance data and metadata. I appreciate your kind words!

I wanted to inquire about the type of abundance you are using in your code. Are you using "ko_abundance" or "kegg_abundance"? Additionally, could you please provide the "daa_results_df"?
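One quick way to tell which kind of table was passed is to look at the feature identifiers. A hedged base-R sketch, assuming KEGG Orthology IDs follow the pattern `K` plus five digits (e.g. `K00001`) and KEGG pathway IDs the pattern `ko` or `map` plus five digits (e.g. `ko00010`); the helper `id_type`, the toy table and the example IDs are all hypothetical. It also counts how many significant features actually occur in the abundance table: if none of them match, there is nothing left to plot.

```r
# Hypothetical helper: classify feature IDs by their naming pattern
# (assumes KO IDs look like "K00001" and KEGG pathway IDs like "ko00010")
id_type <- function(ids) {
  ifelse(grepl("^K[0-9]{5}$", ids), "KO",
         ifelse(grepl("^(ko|map)[0-9]{5}$", ids), "KEGG pathway", "other"))
}

# Toy abundance table (rows = features, columns = samples); values are made up
abundance <- matrix(c(10, 0, 5, 3, 8, 1), nrow = 3,
                    dimnames = list(c("K00001", "K00002", "K00003"),
                                    c("S1", "S2")))

# Toy significant features, e.g. what daa_results_df$feature might contain
sig_features <- c("ko00010", "ko00020")

table(id_type(rownames(abundance)))  # all "KO"
table(id_type(sig_features))         # all "KEGG pathway"

# How many significant features can be found in the abundance table?
n_matched <- sum(sig_features %in% rownames(abundance))
n_matched
```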

However, I would like to caution you that for high-dimensional data like metagenomics, setting p.adjust = "none" can lead to a high number of false positives in the results. Even if you perform the visualization, the displayed results may be unreliable. You may also consider analyzing other data, such as metacyc_abundance.
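To illustrate the multiple-testing point, here is a small base-R simulation in which no feature truly differs between groups: p-values drawn uniformly at random stand in for thousands of null features (an illustrative assumption, not Carmen's data). Without adjustment, roughly 5% of them fall below 0.05 by chance alone; after Benjamini-Hochberg adjustment with base R's `p.adjust(method = "BH")`, the same correction named by `p.adjust = "BH"` in the tutorial command, typically none or almost none remain.

```r
set.seed(42)

# 5,000 hypothetical features with no real group difference:
# under the null hypothesis, p-values are uniform on [0, 1]
p_raw <- runif(5000)

# Unadjusted: about 5% (~250) pass p < 0.05 purely by chance
n_raw <- sum(p_raw < 0.05)

# Benjamini-Hochberg adjustment controls the false discovery rate
p_bh <- p.adjust(p_raw, method = "BH")
n_bh <- sum(p_bh < 0.05)

cat("unadjusted p < 0.05:", n_raw, "\n")
cat("BH-adjusted p < 0.05:", n_bh, "\n")
```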

Please let me know if you have any further questions or if there's anything else I can assist you with.

Best regards, Chen

carmennns2 commented 1 year ago

Hi Chen,

I am so sorry for the silly mistake. I realise that you were correct and I was likely using the incorrect input file. I have since corrected it.

I am currently trying to analyse the MetaCyc pathways using this abundance table and metadata: path_abun_unstrat.csv and meta_species.csv.

However, I am still getting the same error (all functions work except pathway_errorbar).

- Perform pathway DAA using the LinDA method (WORKS):

    metacyc_daa_results_df <- pathway_daa(abundance = metacyc_abundance2 %>% column_to_rownames("pathway"), metadata = metadata, group = "Disease", daa_method = "LinDA", reference = "Salmonella")

  Results attached: metacyc_daa_results_df_csv.csv. I prefer LinDA because it performs pairwise comparisons, unlike ALDEx2.

- Annotate MetaCyc pathway results without KO-to-KEGG conversion (WORKS):

    metacyc_daa_annotated_results_df <- pathway_annotation(pathway = "MetaCyc", daa_results_df = metacyc_daa_results_df, ko_to_kegg = FALSE)

    pathway_heatmap(abundance = metacyc_abundance2 %>% filter(pathway %in% feature_with_p_0.05$feature) %>% column_to_rownames("pathway"), metadata = metadata, group = "Disease")

I sincerely apologise for the inconvenience. I really do appreciate your time!

cafferychen777 commented 1 year ago

Dear carmennns2,

Thank you for reaching out and providing the details of the issue you encountered. I apologize for any inconvenience caused. Based on the error message you shared, it seems that you are encountering a factor level duplication problem in the pathway_errorbar function.
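For context, R raises this kind of error whenever a factor is built with repeated level names. One plausible way it can arise here (an assumption, not a confirmed diagnosis): with three groups, a pairwise method such as LinDA reports each pathway once per comparison, so the same description appears twice if every row is used as a factor level. A toy base-R reproduction with hypothetical pathway names and group labels, showing that keeping a single comparison restores unique levels:

```r
# Toy pairwise DAA output: each pathway appears once per comparison
daa <- data.frame(
  description = c("pathway A", "pathway B", "pathway A", "pathway B"),
  group1      = c("Healthy", "Healthy", "Asymptomatic", "Asymptomatic"),
  group2      = c("Salmonella", "Salmonella", "Salmonella", "Salmonella")
)

# Using every row as factor levels repeats "pathway A" and "pathway B"
dup_error <- tryCatch(
  factor(daa$description, levels = daa$description),
  error = function(e) conditionMessage(e)
)
print(dup_error)  # the message reports a duplicated factor level

# Keeping a single comparison gives unique levels again
one_cmp <- daa[daa$group1 != "Asymptomatic", ]
f <- factor(one_cmp$description, levels = one_cmp$description)
levels(f)
```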

To address this issue, please try using the following code:

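library(readr)
library(dplyr)    # %>%, filter(), select(), pull(), arrange(), slice()
library(tibble)   # column_to_rownames()
library(ggpicrust2)
library(patchwork)
library(ggprism)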

metacyc_abundance <- read.delim("/Users/apple/Microbiome/ggpicrust2总/ggpicrust2测试/ggpicrust2_test/carmennns2/path_abun_unstrat.csv") %>% column_to_rownames("pathway")

metadata <- read.delim("~/Microbiome/ggpicrust2总/ggpicrust2测试/ggpicrust2_test/carmennns2/meta_species.csv")

daa_results_df <- read.csv("~/Microbiome/ggpicrust2总/ggpicrust2测试/ggpicrust2_test/carmennns2/feature_with_p_0.05.csv")[,-1]

daa_annotated_results_df <- pathway_annotation(pathway = "MetaCyc", daa_results_df = daa_results_df, ko_to_kegg = FALSE)

# Pairwise: Healthy Versus Salmonella
sub_samples <- metadata %>% filter(Disease != "Asymptomatic") %>% select(X) %>% pull()

pathway_errorbar(abundance = metacyc_abundance[,sub_samples], daa_results_df = daa_annotated_results_df %>% filter(group1 != "Asymptomatic"), Group = metadata %>% filter(X %in% sub_samples) %>% select("Disease") %>% pull(), x_lab = "description")

daa_results_df <- pathway_daa(metacyc_abundance, metadata, "Disease")

# Healthy Versus Salmonella Versus Asymptomatic

daa_results_df <- pathway_daa(metacyc_abundance, metadata, "Disease", daa_method = "ALDEx2")

daa_annotated_results_df <- pathway_annotation(pathway = "MetaCyc", daa_results_df = daa_results_df, ko_to_kegg = FALSE)

pathway_errorbar(abundance = metacyc_abundance, daa_results_df = daa_annotated_results_df %>% filter(method == "ALDEx2_Kruskal-Wallace test") %>%
                 arrange(p_adjust) %>% slice(1:30), Group = metadata %>% select("Disease") %>% pull(), x_lab = "description", p_value_bar = FALSE)

Please note that this code provides two visualization options, and you can refer to the resulting figures for further analysis.

If you have any further questions or need additional assistance, please feel free to ask. Thank you for your understanding, and I hope this helps!

Best regards, Chen YANG

[Screenshots of the two resulting figures, 2023-07-08]
carmennns2 commented 1 year ago

WOW,

You are amazing. So sorry, but one last question: is there any way I can extract the log2 fold change/effect size for the pairwise comparison? (I understand the group comparison has no effect size.)

Again, thank you (:

cafferychen777 commented 1 year ago

Hello @carmennns2,

You can use code similar to the following to check the log2 fold change.

p <- ggpicrust2::pathway_errorbar(
  abundance = kegg_abundance,
  daa_results_df = daa_annotated_results_df,
  Group = metadata$sampling_point,
  p_values_threshold = 0.05,
  order = "pathway_class",
  select = NULL,
  ko_to_kegg = TRUE,
  p_value_bar = TRUE,
  colors = NULL,
  x_lab = "pathway_name"
)
p$data
[Screenshot, 2023-07-25]
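If you would rather compute a log2 fold change yourself from the abundance table, a simple version is the log2 ratio of the group mean relative abundances, with a small pseudocount to avoid dividing by or taking the log of zero. This base-R sketch uses a made-up matrix, hypothetical group labels and a hypothetical pseudocount of 1e-6; it is not necessarily the same quantity that pathway_errorbar reports in p$data, so compare the two before relying on it.

```r
# Toy abundance table: rows = pathways, columns = samples (values made up)
abundance <- matrix(
  c(10, 20, 30, 40,
     5,  5, 50, 50),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("PWY-A", "PWY-B"), c("S1", "S2", "S3", "S4"))
)
group <- c("Healthy", "Healthy", "Salmonella", "Salmonella")

# Convert each sample (column) to relative abundance
rel <- sweep(abundance, 2, colSums(abundance), "/")

# Mean relative abundance per pathway within each group
mean_healthy    <- rowMeans(rel[, group == "Healthy", drop = FALSE])
mean_salmonella <- rowMeans(rel[, group == "Salmonella", drop = FALSE])

# log2 fold change of Salmonella over Healthy; the pseudocount keeps
# zero means from producing -Inf/Inf
pseudo <- 1e-6
log2_fc <- log2((mean_salmonella + pseudo) / (mean_healthy + pseudo))
round(log2_fc, 3)
```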
carmennns2 commented 1 year ago

Wonderful. Thank you for the tool and thank you more for your support (:

jsevereyn commented 1 year ago

Hello, where can I find the relative abundance values for each plotted category?

Or would I have to calculate them myself from the table passed as abundance = metacyc_abundance, by averaging the values of all samples for each KO?
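If you do end up calculating them yourself, one common approach is to divide each sample by its total (so every sample sums to 1) and then average those relative abundances within each group. A base-R sketch with a made-up table and hypothetical group labels; pathway_errorbar may normalise differently, so treat this as an approximation to check against its output rather than a reproduction of it.

```r
# Toy abundance table: rows = features, columns = samples (values made up)
abundance <- matrix(
  c(12, 18,  6,  4,
     8,  2, 14, 16,
     0, 10, 20,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("PWY-A", "PWY-B", "PWY-C"), paste0("S", 1:4))
)
group <- c("Healthy", "Healthy", "Salmonella", "Asymptomatic")

# Relative abundance: divide each sample by its total so columns sum to 1
rel <- sweep(abundance, 2, colSums(abundance), "/")

# Average the relative abundances of all samples within each group;
# result: one row per feature, one column per group
group_means <- sapply(unique(group), function(g) {
  rowMeans(rel[, group == g, drop = FALSE])
})
round(group_means, 3)
```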

carmennns2 commented 1 year ago

Hi Chen,

Sorry to bother you again. Another quick question.

Would it be possible to perform a statistical analysis on the pathway_pca results to determine whether there is a significant difference between the groups? Or could I extract the distances so I can perform the statistical analysis myself?

Thank you so much!
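The thread does not say whether pathway_pca offers a built-in test, so here is one do-it-yourself route in base R under stated assumptions: run `prcomp()` on log-transformed relative abundances, extract the sample scores and the pairwise Euclidean distances between samples, and compare PC1 scores across groups with a Kruskal-Wallis test. The data are simulated, and the log transform and pseudocount are assumptions; pathway_pca may preprocess differently. A multivariate test on a distance matrix, such as PERMANOVA (for example `vegan::adonis2`), is a common alternative to testing a single axis.

```r
set.seed(1)

# Simulated abundance table (made-up data): 50 pathways x 12 samples
n_feat <- 50
group <- factor(rep(c("Healthy", "Salmonella", "Asymptomatic"), each = 4))
abundance <- matrix(
  rpois(n_feat * 12, lambda = 20), nrow = n_feat,
  dimnames = list(paste0("PWY-", seq_len(n_feat)), paste0("S", 1:12))
)
# Make the first 10 pathways three times as abundant in Salmonella samples
sal <- group == "Salmonella"
abundance[1:10, sal] <- abundance[1:10, sal] * 3

# Relative abundance per sample, then log-transform with a pseudocount
rel <- sweep(abundance, 2, colSums(abundance), "/")
log_rel <- log(rel + 1e-6)

# PCA with samples as rows (hence the transpose)
pca <- prcomp(t(log_rel), center = TRUE)
scores <- data.frame(pca$x[, 1:2], group = group)

# Pairwise Euclidean distances between samples on PC1 and PC2
pc_dist <- dist(scores[, c("PC1", "PC2")])

# Kruskal-Wallis test of PC1 scores across the three groups
kw <- kruskal.test(PC1 ~ group, data = scores)
kw$p.value
```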