cafferychen777 / ggpicrust2

Make Picrust2 Output Analysis and Visualization Easier
https://cafferychen777.github.io/ggpicrust2/
Other
91 stars 11 forks source link

My pred_metagenome_unstrat file has EC:1.1.1.1 and not KO format #45

Closed Aristotle1992 closed 8 months ago

Aristotle1992 commented 11 months ago

For my pred_metagenome_unstrat file I have the EC:1.1.1.1 and not the Ko annotation and from the best of my knowledge, this file ought to be my abundance file. However, I had to take a long route to convert the EC annotation to KO annotation and one problem with that is that not all EC annotation had a corresponding KO annotation, so i had to drop some EC during conversion to KO. Is it possible to use ggpicrust2 to analyze EC annotated pred_metagenome_unstrat and if yes, is it possible that i use the following script:

ko_abundance_file <- "path/to/your/pred_metagenome_unstrat.tsv" kegg_abundance <- ko2kegg_abundance(ko_abundance_file) # Or use data(kegg_abundance)

metadata <- read_delim("path/to/your/metadata.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE)

The default DAA method is "ALDEx2"

Please change group to "your_group_column" if you are not using example dataset

daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "Environment", daa_method = "linDA", select = NULL, p.adjust = "BH", reference = NULL)

If you have more than 3 group levels and want to use the LinDA, limma voom, or Maaslin2 methods, you should provide a reference.

metadata <- read_delim("path/to/your/metadata.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE)

Please change group to "your_group_column" if you are not using example dataset

daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "Group", daa_method = "LinDA", select = NULL, p.adjust = "BH", reference = "Harvard BRI")

daa_annotated_results_df <- pathway_annotation(pathway = "KO", daa_results_df = daa_results_df, ko_to_kegg = TRUE)

Please change Group to metadata$your_group_column if you are not using example dataset

p <- pathway_errorbar(abundance = kegg_abundance, daa_results_df = daa_annotated_results_df, Group = metadata$Environment, ko_to_kegg = TRUE, p_values_threshold = 0.05, order = "pathway_class", select = NULL, p_value_bar = TRUE, colors = NULL, x_lab = "pathway_name")

Or do i use the pred_metagenome_unstrat_descrip for my Ko abundance file as it has a Column for description? Please i need clarification on this. Thanks

This image is an snapshot of the pred_metagenome_unstart.tsv file image

cafferychen777 commented 11 months ago

Hello @Aristotle1992 ,

Thank you for your question. Yes, you can use ggpicrust2 to analyze EC annotated pred_metagenome_unstrat files. However, please note that EC and KO are different, and analyzing EC abundance requires a different workflow compared to KO abundance analysis.

For analyzing EC annotated data using ggpicrust2, you can follow the workflow provided in the following tutorial: https://github.com/cafferychen777/ggpicrust2#if-an-error-occurs-with-ggpicrust2-please-use-the-following-workflow

In that tutorial, you'll find a specific workflow for MetaCyc Pathway and EC analysis. It involves different steps and functions tailored for EC abundance data. Please follow the instructions in the tutorial to perform the analysis correctly.

In summary, to analyze EC annotated pred_metagenome_unstrat files with ggpicrust2, follow the appropriate workflow mentioned in the tutorial, and make sure to use the relevant functions and parameters specifically designed for EC data analysis.

If you have any further questions or need additional clarification, feel free to ask. Happy analyzing!

Aristotle1992 commented 11 months ago

Thanks for your reply. Well appreciated. I will look into it.

Aristotle1992 commented 11 months ago

metadata <- read_delim("C:/Microbiome/picrust/Rhizo/check/rhizo_metadata.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE) EC_abundance <- read.delim("C:/Microbiome/picrust/Rhizo/pred_metagenome_unstrat_descrip.tsv")

Perform pathway DAA using LinDA method

Please change column_to_rownames() to the feature column if you are not using example dataset

Please change group to "your_group_column" if you are not using example dataset

metacyc_daa_results_df <- pathway_daa(abundance = EC_abundance %>% column_to_rownames("description"), metadata = metadata, group = "GDD", daa_method = "LinDA")

It brough the error: Error in .rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': ‘Co-methyltransferase’, ‘methyltransferase’, ‘specific)’

In the pred_metagenome_unstrat_descrip, there is no pathway just description so i decided to use description in place of pathway which was used in the tutorial.

Below is the pred_metagenome_unstrat_descrip files i used and i decided to use this file since it contain the description of the EC annotation.

Please how can i resolve this error?

image

Aristotle1992 commented 11 months ago

Also i feel that the workflow using the EC annotated pred_metagenome_unstrat_descrip file dataset should be included in the tutorial as most microbiome dataset come in EC annotation. I could provide or share my EC dataset to be used as tutorial for subsequent user that might want to use ggpicrust2 for microbiome analysis. Thanks

cafferychen777 commented 11 months ago

Hello @Aristotle1992,

Thank you for providing the additional information and sharing your experience with the EC annotated pred_metagenome_unstrat_descrip file.

To resolve the "duplicate 'row.names' are not allowed" error, you can add the following line of code after importing the EC_abundance data:

EC_abundance <- EC_abundance %>% select(-c("function"))

This will remove the "function" column from the EC_abundance data, which seems to be causing the duplicate row names issue.

Regarding your second question, you are correct that the only difference between pred_metagenome_unstrat and pred_metagenome_unstrat_descrip is the presence of the "description" column in the latter. If you prefer to use pred_metagenome_unstrat_descrip for your analysis, you can simply remove the "description" column using Excel or any other data manipulation tool. After removing the "description" column, both datasets should be equivalent, and you can proceed with the analysis as shown in the tutorial.

I appreciate your suggestion to include a workflow using EC annotated pred_metagenome_unstrat_descrip in the tutorial. Including such an example would indeed be helpful for users dealing with microbiome datasets in EC annotation. If you are willing to share your EC dataset, you could consider providing it as an example dataset in the tutorial to benefit other users.

If you have any more questions or encounter any issues, feel free to ask. I'm here to assist you. Happy analyzing!

Aristotle1992 commented 11 months ago

Oh thanks for your feedback. I will try that out. The link to my EC dataset is https://drive.google.com/drive/folders/1kUt7Q5hGPSTESEAF1i_nbbo6q4XTwMKQ?usp=drive_link. Thanks

Aristotle1992 commented 11 months ago

So i tried what you asked me to do and all went well until i got to the pathway_errorbar.

metadata <- read_delim("C:/Microbiome/picrust/Rhizo/check/rhizo_metadata.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE) EC_abundance <- read.delim("C:/Microbiome/picrust/Rhizo/pred_metagenome_unstrat.tsv")

EC_abundance <- EC_abundance %>% select(-c("function."))

Perform pathway DAA using LinDA method

Please change column_to_rownames() to the feature column if you are not using example dataset

Please change group to "your_group_column" if you are not using example dataset

metacyc_daa_results_df <- pathway_daa(abundance = EC_abundance %>% column_to_rownames("function."), metadata = metadata, group = "GDD", daa_method = "DESeq2")

Annotate MetaCyc pathway results without KO to KEGG conversion

metacyc_daa_annotated_results_df <- pathway_annotation(pathway = "EC", daa_results_df = metacyc_daa_results_df, ko_to_kegg = FALSE)

Generate pathway error bar plot

Please change column_to_rownames() to the feature column

Please change Group to metadata$your_group_column if you are not using example dataset

pathway_errorbar(abundance = EC_abundance %>% column_to_rownames("function."), daa_results_df = metacyc_daa_annotated_results_df, Group = metadata$GDD, ko_to_kegg = FALSE, p_values_threshold = 0.05, order = "GDD", select = NULL, p_value_bar = TRUE, colors = NULL, x_lab = "description")

The last code brought this error:

Error in pathway_errorbar(abundance = EC_abundance %>% column_to_rownames("function."), : The feature with statistically significance are more than 30, the visualization will be terrible. Please use select to reduce the number. Now you have "EC:1.1.1.203", "EC:1.1.1.304", "EC:1.1.1.335", "EC:1.1.1.381", "EC:1.1.1.76", "EC:1.1.1.87", "EC:1.1.1.88", "EC:1.11.1.6", "EC:1.11.2.4", "EC:1.12.98.1", "EC:1.13.11.3", "EC:1.14.11.18", "EC:1.14.12.10", "EC:1.14.13.127", "EC:1.14.13.149", "EC:1.14.13.2", "EC:1.14.13.40", "EC:1.14.18.1", "EC:1.17.1.4", "EC:1.2.1.4", "EC:1.2.1.54", "EC:1.2.1.71", "EC:1.2.1.91", "EC:1.2.5.1", "EC:1.3.1.25", "EC:1.3.1.34", "EC:1.3.1.83", "EC:1.3.1.9", "EC:1.3.7.7", "EC:1.3.8.12", "EC:1.3.99.35", "EC:1.5.1.28", "EC:1.5.1.34", "EC:1.7.1.15", "EC:1.7.99.4", "EC:1.8.1.4", "EC:1.8.1.7", "EC:2.1.1.294", "EC:2.1.1.72", "EC:2.3.1.109", "EC:2.3.1.201", "EC:2.3.1.207", "EC:2.3.1.28", "EC:2.3.1.29", "EC:2.3.1.37", "EC:2.4.1.64", "EC:2.4.2.53", "EC:2.5.1.62", "EC:2.6.1.2", "EC:2.6.1.21", "EC:2.6.1.66", "EC:2.6.1.98", "EC:2.7.1.181", "EC:2.7.1.20", "EC:2.7.1

Afterwards, I tried to change the select option from NULL to 1:10 and it then brought the following error:

Error in pathway_errorbar(abundance = EC_abundance %>% column_to_rownames("function."), : The feature with statistically significance is zero, pathway_errorbar can't do the visualization.

How can i resolve this error. Thanks

Secondly is it possible i search for the equivalent of the EC annotated pred-mteganome file for thier corresponding Ko. I tried that and the pathway error bar worked but the problem with this method is that i couldnt find KO for some EC annotation. Is the a proper or right alternative to performing this analysis when you have an EC annotated pred_metagenome_unstrat_descrip file. Thanks

Aristotle1992 commented 11 months ago

I have been able to resolve the various issues I encountered. Thanks