cafferychen777 / ggpicrust2

Make Picrust2 Output Analysis and Visualization Easier
https://cafferychen777.github.io/ggpicrust2/
MIT License
109 stars 13 forks source link

Question: how to annotate pathway information using the output file from PICRUSt2? #27

Closed LiaOb21 closed 1 year ago

LiaOb21 commented 1 year ago

Dear developers,

Thank you for this tool!

I was wondering if there is an example file for using the patway_annotation function in the following case:

# Use case 1: Annotating pathway information using the output file from PICRUSt2
result1 <- pathway_annotation(file = "path/to/picrust2/export/file.txt",
pathway = "KO",
daa_results_df = NULL,
ko_to_kegg = FALSE)

I tried using the default output from PICRUST2, containing the pathways descriptions (i.e. path_abun_unstrat_descrip.tsv) as follows but it didnt work:

pathways <-
read_delim(
    "path_abun_unstrat_descrip.tsv",
    delim = "\t",
    escape_double = TRUE,
    trim_ws = TRUE
  )

daa_annotated_sub_method_results_df <- pathway_annotation(file = pathways,
pathway = "KO",
daa_results_df = NULL,
ko_to_kegg = FALSE)

Any suggestions?

Thank you in advance!

Cheers,

Lia

cafferychen777 commented 1 year ago

Dear Lia,

Thanks for reaching out!

The function pathway_annotation requires a specific format for the input file. It seems like you are using the description file from PICRUSt2 which might not be suitable.

In your PICRUSt2 output, you should have a file named path_abun_unstrat.tsv (or similar, depending on your specific run). This file contains the predicted abundance of KEGG Orthologs (KOs) in your samples and is the one you should use for the pathway_annotation function.

Here is an example of how to use it:

result1 <- pathway_annotation(file = "path/to/picrust2/path_abun_unstrat.tsv",
                              pathway = "KO",
                              daa_results_df = NULL,
                              ko_to_kegg = FALSE)

Please ensure that the path to the file is correct and the file is correctly formatted. If you are still experiencing issues, could you provide the error message you are seeing? It will help to better diagnose the problem.

I hope this helps, and let me know if you have any further questions!

Best,

Chen YANG

LiaOb21 commented 1 year ago

Hi Chen,

Thank you so much for your support, this solved my first issue! Now, I am having an error plotting the error_bar, and I'm pretty sure there is something that I'm not understanding on how to follow the pipeline. Here what I'm doing:

metadata <-
read_delim(
    "metadata2col.txt",
    delim = "\t",
    escape_double = FALSE,
    trim_ws = TRUE
  )

kegg_abundance <-
  ko2kegg_abundance(
    "pred_metagenome_unstrat.tsv"
  )

daa_results_df <-
  pathway_daa(
    abundance = kegg_abundance,
    metadata = metadata,
    group = "Compost",
    daa_method = "ALDEx2",
    select = NULL,
    reference = NULL
  )

daa_sub_method_results_df <-
  daa_results_df[daa_results_df$method == "ALDEx2_Kruskal-Wallace test", ]

daa_annotated_sub_method_results_df <- pathway_annotation(file = "path_abun_unstrat.tsv",
pathway = "KO",
daa_results_df = NULL,
ko_to_kegg = FALSE)

daa_results_list <-
  pathway_errorbar(
    abundance = kegg_abundance,
    daa_results_df = daa_annotated_sub_method_results_df,
    Group = "Compost",
    p_values_threshold = 0.05,
    order = "pathway_class",
    select = "Top 30",
    ko_to_kegg = FALSE,
    p_value_bar = TRUE,
    colors = NULL,
    x_lab = NULL
  )

And this is the error I get:

There are more than one method in daa_results_df$method, please filter it.
Error in `daa_results_df[daa_results_df$p_adjust < p_values_threshold, ]`:
! Can't subset rows with `daa_results_df$p_adjust < p_values_threshold`.
✖ Logical subscript `daa_results_df$p_adjust < p_values_threshold` must be size 1 or 415, not 0.
Run `rlang::last_trace()` to see where the error occurred.
Warning messages:
1: Unknown or uninitialised column: `pathway_name`.
2: Unknown or uninitialised column: `method`.
3: Unknown or uninitialised column: `p_adjust`.

I suppose I'm doing something wrong at this step:

daa_sub_method_results_df <-
  daa_results_df[daa_results_df$method == "ALDEx2_Kruskal-Wallace test", ]

Any idea on how I can solve this?

Also I'm not sure if I at this step I should keep daa_results_df = NULL or daa_results_df = daa_results_df:

daa_annotated_sub_method_results_df <- pathway_annotation(file = "path_abun_unstrat.tsv",
pathway = "KO",
daa_results_df = NULL,
ko_to_kegg = FALSE)

Thank you so much in advance! :)

Lia

cafferychen777 commented 1 year ago

Dear Lia,

I'm glad to hear that your first issue has been resolved! Regarding your current problem, it seems like there's a slight misunderstanding with the pipeline. Here's a workflow that may help you:

metadata <-
  read_delim(
    "metadata2col.txt",
    delim = "\t",
    escape_double = FALSE,
    trim_ws = TRUE
  )

kegg_abundance <-
  ko2kegg_abundance(
    "pred_metagenome_unstrat.tsv"
  )

daa_results_df <-
  pathway_daa(
    abundance = kegg_abundance,
    metadata = metadata,
    group = "Compost",
    daa_method = "ALDEx2",
    select = NULL,
    reference = NULL
  )

daa_sub_method_results_df <-
  daa_results_df[daa_results_df$method == "ALDEx2_Kruskal-Wallace test", ]

daa_annotated_sub_method_results_df <- pathway_annotation(file = NULL,
                                                          pathway = "KO",
                                                          daa_results_df = daa_sub_method_results_df,
                                                          ko_to_kegg = TRUE)

low_p_feature <- daa_annotated_sub_method_results_df[order(daa_annotated_sub_method_results_df$p_adjust), ]$feature[1:29]

daa_results_list <-
  pathway_errorbar(
    abundance = kegg_abundance,
    daa_results_df = daa_annotated_sub_method_results_df,
    Group = metadata$Compost,
    p_values_threshold = 0.05,
    order = "pathway_class",
    select = low_p_feature,
    ko_to_kegg = TRUE,
    p_value_bar = TRUE,
    colors = NULL,
    x_lab = NULL
  )

The main changes here include:

Try out this workflow and see if it helps with your problem. If you still encounter any issues, don't hesitate to let me know!

Best, Chen YANG

LiaOb21 commented 1 year ago

Thank you so much! I was actually trying to avoid ko_to_kegg = TRUE at this step:

daa_annotated_sub_method_results_df <- pathway_annotation(file = NULL,
                                                          pathway = "KO",
                                                          daa_results_df = daa_sub_method_results_df,
                                                          ko_to_kegg = TRUE)

this is because when I run the code in that way I get the following error:

We are connecting to the KEGG database to get the latest results, please wait patiently.
The pathways with statistically significance are too many. The database cannot execute the query request in a single time. Please seperate it.

For that reason I was trying to use the pathways output from PICRUSt2. But I don't know if I got exactly where is the problem. Thank you again!

cafferychen777 commented 1 year ago

You can use this.

metadata <-
  read_delim(
    "metadata2col.txt",
    delim = "\t",
    escape_double = FALSE,
    trim_ws = TRUE
  )

kegg_abundance <-
  ko2kegg_abundance(
    "pred_metagenome_unstrat.tsv"
  )

daa_results_df <-
  pathway_daa(
    abundance = kegg_abundance,
    metadata = metadata,
    group = "Compost",
    daa_method = "ALDEx2",
    select = NULL,
    reference = NULL
  )

daa_sub_method_results_df <-
  daa_results_df[daa_results_df$method == "ALDEx2_Kruskal-Wallace test", ]

daa_annotated_sub_method_results_df <- pathway_annotation(file = NULL,
                                                          pathway = "KO",
                                                          daa_results_df = daa_sub_method_results_df[order(daa_sub_method_results_df$p_adjust),][1:29,],
                                                          ko_to_kegg = TRUE)

daa_results_list <-
  pathway_errorbar(
    abundance = kegg_abundance,
    daa_results_df = daa_annotated_sub_method_results_df,
    Group = metadata$Compost,
    p_values_threshold = 0.05,
    order = "pathway_class",
    select = NULL,
    ko_to_kegg = TRUE,
    p_value_bar = TRUE,
    colors = NULL,
    x_lab = NULL
  )
LiaOb21 commented 1 year ago

Thank you so much! This code worked (even if I needed to reinstall ggpicrust2, I think I probably didn't have the last version).

By the way, this is the output, and I'm afraid there are still some issues:

image

I got 41 warnings like this:

41: In max(x) : no non-missing arguments to max; returning -Inf

I'm sorry for bothering you, I really appreciate your help!

cafferychen777 commented 1 year ago

Hello @LiaOb21 ,

Would you like to share your data? Cause it will be more convenient to debug.

Best regards,

LiaOb21 commented 1 year ago

Hi Chen,

Thank you, these are my data.

[Uploading metadata2col.txt…]() pred_metagenome_unstrat.tsv.gz

Best regards,

Lia

cafferychen777 commented 1 year ago

Metadata is also needed.😂

On Sat, 13 May 2023 at 21:21, LiaOb21 @.***> wrote:

Hi Chen,

Thank you, these are my data.

Uploading metadata2col.txt… pred_metagenome_unstrat.tsv.gz https://github.com/cafferychen777/ggpicrust2/files/11469757/pred_metagenome_unstrat.tsv.gz

Best regards,

Lia

— Reply to this email directly, view it on GitHub https://github.com/cafferychen777/ggpicrust2/issues/27#issuecomment-1546651757, or unsubscribe https://github.com/notifications/unsubscribe-auth/ATZEQTTDFHWIMIYZXZC3V7TXF6DETANCNFSM6AAAAAAX7X2PAM . You are receiving this because you commented.Message ID: @.***>

LiaOb21 commented 1 year ago

Sorry!! I didn't realise it didn't upload! :) metadata2col.txt

cafferychen777 commented 1 year ago
截屏2023-05-13 21 42 41 截屏2023-05-13 21 42 59 截屏2023-05-13 21 49 21

I think the following codes could solve the problem.

library(readr)
library(ggpicrust2)
library(tidyverse)
library(ggprism)
library(patchwork)
metadata <-
  read_delim(
    "/Users/apple/Microbiome/ggpicrust2总/ggpicrust2测试/ggpicrust2_test/LiaOb21/metadata2col.txt",
    delim = "\t",
    escape_double = FALSE,
    trim_ws = TRUE
  )

kegg_abundance <-
  ko2kegg_abundance(
    "/Users/apple/Microbiome/ggpicrust2总/ggpicrust2测试/ggpicrust2_test/LiaOb21/pred_metagenome_unstrat.tsv"
  )

daa_results_df <-
  pathway_daa(
    abundance = kegg_abundance,
    metadata = metadata,
    group = "Compost",
    daa_method = "ALDEx2",
    select = NULL,
    reference = NULL
  )

daa_sub_method_results_df <-
  daa_results_df[daa_results_df$method == "ALDEx2_Kruskal-Wallace test", ]

daa_annotated_sub_method_results_df <- pathway_annotation(file = NULL,
                                                          pathway = "KO",
                                                          daa_results_df = daa_sub_method_results_df[order(daa_sub_method_results_df$p_adjust),][1:29,],
                                                          ko_to_kegg = TRUE)

daa_annotated_sub_method_results_df <- daa_annotated_sub_method_results_df[!is.na(daa_annotated_sub_method_results_df$pathway_name),]

daa_results_list <-
  pathway_errorbar(
    abundance = kegg_abundance,
    daa_results_df = daa_annotated_sub_method_results_df,
    Group = metadata$Compost,
    p_values_threshold = 0.05,
    order = "pathway_class",
    select = NULL,
    ko_to_kegg = TRUE,
    p_value_bar = TRUE,
    colors = NULL,
    x_lab = NULL
  )

metadata$Compost <- factor(metadata$Compost)
ggpicrust2::pathway_pca(kegg_abundance,metadata,"Compost")

metadata$sample_name <- metadata$Sample_ID

sub_kegg_abundance <- kegg_abundance[rownames(kegg_abundance) %in% daa_annotated_sub_method_results_df$feature,]
metadata <-
  read_delim(
    "/Users/apple/Microbiome/ggpicrust2总/ggpicrust2测试/ggpicrust2_test/LiaOb21/metadata2col.txt",
    delim = "\t",
    escape_double = FALSE,
    trim_ws = TRUE
  )
metadata$sample_name <- metadata$Sample_ID
ggpicrust2::pathway_heatmap(abundance = sub_kegg_abundance,metadata = metadata,group = "Compost")

Best regards,

LiaOb21 commented 1 year ago

Hi Chen,

Thank you so so much for you support and kindness! These codes work! :) And thank you for having developed this tool!

Have a nice day!

Lia

LiaOb21 commented 1 year ago

Hi Chen,

Sorry to reopen this so soon. I have a further question: I am working on soil data, but the output refers to human metabolism. Is there a way to avoid this?

Thank you so much in advance!

Cheers,

Lia

cafferychen777 commented 1 year ago
截屏2023-05-16 21 48 15

Hello Lia,

If you are working with soil data and you want to avoid the human metabolism-related output in PICRUSt2, there are a few options you can consider.

  1. Set the x_lab parameter as "feature": By setting x_lab to "feature" instead of "pathway", you will obtain feature id as x_lab.
  2. Analyze KO, EC, or MetaCyc directly: Instead of analyzing the predicted pathways, you can directly analyze the predicted functions at the level of individual databases such as KO (KEGG Orthology), EC (Enzyme Commission numbers), or MetaCyc. This approach allows you to explore the functional potential of soil microbiota without the human metabolism-related pathway annotations.
library(readr)
library(ggpicrust2)
library(tidyverse)
library(ggprism)
library(patchwork)
metadata <-
  read_delim(
    "/Users/apple/Microbiome/ggpicrust2总/ggpicrust2测试/ggpicrust2_test/LiaOb21/metadata2col.txt",
    delim = "\t",
    escape_double = FALSE,
    trim_ws = TRUE
  )

kegg_abundance <-
  ko2kegg_abundance(
    "/Users/apple/Microbiome/ggpicrust2总/ggpicrust2测试/ggpicrust2_test/LiaOb21/pred_metagenome_unstrat.tsv"
  )

daa_results_df <-
  pathway_daa(
    abundance = kegg_abundance,
    metadata = metadata,
    group = "Compost",
    daa_method = "ALDEx2",
    select = NULL,
    reference = NULL
  )

daa_sub_method_results_df <-
  daa_results_df[daa_results_df$method == "ALDEx2_Kruskal-Wallace test", ]

daa_annotated_sub_method_results_df <- pathway_annotation(file = NULL,
                                                          pathway = "KO",
                                                          daa_results_df = daa_sub_method_results_df[order(daa_sub_method_results_df$p_adjust),][1:29,],
                                                          ko_to_kegg = TRUE)

daa_annotated_sub_method_results_df <- daa_annotated_sub_method_results_df[!is.na(daa_annotated_sub_method_results_df$pathway_name),]

daa_results_list <-
  pathway_errorbar(
    abundance = kegg_abundance,
    daa_results_df = daa_annotated_sub_method_results_df,
    Group = metadata$Compost,
    p_values_threshold = 0.05,
    order = "pathway_class",
    select = NULL,
    ko_to_kegg = FALSE,
    p_value_bar = FALSE,
    colors = NULL,
    x_lab = "feature"
  )

metadata$Compost <- factor(metadata$Compost)
ggpicrust2::pathway_pca(kegg_abundance,metadata,"Compost")

metadata$sample_name <- metadata$Sample_ID

sub_kegg_abundance <- kegg_abundance[rownames(kegg_abundance) %in% daa_annotated_sub_method_results_df$feature,]
metadata <-
  read_delim(
    "/Users/apple/Microbiome/ggpicrust2总/ggpicrust2测试/ggpicrust2_test/LiaOb21/metadata2col.txt",
    delim = "\t",
    escape_double = FALSE,
    trim_ws = TRUE
  )
metadata$sample_name <- metadata$Sample_ID
ggpicrust2::pathway_heatmap(abundance = sub_kegg_abundance,metadata = metadata,group = "Compost")

ko_abundance <-
  read.delim(
    "/Users/apple/Microbiome/ggpicrust2总/ggpicrust2测试/ggpicrust2_test/LiaOb21/pred_metagenome_unstrat.tsv"
  )
#sometimes there are function, pathway or something else. Change function. when needed
rownames(ko_abundance) <- ko_abundance$function.
ko_abundance <- ko_abundance[, -1]

group <- "Compost"

ko_daa_results_df <-
  pathway_daa(
    abundance = ko_abundance,
    metadata = metadata,
    group = group,
    daa_method = "ALDEx2",
    select = NULL,
    reference = NULL
  )

daa_sub_method_results_df <-
  daa_results_df[daa_results_df$method == "ALDEx2_Kruskal-Wallace test", ]

daa_annotated_sub_method_results_df <-
  pathway_annotation(pathway = "KO",
                     daa_results_df = daa_sub_method_results_df,
                     ko_to_kegg = FALSE)

# select parameter format in pathway_error() is c("K00001", "K00002", "K00003", "K00004")

daa_results_list <-
  pathway_errorbar(
    abundance = ko_abundance,
    daa_results_df = daa_annotated_sub_method_results_df,
    Group = metadata$Compost,
    p_values_threshold = 0.05,
    order = "group",
    select = NULL,
    ko_to_kegg = FALSE,
    p_value_bar = FALSE,
    colors = NULL,
    x_lab = "description"
  )

It's worth mentioning that while these approaches can help you avoid the human metabolism-related output, it's important to verify if there are specific soil microbiota metagenomic databases available. These specialized databases may provide a more tailored analysis of the functional potential of soil microbial communities.

I hope this helps! Let me know if you have any further questions.

Best regards,

Chen

LiaOb21 commented 1 year ago

Hi Chen,

Sorry for the late reply! You have been so helpful, thank you very much! :-)

Have a nice day,

Lia