cafferychen777 / ggpicrust2

Make Picrust2 Output Analysis and Visualization Easier
https://cafferychen777.github.io/ggpicrust2/

add a parameter to select top significant pathways #13

Closed: lixiaopi1985 closed this issue 1 year ago

lixiaopi1985 commented 1 year ago

Hi Caffery,

Thank you for fixing the bug.

I have a request for pathway_errorbar(): would it be possible to add a parameter that selects the top significant pathways, or the top pathways by some other criterion? I ask because I got a different error when using the ggpicrust2 wrapper:

Error in pathway_errorbar(abundance = abundance, daa_results_df = daa_sub_method_results_df,  : 
  The feature with statistically significance are more than 30, the visualization will be terrible.
 Please use select to reduce the number.

I could run each function step by step, but that defeats the convenience of the wrapper and its usefulness for preliminary data exploration, don't you think?

Also, would it be possible to add parameters to the wrapper for choosing which of the existing visualization methods to run, i.e. turning each one on or off?

Best regards,

cafferychen777 commented 1 year ago

Dear Dr. Liu,

Thank you for bringing up this issue with the pathway_errorbar() function. I agree that it would be helpful to have a parameter to select top significant pathways or top pathways based on other criteria.

Regarding the error you encountered with the ggpicrust2 wrapper, I agree that running each function step by step is less convenient and undermines the wrapper's usefulness for preliminary data exploration.

As for adding parameters to choose or turn on/off existing visualization methods in the wrapper, I think it's a great idea and I will consider implementing it in the future.

However, I must inform you that due to my current workload, which includes multiple final exams, I may not have the time to make these changes immediately. I apologize for any inconvenience this may cause.

Thank you again for your valuable feedback, and I will keep you updated on any progress regarding these changes.

Best regards, Caffery

(Reply to Xiaoping Li's message of Friday, April 7, 2023, 00:17.)

ghost commented 1 year ago

Hello Caffery,

Thank you for ggpicrust2; that is a great tool!

I'm wondering if you have developed a parameter for the pathway_errorbar() to select the top significant pathways?

I have a question about selecting "ko" numbers. I ran the code below:

library(readxl)      # provides read_excel()
library(ggpicrust2)

metadata <- read_excel("picrustmetadata.xlsx")
kegg_abundance <- ko2kegg_abundance(file = "/Users/ilksen/Documents/PhD/Ilksen_Picrust_bacphyllo/KO_metagenome_out/pred_metagenome_unstrat.tsv")

group <- "genotype"
daa_results_df <-
  pathway_daa(
    abundance = kegg_abundance,
    metadata = metadata,
    group = group,
    p.adjust = "BH",
    daa_method = "ALDEx2",
    select = NULL,
    reference = NULL)

daa_sub_method_results_df <-
  daa_results_df[daa_results_df$method == "ALDEx2_Kruskal-Wallace test", ]

daa_annotated_sub_method_results_df <-
  pathway_annotation(pathway = "KO",
                     daa_results_df = daa_sub_method_results_df,
                     ko_to_kegg = TRUE)

Group <- metadata$genotype
pathway_errorbar(abundance = kegg_abundance,
                 daa_results_df = daa_annotated_sub_method_results_df,
                 Group = Group,
                 ko_to_kegg = TRUE,
                 p_values_threshold = 0.05,
                 order = "pathway_class",
                 select = NULL,
                 p_value_bar = TRUE,
                 colors = NULL,
                 x_lab = NULL)

Error in pathway_errorbar(abundance = kegg_abundance, daa_results_df = daa_annotated_sub_method_results_df,  : 
  The feature with statistically significance are more than 30, the visualization will be terrible.

 Please use select to reduce the number.
 Now you have "ko00563", "ko05412", "ko03450", "ko00311", "ko00310", "ko00600", "ko04142", "ko04260", "ko00909", "ko00510", "ko05110", "ko04974", "ko00565", "ko00905", "ko05222", "ko00514", "ko05416", "ko05140", "ko00591", "ko04370", "ko00380", "ko05120", "ko04666", "ko05322", "ko00627", "ko04380", "ko00941", "ko00943", "ko01057", "ko01056", "ko05016", "ko04145", "ko00071", "ko00072", "ko05210", "ko00531", "ko04916", "ko00533", "ko00360", "ko00633", "ko04115", "ko00362", "ko00603", "ko04270", "ko00281", "ko00280", "ko00601", "ko05146", "ko05145", "ko05144", "ko00051", "ko00643", "ko00120", "ko00965", "ko05414", "ko04614", "ko05010", "ko05012", "ko05131", "ko02060", "ko03320", "ko04744", "ko00522", "ko04622", "ko00460", "ko04970", "ko04972", "ko00232", "ko00660", "ko04512", "ko05410", "ko00331", "ko04080", "ko04514", "ko00473", "ko04510", "k
> View(daa_annotated_sub_method_results_df)

The error asks us to select "ko" numbers from this list. However, when I looked at daa_annotated_sub_method_results_df to choose the most significant ones, I couldn't find some of the "ko" numbers listed above. For example, "ko04512" and "ko05410" appear in the list above, but I couldn't find them in daa_annotated_sub_method_results_df.

What does the order of these numbers mean? Is "ko00563" the most significant one? When I look at its p-value, it is not at the top. Note: I didn't set a "reference", and I have five different plant groups in metadata$genotype, in case that matters.

Thank you in advance! Best regards, Ilksen

cafferychen777 commented 1 year ago

Hello Ilksen,

Thank you for reaching out and I'm glad to hear that you find ggpicrust2 to be a useful tool!

Regarding your question about pathway_errorbar(), unfortunately I have not yet developed a parameter for selecting the top significant pathways. However, there are workarounds. You can use the "select" parameter in pathway_errorbar() to specify the pathways you want to visualize. Alternatively, you can adapt the code below, which keeps the 20 pathways with the lowest adjusted p-values and then visualizes them with pathway_errorbar().

# Keep the 20 pathways with the smallest adjusted p-values
# (head() avoids NA rows if fewer than 20 pathways are available)
limma_daa_results_df_low_p <-
  head(limma_daa_results_df[order(limma_daa_results_df$p_adjust), ], 20)

limma_daa_results_df_low_p <- pathway_annotation(pathway = "KO",
                                                 daa_results_df = limma_daa_results_df_low_p,
                                                 ko_to_kegg = TRUE)

# plot anatomical location daa
# (limma_daa_results_df and matching_metadata$anat_space_combine come from another
#  analysis; substitute your own DAA results and group vector)
combine_col_ebar_plot <- pathway_errorbar(abundance = kegg_abundance,
                                          daa_results_df = limma_daa_results_df_low_p,
                                          Group = matching_metadata$anat_space_combine,
                                          ko_to_kegg = TRUE,
                                          p_values_threshold = 0.05,
                                          order = "pathway_class",
                                          select = NULL,
                                          p_value_bar = TRUE,
                                          colors = NULL,
                                          x_lab = "pathway_name")
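The workaround above can be wrapped in a small reusable helper until a built-in option exists. The sketch below is a hypothetical function, select_top_features(), which is not part of ggpicrust2; it assumes the DAA results data frame has a p_adjust column, as in the code above. Using head() rather than indexing with 1:n avoids NA rows when fewer than n features are available.

```r
# Hypothetical helper (not part of ggpicrust2): keep the n features with the
# smallest values in column `by` (adjusted p-values by default), optionally
# restricted to features below a significance threshold.
select_top_features <- function(daa_results_df, n = 20, by = "p_adjust",
                                threshold = NULL) {
  stopifnot(by %in% names(daa_results_df), n >= 1)
  df <- daa_results_df
  if (!is.null(threshold)) {
    # Drop features that are missing or not significant at the threshold
    df <- df[!is.na(df[[by]]) & df[[by]] < threshold, , drop = FALSE]
  }
  # Sort ascending; order() places NA values last by default
  df <- df[order(df[[by]]), , drop = FALSE]
  # head() returns at most n rows, so no NA rows are added when nrow(df) < n
  head(df, n)
}

# Toy data for illustration only (not results from this thread)
toy <- data.frame(
  feature  = c("ko00010", "ko00020", "ko00030", "ko00040"),
  p_adjust = c(0.030, 0.001, 0.200, 0.010),
  stringsAsFactors = FALSE
)
top2 <- select_top_features(toy, n = 2)
print(top2$feature)
```

In the workflow from this issue, something like select_top_features(daa_sub_method_results_df, n = 20, threshold = 0.05) could be passed to pathway_annotation() in place of the manually subset data frame.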

Regarding your question about the "ko" numbers, the order in which they are listed in the error message has no meaning. It is not a ranking by p-value, so "ko00563" being listed first does not make it the most significant.
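Because the listed order is not a ranking, the significant features can be ranked explicitly by adjusted p-value before choosing which ones to pass to "select". The sketch below uses a hypothetical helper, rank_significant_features(), and assumes feature and p_adjust columns in the results data frame and that "select" accepts a character vector of IDs like those printed in the error message.

```r
# Hypothetical helper: return the IDs of significant features,
# most significant (smallest adjusted p-value) first.
rank_significant_features <- function(daa_results_df, threshold = 0.05) {
  p <- daa_results_df$p_adjust
  keep <- !is.na(p) & p < threshold
  df <- daa_results_df[keep, , drop = FALSE]
  as.character(df$feature[order(df$p_adjust)])
}

# Toy data for illustration only (not results from this thread)
toy <- data.frame(
  feature  = c("ko00010", "ko00020", "ko00030", "ko00040"),
  p_adjust = c(0.040, 0.002, 0.300, 0.015),
  stringsAsFactors = FALSE
)
ranked <- rank_significant_features(toy)
print(ranked)
# The first 30 (or fewer) IDs could then be supplied as, e.g.,
# select = head(ranked, 30)
```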

Finally, you mentioned that some of the "ko" numbers in the error message are missing from daa_annotated_sub_method_results_df. That doesn't match what the code does: the listed numbers should come from the data frame you passed in, so please double-check your data frame.
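One way to double-check is to compare the IDs from the error message against the data frame's feature column with setdiff(). The sketch below uses a hypothetical helper, find_missing_features(); the listed IDs would be copied by hand from the error output, and the values shown are placeholders.

```r
# Hypothetical helper: IDs from the error message that are absent from the
# feature column of the data frame being inspected.
find_missing_features <- function(listed_ids, daa_results_df) {
  setdiff(listed_ids, as.character(daa_results_df$feature))
}

# Toy data for illustration only (not results from this thread)
toy <- data.frame(
  feature  = c("ko00010", "ko00020", "ko00030"),
  p_adjust = c(0.01, 0.02, 0.50),
  stringsAsFactors = FALSE
)
missing <- find_missing_features(c("ko00010", "ko00040"), toy)
print(missing)
# A non-empty result suggests the inspected data frame is not the one
# that was passed to pathway_errorbar().
```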

I hope this helps! Let me know if you have any further questions.

Best regards,

Caffery

(Reply to ilksentpc's message of Wednesday, April 12, 2023, 13:13.)