cafferychen777 / ggpicrust2

Make Picrust2 Output Analysis and Visualization Easier
https://cafferychen777.github.io/ggpicrust2/

finding KO's that contributed to KEGG pathway, KO_to_kegg=FALSE, and Implementing LEfSe #32

Status: Closed (closed by kriegerm 1 year ago)

kriegerm commented 1 year ago

I have four different questions - I hope it's not too much to post them here all at once! :)

  1. I successfully used ggpicrust2 to run a DESeq2 analysis on my data (code below):

results_DESeq2_BH <- ggpicrust2(file = ko_abundance, metadata = metadata, group = "case_control_status", pathway = "KO", daa_method = "DESeq2", reference = "Control", p.adjust = "BH", ko_to_kegg = TRUE, order = "pathway_class", p_values_bar = TRUE, x_lab = "pathway_name")

This produces output with plots; however, the following is also printed:

Performing pathway differential abundance analysis...
DESeq2 is only suitable for comparison between two groups.
converting counts to integer mode
it appears that the last variable in the design formula, 'Group_group_nonsense', has a factor level, 'Control', which is not the reference level. we recommend to use factor(...,levels=...) or relevel() to set this as the reference level before proceeding. for more information, please see the 'Note on factor levels' in vignette('DESeq2').

Control is, in fact, a level in "case_control_status", so I'm not quite sure what to do to fix that.

  2. I get 3 KEGG pathways that are significantly differentially abundant from the above code, which is great. I would like to know which KOs contributed to each KEGG pathway - is there a way to find this out? For example, the pathway "Epithelial cell invasion" is upregulated in my treatment. How do I find out which KOs specifically contributed to this "Epithelial cell invasion" pathway?

  3. When I change ko_to_kegg to FALSE (as shown below), I get this error:

results_DESeq2_BH <- ggpicrust2(file = ko_abundance, metadata = metadata, group = "case_control_status", pathway = "KO", daa_method = "DESeq2", reference = "Control", p.adjust = "BH", ko_to_kegg = FALSE, order = "pathway_class", p_values_bar = TRUE, x_lab = "pathway_name")

Performing pathway differential abundance analysis...
DESeq2 is only suitable for comparison between two groups.
converting counts to integer mode
it appears that the last variable in the design formula, 'Group_group_nonsense', has a factor level, 'Control', which is not the reference level. we recommend to use factor(...,levels=...) or relevel() to set this as the reference level before proceeding. for more information, please see the 'Note on factor levels' in vignette('DESeq2').
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 1699 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
Annotating pathways...
Creating pathway error bar plots...
Error in `[.data.frame`(daa_results_df, , x_lab) : undefined columns selected

I suspect I have the wrong label for the x-axis, but I don't know what the appropriate one is here.

  4. I would like to implement LEfSe to analyze this data as well, but when I pass it as the method, the code won't run. Is there something wrong with my input?

results_Lefse_BH <- ggpicrust2(file = ko_abundance, metadata = metadata, group = "case_control_status", pathway = "KO", daa_method = "Lefse", reference = "Control", p.adjust = "BH", ko_to_kegg = TRUE, order = "pathway_class", p_values_bar = TRUE, x_lab = "pathway_name")

Performing pathway differential abundance analysis...
Error in p.adjust(p_values_df$p_values, method = "BH") : object 'p_values_df' not found

Thank you so much!

cafferychen777 commented 1 year ago

Dear @kriegerm,

Thank you for reaching out with your questions. Here are my suggestions:

  1. For your first point, if there are no errors (and the execution hasn't been stopped), this won't impact your analysis and there's no need to correct it.

  2. For your second point, you raised a very good question. This is an aspect I may consider adding as a new feature in the next version of ggpicrust2.

  3. Regarding your third point, you should set x_lab to "description". If ko_to_kegg=FALSE, then the only options for x_lab are "description" and "feature".

  4. In relation to your fourth point, the ggpicrust2() function does not support LEfSe. Moreover, LEfSe is statistically a very poor method, and I am considering removing it from the ggpicrust2 package. Instead, I recommend using the LinDA method.
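On the second point, until such a feature exists, one possible workaround is to look up pathway membership by hand. The sketch below is base R only and is not part of ggpicrust2. The mapping table `ko_pathway_map`, the table `ko_table`, the helper `kos_in_pathway()`, and all IDs in it are hypothetical placeholders; a real KO-to-pathway mapping would have to come from KEGG itself.

```r
# Hypothetical KO-to-pathway membership table.
# Placeholder IDs only; these are NOT real KEGG assignments.
ko_pathway_map <- data.frame(
  ko      = c("K00001", "K00002", "K00003", "K00002"),
  pathway = c("pathway_A", "pathway_A", "pathway_B", "pathway_B"),
  stringsAsFactors = FALSE
)

# Toy KO abundance table: KOs in rows, samples in columns.
ko_table <- data.frame(
  sample_1 = c(10, 0, 5),
  sample_2 = c(3, 7, 0),
  row.names = c("K00001", "K00002", "K00003")
)

# Return the abundance rows of every KO that the mapping assigns to a pathway.
kos_in_pathway <- function(pathway_id, map, abundance) {
  kos <- unique(map$ko[map$pathway == pathway_id])
  present <- intersect(kos, rownames(abundance))
  abundance[present, , drop = FALSE]
}

contrib <- kos_in_pathway("pathway_B", ko_pathway_map, ko_table)

# Rank the member KOs by total abundance across samples.
contrib_sorted <- contrib[order(rowSums(contrib), decreasing = TRUE), , drop = FALSE]
print(contrib_sorted)
```

This only lists which KOs belong to a pathway and how abundant they are. It does not test which of them drive the difference between groups; a KO-level analysis with ko_to_kegg = FALSE could be cross-referenced against the list for that.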

I hope these suggestions help. If you have further questions, please do not hesitate to ask.
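To illustrate the third point, here is a minimal base R sketch of why the "undefined columns selected" error appears. The data frame is a toy stand-in for the annotated results table; its columns and values are assumptions for illustration, and the real table produced inside ggpicrust2() has more columns.

```r
# Toy stand-in for the annotated results when ko_to_kegg = FALSE.
# Column names follow the reply above ("feature", "description");
# the values are placeholders.
daa_results_df <- data.frame(
  feature     = c("K00001", "K00002"),
  description = c("placeholder description 1", "placeholder description 2"),
  stringsAsFactors = FALSE
)

# The label used in the failing call does not exist as a column.
x_lab <- "pathway_name"
has_column <- x_lab %in% colnames(daa_results_df)

# Selecting a missing column is what raises the error; capture its message.
err <- tryCatch(daa_results_df[, x_lab], error = function(e) conditionMessage(e))

# With a column that exists, the same selection works.
x_lab <- "description"
labels <- daa_results_df[, x_lab]
```

Checking `x_lab %in% colnames(...)` on the results of a standalone annotation step is a quick way to see which labels are available before plotting.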

Best regards, Chen YANG

kriegerm commented 1 year ago

@cafferychen777 Thank you so much for your quick and detailed reply!!

I've got one more question - my picrust2 gave me 3 output folders: EC_metagenome_out, KO_metagenome_out, and pathways_out. I'd like to analyze the output of all three of these, but so far have just used the KO_metagenome_out files. I'm having some trouble figuring out what settings to change in the ggpicrust2() command in order to use the EC and pathway file types.

Thank you!

cafferychen777 commented 1 year ago

Hello @kriegerm,

You would like to analyze all three PICRUSt2 output folders (EC_metagenome_out, KO_metagenome_out, and pathways_out), but are unsure which settings to change in the ggpicrust2() command for the EC and pathway file types.

To address this, I recommend referring to the tutorial available at https://github.com/cafferychen777/ggpicrust2. Specifically, you can find guidance on adjusting the settings for the EC and pathway file types in the relevant section. Please navigate to "https://github.com/cafferychen777/ggpicrust2#if-an-error-occurs-with-ggpicrust2-please-use-the-following-workflow" for more information.
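As a rough sketch of how the three folders might map onto ggpicrust2() arguments, the base R snippet below only builds argument lists and does not call the package. The KO row reuses the settings from the working call earlier in this thread; the EC and MetaCyc rows follow the non-KEGG example in the tutorial as best understood here and should be checked against it. The helper `build_ggpicrust2_args()`, the directory "picrust2_out", and the assumption that the PICRUSt2 tables have already been decompressed to .tsv are all illustrative.

```r
# One row per PICRUSt2 output folder, with the settings assumed for each.
picrust2_inputs <- data.frame(
  folder     = c("KO_metagenome_out", "EC_metagenome_out", "pathways_out"),
  file       = c("pred_metagenome_unstrat.tsv",
                 "pred_metagenome_unstrat.tsv",
                 "path_abun_unstrat.tsv"),
  pathway    = c("KO", "EC", "MetaCyc"),
  ko_to_kegg = c(TRUE, FALSE, FALSE),
  x_lab      = c("pathway_name", "description", "description"),
  order      = c("pathway_class", "group", "group"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: assemble a named argument list for one folder.
build_ggpicrust2_args <- function(out_dir, folder, metadata, group,
                                  daa_method = "LinDA") {
  row <- picrust2_inputs[picrust2_inputs$folder == folder, ]
  stopifnot(nrow(row) == 1)
  list(
    file         = file.path(out_dir, row$folder, row$file),
    metadata     = metadata,
    group        = group,
    pathway      = row$pathway,
    daa_method   = daa_method,
    ko_to_kegg   = row$ko_to_kegg,
    order        = row$order,
    p_values_bar = TRUE,
    x_lab        = row$x_lab
  )
}

# Toy metadata so the sketch runs on its own.
metadata <- data.frame(
  sample = c("s1", "s2"),
  case_control_status = c("Control", "Case")
)

ec_args <- build_ggpicrust2_args("picrust2_out", "EC_metagenome_out",
                                 metadata, "case_control_status")
str(ec_args)
```

With real data and the package installed, the list could then be passed on with something like do.call(ggpicrust2::ggpicrust2, ec_args).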

Furthermore, I encourage you to thoroughly read the entire tutorial, as it provides a comprehensive understanding of ggpicrust2 and its functionalities.

If you have any further questions or need additional assistance, please don't hesitate to ask. Good luck with your analysis!

Best regards, Chen Yang

kriegerm commented 1 year ago

Thank you for your help and suggestions! I will look into those resources.