cafferychen777 / ggpicrust2

Make Picrust2 Output Analysis and Visualization Easier
https://cafferychen777.github.io/ggpicrust2/
MIT License
108 stars 12 forks

pathway_errorbar error #117

Open DeniRibicic opened 2 months ago

DeniRibicic commented 2 months ago

Hi,

First of all, thanks for creating this tool. It seems very neat for crunching down picrust2 output.

I am trying to run the pathway_errorbar function but am getting the following error:

Error in $<-.data.frame(*tmp*, "group", value = c(2L, 2L, 2L, 2L,  : 
  replacement has 205 rows, data has 185

4. stop(sprintf(ngettext(N, "replacement has %d row, data has %d",
"replacement has %d rows, data has %d"), N, nrows), domain = NA)

3. $<-.data.frame(*tmp*, "group", value = structure(c(2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, ...

2. $<-(*tmp*, "group", value = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, ...

1. pathway_errorbar(abundance = kegg_abundance, daa_results_df = daa_annotated_results_df,
Group = metadata$exposure, p_values_threshold = 0.05, order = "pathway_class",
select = daa_annotated_results_df %>% arrange(p_adjust) %>%
slice(1:20) %>% select("feature") %>% pull(), ko_to_kegg = TRUE, ...

I used the following commands:

ko_abundance <- read_delim("/path/to/ko_feature-table.tsv", delim = "\t")

metadata <- read_delim(
    "/path/to/Mapping_file-H2S.txt",
    delim = "\t",
    escape_double = FALSE,
    trim_ws = TRUE
)

kegg_abundance <- ko2kegg_abundance(data = ko_abundance)

daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "exposure", daa_method = "LinDA")

# Split into smaller chunks if the data frame is too big; otherwise no annotation will be performed
sub1 <- daa_results_df[1:86, ]
sub2 <- daa_results_df[87:172, ]
sub3 <- daa_results_df[173:258, ]

daa_ann1 <- pathway_annotation(pathway = "KO", daa_results_df = sub1, ko_to_kegg = TRUE)
daa_ann2 <- pathway_annotation(pathway = "KO", daa_results_df = sub2, ko_to_kegg = TRUE)
daa_ann3 <- pathway_annotation(pathway = "KO", daa_results_df = sub3, ko_to_kegg = TRUE)

daa_annotated_results_df <- rbind(daa_ann1, daa_ann2, daa_ann3)

p <- pathway_errorbar(
  abundance = kegg_abundance,
  daa_results_df = daa_annotated_results_df,
  Group = metadata$exposure,
  p_values_threshold = 0.05,
  order = "pathway_class",
  select = daa_annotated_results_df %>%
    arrange(p_adjust) %>%
    slice(1:20) %>%
    select("feature") %>%
    pull(),
  ko_to_kegg = TRUE,
  p_value_bar = TRUE,
  colors = NULL,
  x_lab = "pathway_name"
)

All the outputs are generated as they are supposed to be; only the last step throws the error. When I run the exact same commands on your data (except the subsetting), everything works fine.

Any idea what could be wrong here? I can share my data if needed.

cafferychen777 commented 1 month ago

Dear DeniRibicic,

Thank you for reaching out and for your interest in the ggpicrust2 tool. I appreciate your feedback and am glad to hear that you find it useful for analyzing PICRUSt2 output.

I've reviewed the error message you're encountering with the pathway_errorbar function. The error suggests a mismatch between the number of rows in your data and the replacement values, which is unexpected given the steps you've taken. Theoretically, the process you've described should work correctly.

To better understand and resolve this issue, it would be helpful if you could send your data to cafferychen777@tamu.edu. This will allow me to examine the specific structure and content of your datasets, which might reveal the source of the problem.

When sending the data, please include:

  1. Your ko_abundance file
  2. Your metadata file
  3. The resulting kegg_abundance object
  4. The daa_results_df and daa_annotated_results_df objects

With these, I should be able to reproduce the issue and provide a more targeted solution.

In the meantime, could you also confirm:

Thank you for your patience as we work to resolve this. I look forward to receiving your data and helping you get the pathway_errorbar function working correctly.

Best regards, Chen Yang

DeniRibicic commented 1 month ago

Hi @cafferychen777, and thank you for the prompt reply.

Since you mentioned dimensions, I checked them, and that was the problem!

Basically, my metadata file contains some additional samples that were not part of the PICRUSt2 analysis. Listing extra samples in the metadata/mapping file usually doesn't cause errors in the other microbiome packages I use (it is rather the opposite, a missing sample, that does). With ggpicrust2, however, it does.

Quick subsetting solves the issue:

samples <- colnames(ko_abundance)[-1]
metadata_filtered <- metadata %>% filter(sampleid %in% samples)

cafferychen777 commented 1 month ago

Dear @DeniRibicic,

Thank you for the update on your issue with the pathway_errorbar function. I'm glad to hear that you were able to identify and resolve the problem.

It's great that you discovered the mismatch between your metadata file and the PICRUSt2 analysis samples. You're right that having additional samples in the metadata file can sometimes cause issues, even if it doesn't with other microbiome packages. Your solution of subsetting the metadata to match the samples in the abundance data is perfect:

samples <- colnames(ko_abundance)[-1]
metadata_filtered <- metadata %>% filter(`#SampleID` %in% samples)

This approach ensures that your metadata and abundance data are properly aligned, which is crucial for the correct functioning of ggpicrust2.
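
A check like the one below can surface this kind of mismatch before plotting. It is only a sketch on toy data in base R: the column name `sample_id` and the helper `check_sample_alignment()` are hypothetical and not part of ggpicrust2, and it assumes the first column of the abundance table holds feature IDs.

```r
# Toy abundance table: first column holds feature IDs, the rest are samples.
abundance <- data.frame(
  feature = c("K00001", "K00002"),
  S1 = c(10, 5),
  S2 = c(3, 8),
  S3 = c(7, 1),
  stringsAsFactors = FALSE
)

# Toy metadata with one extra sample (S4) that has no abundance column.
metadata <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  exposure  = c("low", "high", "low", "high"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: report sample IDs that appear on one side only.
check_sample_alignment <- function(abundance, metadata, id_col) {
  abundance_samples <- colnames(abundance)[-1]
  list(
    only_in_metadata  = setdiff(metadata[[id_col]], abundance_samples),
    only_in_abundance = setdiff(abundance_samples, metadata[[id_col]])
  )
}

alignment <- check_sample_alignment(abundance, metadata, "sample_id")
print(alignment)

# Keep only the metadata rows that have a matching abundance column.
metadata_filtered <- metadata[metadata$sample_id %in% colnames(abundance)[-1], ]
```

If `only_in_metadata` is non-empty, the group vector taken from the unfiltered metadata is longer than the number of abundance samples, which matches the "replacement has 205 rows, data has 185" pattern reported above.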

Thank you for sharing this solution. It will be helpful for other users who might encounter similar issues in the future. If you have any more questions or run into any other problems, please don't hesitate to ask.

Best regards, Chen Yang

DeniRibicic commented 1 month ago

Hi again,

I am running the same function on the MetaCyc dataset but getting the following error:

Error in pathway_errorbar(abundance = metacyc_abundance %>% column_to_rownames("#OTUID"),  : 
  Visualization with 'pathway_errorbar' cannot be performed because there are no features with statistical significance. For possible solutions, please check the FAQ section of the tutorial.

  2. stop("Visualization with 'pathway_errorbar' cannot be performed because there are no features with statistical significance. ",
"For possible solutions, please check the FAQ section of the tutorial.")

  1. pathway_errorbar(abundance = metacyc_abundance %>% column_to_rownames("#OTUID"),
daa_results_df = metacyc_daa_annotated_results_df, Group = metadata_filtered$exposure,
ko_to_kegg = FALSE, p_values_threshold = 0.05, order = "group",
select = daa_annotated_results_df %>% arrange(p_adjust) %>% ...

Full command to create inputs:

metacyc_daa_results_df <- pathway_daa(abundance = metacyc_abundance %>% column_to_rownames("#OTUID"), metadata = metadata_filtered, group = "exposure", daa_method = "LinDA")
metacyc_daa_annotated_results_df <- pathway_annotation(pathway = "MetaCyc", daa_results_df = metacyc_daa_results_df, ko_to_kegg = FALSE)

p <- pathway_errorbar(abundance = metacyc_abundance %>% column_to_rownames("#OTUID"),
           daa_results_df = metacyc_daa_annotated_results_df,
           Group = metadata_filtered$exposure,
           ko_to_kegg = FALSE,
           p_values_threshold = 0.05,
           order = "group",
           select = daa_annotated_results_df %>% 
             arrange(p_adjust) %>% 
             dplyr::slice(1:30) %>% 
             select("feature") %>% 
             pull(),
           p_value_bar = TRUE,
           colors = NULL,
           x_lab = "description")

No errors when creating these two input dfs.

p_adjust is numeric and there are a bunch of significant features.

Any idea off the top of your head why it is complaining that there are no significant features?

cafferychen777 commented 1 month ago

Dear DeniRibicic,

Thank you for your detailed follow-up on the issue. Theoretically, this problem should not occur given the information you've provided. To help diagnose the issue, it would be very helpful if you could send me the relevant R data objects (such as metacyc_abundance, metacyc_daa_annotated_results_df, and metadata_filtered) as an RData file. This will allow me to reproduce the problem and investigate further.

In the meantime, you might want to try a few troubleshooting steps:

  1. Restart your R session. Sometimes, unexplained errors can be resolved by simply restarting the R environment.

  2. Make sure all your packages, including ggpicrust2, are up to date.

  3. Double-check that the p_adjust column in your metacyc_daa_annotated_results_df contains numeric values and that there are indeed statistically significant features (p_adjust < 0.05).

  4. Verify that the dimensions and sample names in your abundance data, metadata, and results dataframe all match correctly.
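
Step 3 can be checked with a few lines of base R. This is a sketch on a toy table: `daa_results` stands in for a real results data frame, and only the `feature` and `p_adjust` columns from this thread are assumed.

```r
# Toy stand-in for a DAA results table; real tables have more columns.
daa_results <- data.frame(
  feature  = c("PWY-1", "PWY-2", "PWY-3", "PWY-4"),
  p_adjust = c(0.001, 0.20, 0.03, NA),
  stringsAsFactors = FALSE
)

# p_adjust should be numeric; a character column would not compare as intended.
is_numeric <- is.numeric(daa_results$p_adjust)

# Count features below the threshold, ignoring missing values.
threshold <- 0.05
n_significant <- sum(daa_results$p_adjust < threshold, na.rm = TRUE)

# List the features that pass the threshold.
significant_features <- daa_results$feature[
  !is.na(daa_results$p_adjust) & daa_results$p_adjust < threshold
]

print(is_numeric)
print(n_significant)
print(significant_features)
```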

If the issue persists after trying these steps, please send me the RData file, and I'll be happy to take a closer look.

Best regards, Chen Yang

cafferychen777 commented 1 month ago

Hi DeniRibicic,

Thank you for sharing the details of the error. After reviewing the code, I found the issue.

In the command:

select = daa_annotated_results_df %>% 
             arrange(p_adjust) %>% 
             slice(1:30) %>% 
             select("feature") %>% 
             pull()

You are using daa_annotated_results_df, but since you're working with the MetaCyc dataset, you need to update this to metacyc_daa_annotated_results_df. Once you make this change, the function should run successfully.
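
The effect of building `select` from the wrong table can be illustrated on toy data. This is a sketch in base R, not the package's internal logic: the two data frames and the helper `top_features()` are hypothetical, and it assumes that features absent from the results passed to the plot cannot be matched.

```r
# Toy results for two different analyses; feature IDs do not overlap.
kegg_results <- data.frame(
  feature  = c("ko00010", "ko00020"),
  p_adjust = c(0.02, 0.03),
  stringsAsFactors = FALSE
)
metacyc_results <- data.frame(
  feature  = c("PWY-101", "PWY-102", "PWY-103"),
  p_adjust = c(0.01, 0.04, 0.50),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n features with the smallest adjusted p-values.
top_features <- function(results, n) {
  ordered <- results[order(results$p_adjust), ]
  head(ordered$feature, n)
}

# Selecting from the KEGG table while plotting MetaCyc results matches nothing.
wrong_select <- top_features(kegg_results, 2)
n_wrong_matched <- sum(wrong_select %in% metacyc_results$feature)

# Selecting from the same table that is being plotted matches every feature.
right_select <- top_features(metacyc_results, 2)
n_right_matched <- sum(right_select %in% metacyc_results$feature)

print(n_wrong_matched)
print(n_right_matched)
```

Checking `sum(select %in% results$feature)` before plotting is a cheap way to confirm that the selection and the results table come from the same analysis.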

Let me know if you encounter any further issues!

Best regards,
Caffery Yang

DeniRibicic commented 1 month ago

@cafferychen777 such a rookie mistake! Thanks, this works!