cafferychen777 / ggpicrust2

Make Picrust2 Output Analysis and Visualization Easier
https://cafferychen777.github.io/ggpicrust2/

undefined columns selected and ‘round’ not meaningful for factors #102

Open antonkratz opened 3 months ago

antonkratz commented 3 months ago

I can run ggpicrust2 with the provided example data (though I cannot plot it; I opened a separate issue for that, #101), but I am struggling to get it to work with my own data.

Here is my code:

library(readr)
library(ggpicrust2)
library(tibble)
library(tidyverse)
library(ggprism)
library(patchwork)

metadata <- read_delim("/home/kratz/my_meta.tsv",
    delim = "\t",
    escape_double = FALSE,
    trim_ws = TRUE)

abundance_data <- read_delim("/home/kratz/path_abun_unstrat.tsv",
    delim = "\t",
    col_names = TRUE,
    trim_ws = TRUE)

results_file_input <- ggpicrust2(data = abundance_data,
                                 metadata = metadata,
                                 group = "biological_sex",
                                 pathway = "MetaCyc",
                                 daa_method = "LinDA",
                                 ko_to_kegg = FALSE,
                                 order = "pathway_class",
                                 p_values_bar = TRUE,
                                 x_lab = "pathway_name")

However, this leads to "Error in `[.data.frame`(daa_results_df, , x_lab) : undefined columns selected". The full output is:

Starting the ggpicrust2 analysis...

Reading input file or using provided data...

Performing pathway differential abundance analysis...

Sample names extracted.
Identifying matching columns in metadata...
Matching columns identified: sample_name . This is important for ensuring data consistency.
Using all columns in abundance.
Converting abundance to a matrix...
Reordering metadata...
Converting metadata to a matrix and data frame...
Extracting group information...
Running LinDA analysis...
Performing LinDA analysis...
0  features are filtered!
The filtered data has  118  samples and  389  features will be tested!
Pseudo-count approach is used.
Fit linear models ...
Completed.
Processing LinDA results...
LinDA analysis is complete.
Annotating pathways...

Starting pathway annotation...
DAA results data frame is not null. Proceeding...
KO to KEGG is set to FALSE. Proceeding with standard workflow...
Loading MetaCyc reference data...
Returning DAA results data frame...
Creating pathway error bar plots...

Error in `[.data.frame`(daa_results_df, , x_lab) : 
  undefined columns selected
In addition: Warning message:
In MicrobiomeStat::linda(abundance, LinDA_metadata_df, formula = "~Group_group_nonsense_",  :
  Some features have less than 3 nonzero values! 
                                                They have virtually no statistical power. You may consider filtering them in the analysis!
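
For context, "undefined columns selected" is the message base R raises when a data frame is indexed by a column name it does not contain. The following is a minimal sketch of that failure and of a pre-check. It assumes a hypothetical results table: the `daa_results` object, its columns, and the pathway IDs are made up for illustration and are not the actual output of ggpicrust2. Whether the annotation step really omitted a `pathway_name` column here cannot be confirmed from the log alone.

```r
# Hypothetical stand-in for a DAA results table; the real columns may differ.
daa_results <- data.frame(
  feature  = c("PWY-0001", "PWY-0002"),  # made-up IDs
  p_adjust = c(0.01, 0.20),
  stringsAsFactors = FALSE
)

x_lab <- "pathway_name"  # the column requested for the plot labels

# Indexing by a missing column name reproduces the error from the log.
err <- tryCatch(daa_results[, x_lab], error = function(e) conditionMessage(e))

# Checking beforehand makes the mismatch explicit.
has_x_lab <- x_lab %in% colnames(daa_results)
if (!has_x_lab) {
  message("Column '", x_lab, "' not found; available: ",
          paste(colnames(daa_results), collapse = ", "))
}
```

Running the same `%in% colnames(...)` check on the annotated results before plotting would show whether the column named by `x_lab` is present.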

Therefore, I followed the step-by-step approach instead: I started a new R session, loaded the same libraries, and then ran:

kegg_abundance <- ko2kegg_abundance("/home/kratz/path_abun_unstrat.tsv")
metadata <- read_delim("/home/kratz/my_meta.tsv", delim = "\t", escape_double = FALSE, trim_ws = TRUE)
daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "biological_sex", daa_method = "ALDEx2", select = NULL, reference = NULL)

Which results in:

Sample names extracted.
Identifying matching columns in metadata...
Matching columns identified: sample_name . This is important for ensuring data consistency.
Using all columns in abundance.
Converting abundance to a matrix...
Reordering metadata...
Converting metadata to a matrix and data frame...
Extracting group information...
Running ALDEx2 with two groups. Performing t-test...
Error in Math.factor(c(2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L,  :
  ‘round’ not meaningful for factors
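
This message comes from base R whenever `round()` is applied to a factor. The sketch below reproduces it in isolation and adds a simple type check. All objects in it are hypothetical stand-ins, and it does not establish which argument ends up being rounded inside `pathway_daa` or ALDEx2.

```r
# A two-level factor standing in for a grouping column (values are made up).
group <- factor(c("male", "female", "male"))

# Math functions such as round() are not defined for factors.
err <- tryCatch(round(group), error = function(e) conditionMessage(e))

# Diagnostic: list any non-numeric columns in a hypothetical abundance table.
abundance <- data.frame(s1 = c(1.2, 3.4), s2 = c(0, 5.6))
non_numeric <- names(abundance)[!vapply(abundance, is.numeric, logical(1))]
```

An empty `non_numeric` means every column of the table is numeric. If that also holds for the real abundance data, the factor being rounded must come from somewhere else.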

I made extra sure that the entries in the first column of the metadata precisely match the column names of the abundance data frame.
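
One way to verify that match programmatically is to compare the two sets of names with `setdiff()`. This is a minimal sketch on made-up data. It assumes that the first abundance column holds feature IDs and that the metadata sample column is called `sample_name`, as the log output suggests.

```r
# Hypothetical inputs: the first abundance column holds feature IDs,
# the remaining column names are sample IDs.
abundance <- data.frame(pathway = c("P1", "P2"),
                        S1 = c(10, 0), S2 = c(3, 7), S3 = c(1, 4))
metadata <- data.frame(sample_name = c("S1", "S2", "S4"),
                       biological_sex = c("F", "M", "F"))

abundance_samples <- colnames(abundance)[-1]
metadata_samples  <- as.character(metadata$sample_name)

# Names present on one side but missing on the other.
only_in_abundance <- setdiff(abundance_samples, metadata_samples)
only_in_metadata  <- setdiff(metadata_samples, abundance_samples)
```

Both vectors being empty indicates that the sample names agree. Any entry points to a sample that exists in only one of the two tables.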

Please help.

I am using R version 4.3.1

cafferychen777 commented 3 months ago

Dear Anton,

Thank you for reaching out and providing detailed information about the issues you're encountering with ggpicrust2. It seems like there might be some inconsistencies or specific characteristics in your data that are causing these errors.

To better assist you, would it be possible for you to send your data (both the metadata and abundance data) to my email at cafferychen7850@gmail.com? This will allow me to take a closer look and potentially identify the root cause of the problems.

Please ensure that any sensitive information is removed or anonymized before sending the data. I appreciate your cooperation and look forward to helping you resolve these issues.

Best regards, Caffery Yang

cafferychen777 commented 3 months ago

Dear @antonkratz,

Thank you for your patience. After reviewing your data and the errors you encountered, I have identified a solution for the issues you reported with ggpicrust2.

Regarding the "‘round’ not meaningful for factors" error, it seems like this issue is related to the ALDEx2 method. As a temporary workaround, you could try using a different differential abundance analysis (DAA) method, such as "DESeq2" or "edgeR", to see if the issue persists. Alternatively, you could try the solution provided by another user, which involves installing an older version of ALDEx2 (v.1.28) from the Bioconductor archive.
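
The "try a different method" workaround can be sketched as a small fallback loop. The helper `try_daa_methods` and the stand-in `fake_run` below are hypothetical names introduced only for illustration and are not part of ggpicrust2. The stand-in simulates a failure for "ALDEx2" so the logic can be exercised without real data.

```r
# Try each candidate method in turn and return the first result that succeeds.
# `run_daa` is a hypothetical wrapper: a function taking one method name.
try_daa_methods <- function(run_daa, methods) {
  for (m in methods) {
    result <- tryCatch(run_daa(m), error = function(e) {
      message("Method ", m, " failed: ", conditionMessage(e))
      NULL
    })
    if (!is.null(result)) return(list(method = m, result = result))
  }
  NULL  # every method failed
}

# Stand-in that fails for ALDEx2 only, to exercise the fallback logic.
fake_run <- function(method) {
  if (method == "ALDEx2") stop("'round' not meaningful for factors")
  data.frame(feature = "P1", method = method)
}

out <- try_daa_methods(fake_run, c("ALDEx2", "DESeq2", "edgeR"))
```

With real data, `run_daa` could wrap the call from the original post, for example `function(m) pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "biological_sex", daa_method = m, select = NULL, reference = NULL)`.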

Please try these suggestions and let me know if they resolve the issues. If you continue to encounter problems, feel free to reach out again, and I'll be happy to assist further.

Best regards, Caffery Yang

jrhaulung commented 3 weeks ago

Thank you for your quick reply!

The error occurs with ALDEx2_1.34.0 but also with ALDEx2_1.28.0.