pathway_daa(): Error in `metadata[, matching_columns]`: ! Can't subset columns with `matching_columns`. ✖ Subscript `matching_columns` can't contain missing values. ✖ It has a missing value at location 1. Run `rlang::last_trace()`

Hello, I am trying to follow the example code with my own data. I used the file path_abun_unstrat.tsv and a file titled Metadata.txt. When I try to run ggpicrust2 with the input file path, I get an error stating that the subscript `matching_columns` can't contain missing values and that it has a missing value at location 1. Here is what the code looks like

The error that is given is: Calculation may take a long time, please be patient. The kegg pathway with zero abundance in all the different samples has been removed. Performing pathway differential abundance analysis... Error in `metadata[, matching_columns]`: ! Can't subset columns with `matching_columns`. ✖ Subscript `matching_columns` can't contain missing values. ✖ It has a missing value at location 1. Run `rlang::last_trace()` to see where the error occurred.

I went through the step-by-step approach in the README, and this is the command where the error pops up: `daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "ENVIRONMENT", daa_method = "ALDEx2", select = NULL, reference = NULL)`

I found another post with the same error and tried to follow the advice there, but the issue is still not resolved. I am not sure I understand the solutions correctly. I transposed my metadata file so that the sample names are the columns, matching kegg_abundance. The first column in my metadata is sample_name, followed by the groups I am putting my data in. That column does not show up in the kegg_abundance columns, and I think that might be the issue? But then I lose my sample groupings. I am unsure how to make them match.

Hello,

It seems that there might be an issue with the data structure of your metadata. The metadata should be a tibble, and I noticed that you mentioned the sample name should be a column in the tibble.

To resolve the error you encountered, which states that the subscript matching_columns can't contain missing values, with a missing value at location 1, please make sure that your metadata is properly structured. Here are a few steps you can follow:

1. Confirm that your metadata is in tibble format. If it's not, you can convert it with the as_tibble() function. For example: `metadata <- as_tibble(metadata)`.
2. Check the structure of your metadata tibble and ensure that the sample name column exists. You can use the str() function to examine the structure, like this: `str(metadata)`. Make sure that the first column contains the sample names and that the following columns hold the corresponding groups.
3. Verify that the sample names in your metadata tibble match the column names in the kegg_abundance object. To do so, compare `metadata$sample_name` with `colnames(kegg_abundance)`.
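
The three checks above can be sketched on toy data as follows. This is only an illustration: the sample IDs, the `ENVIRONMENT` values and the toy `kegg_abundance` matrix are invented, and `sample_name` is assumed to be the name of the ID column, as in the metadata described in this thread.

```r
library(tibble)

# Toy stand-ins for the real objects (all values are invented).
metadata <- data.frame(
  sample_name = c("S1", "S2", "S3"),
  ENVIRONMENT = c("soil", "soil", "water"),
  stringsAsFactors = FALSE
)
kegg_abundance <- matrix(
  1:6, nrow = 2,
  dimnames = list(c("ko00010", "ko00020"), c("S1", "S2", "S3"))
)

# Step 1: make sure the metadata is a tibble.
metadata <- as_tibble(metadata)
is_tibble(metadata)

# Step 2: inspect the structure. Samples are rows and one column holds the IDs.
str(metadata)

# Step 3: every metadata sample should appear among the abundance columns.
all_present <- all(metadata$sample_name %in% colnames(kegg_abundance))
all_present
```

If `all_present` is `FALSE` on real data, at least one sample in the metadata has no matching abundance column, which is the kind of mismatch worth ruling out first.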

If the column names in your metadata and kegg_abundance do not match, you need to ensure that the sample names are correctly aligned. However, please note that transposing the metadata and making the sample name a column might not be the appropriate solution, as it may result in the loss of sample grouping information.

Additionally, I noticed that you are using an older version of ggpicrust2. To ensure you have the latest bug fixes and improvements, I recommend upgrading to the newest version available on GitHub using the following command: devtools::install_github('cafferychen777/ggpicrust2').

If you have followed these steps and are still encountering the issue, it would be helpful for further debugging if you could provide the dataset you are working with. By examining the actual data, I can better understand the structure and identify any potential issues that may be causing the error. Please provide the path_abun_unstrat.tsv and Metadata.txt files or any relevant sample data that can help in reproducing the problem.

Once I have access to the dataset, I will be able to assist you more effectively in resolving the issue. Please attach the files or provide a link to the dataset if possible.

Please let me know if you need any further assistance!

Hello @mayagabitzsch,

Thank you for your input. It's indeed possible that the issue stems from the lack of one-to-one correspondence and alignment between the colnames(kegg_abundance) and the sample names in the metadata. Please double-check this aspect to ensure proper matching.

I apologize for any confusion caused, but you're right that this issue is not directly related to the ggpicrust2 package itself. It's more likely a data mismatch or alignment problem between your abundance data and metadata.

Please review the column names in the kegg_abundance object and compare them with the sample names in the metadata. Ensure that they align correctly and correspond to each other accurately. It's crucial that the sample names in the metadata match the column names in the kegg_abundance object precisely.

If you find any discrepancies or misalignments, please make the necessary adjustments to ensure the proper alignment between the two. This should help resolve the error you encountered.
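
As a rough sketch of what "one-to-one correspondence and alignment" can mean in practice, the base R snippet below reorders metadata rows to follow the abundance columns and then checks the result. The data are invented, and `abundance_samples` is a hypothetical stand-in for `colnames(kegg_abundance)`.

```r
# Toy data; the sample IDs and group labels are invented.
metadata <- data.frame(
  sample_name = c("S3", "S1", "S2"),
  ENVIRONMENT = c("water", "soil", "soil"),
  stringsAsFactors = FALSE
)
abundance_samples <- c("S1", "S2", "S3")  # stands in for colnames(kegg_abundance)

# match() returns, for each abundance column, the metadata row with the same ID.
# An NA would mean that the sample has no metadata row.
idx <- match(abundance_samples, metadata$sample_name)
stopifnot(!anyNA(idx))

# Reorder the metadata rows so they follow the abundance columns.
metadata_aligned <- metadata[idx, ]
rownames(metadata_aligned) <- NULL

# TRUE only if the IDs correspond one-to-one and are in the same order.
identical(metadata_aligned$sample_name, abundance_samples)
```

The `anyNA()` check is the useful part when debugging: it fails early and points at a missing sample rather than at a later subsetting step.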

If you have any further questions or need additional assistance, please let me know.

@cafferychen777 Thanks so much for the prompt response,

Metadata.txt path_abun_unstrat copy.txt pred_metagenome_unstrat copy.txt

I tried the solutions you offered. Unfortunately, I cannot get the error to resolve. I have attached both files. I had to convert my abundance .tsv to a text file because otherwise GitHub would not let me attach it to a post. I am not sure if you can still work with that file. Please let me know what you find!

Hello @mayagabitzsch,

Thank you for the files you have shared. I have looked into them and found that the sample names in the abundance file do not correspond with those in the metadata.

Specifically, by running setdiff(metadata$sample_name, sample_names) in R, I can see that "RHA2R-16S" is present in the metadata but not in the abundance file.

Here is the output:

setdiff(metadata$sample_name, sample_names)
[1] "RHA2R-16S"

For your reference, here is the list of sample names in the metadata:

metadata$sample_name
 [1] "KGS1R-16S"  "KGS1S-16S"  "KGS2R-16S"  "KGS2S-16S"  "KGS3S-16S"  "OBS1R-16S"  ...
[...]
[57] "RHA2R-16S"  "RHA3R-16S"  "RHQ2R-16S"  "RHY3R-16S"  ...

And here is the list of sample names in the abundance file:

sample_names
[1] "KGS1R-16S"  "KGS1S-16S"  "KGS2R-16S"  "KGS2S-16S"  "KGS3S-16S"  "OBS1R-16S"  ...
[...]
[57] "RHA3R-16S"  "RHQ2R-16S"  "RHY3R-16S"  ...

This discrepancy might be causing the error you are experiencing. Please double-check the sample names in both files to ensure they match correctly.
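
Because `setdiff()` is not symmetric, it can help to run the comparison in both directions. The sketch below uses invented data in which "S4" plays the role of "RHA2R-16S" (present in the metadata only). Dropping the unmatched metadata row is shown as one possible fix, not necessarily the one used here; whether to drop the sample or recover its abundance data depends on the study.

```r
# Toy data; "S4" is in the metadata but has no abundance column.
metadata <- data.frame(
  sample_name = c("S1", "S2", "S3", "S4"),
  ENVIRONMENT = c("soil", "soil", "water", "water"),
  stringsAsFactors = FALSE
)
sample_names <- c("S1", "S2", "S3")  # stands in for the abundance column names

# Check both directions, since setdiff(a, b) differs from setdiff(b, a).
only_in_metadata  <- setdiff(metadata$sample_name, sample_names)
only_in_abundance <- setdiff(sample_names, metadata$sample_name)
only_in_metadata
only_in_abundance

# One possible fix: keep only the metadata rows that have abundance data.
metadata_matched <- metadata[metadata$sample_name %in% sample_names, ]
```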

Let me know if you need further assistance.

Best regards

Also, when you are working with microbiome data in ggpicrust2 and have multiple groups to analyze, it is recommended to use the ALDEx2 method for pathway differential abundance analysis.

Thank you very much, I was able to get that error resolved.

The line ran, but I also got this message: KO to KEGG conversion completed. Time elapsed: 2.81 seconds. Removing KEGG pathways with zero abundance across all samples... KEGG abundance calculation completed successfully. Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : invalid first argument Error in assign(cacheKey, frame, .rs.CachedDataEnv) : attempt to use zero-length variable name

When I ran the next line of code, `daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "ENVIRONMENT", daa_method = "ALDEx2", select = NULL, reference = NULL)`, I got this output: Sample names extracted. Identifying matching columns in metadata... Matching columns identified: sample_name . This is important for ensuring data consistency. Using all columns in abundance. Converting abundance to a matrix... Reordering metadata... Converting metadata to a matrix and data frame... Extracting group information... Running ALDEx2 with multiple groups. This might take some time, please wait patiently... operating in serial mode computing center with all features operating in serial mode ALDEx2 analysis with multiple groups complete.

Does that mean everything is fine? Or will that error I received in the KEGG conversion show up as an issue somewhere else?

Thank you for reaching out. This is an automated response to inform you that I am currently unavailable to address any issues regarding ggpicrust2, as I am in the midst of taking final exams for several important courses.

Please note that ggpicrust2 is a very complete and well-developed package with comprehensive tutorials. Most issues encountered are likely due to user errors in operation. I would strongly encourage you to consult the available tutorials, as they are designed to guide you through the correct usage and troubleshooting.

I apologize for any inconvenience this may cause. My exams will conclude on the 29th, and I will be able to address any remaining issues after the 30th.

Thank you for your understanding and patience.

Best regards,

Chen

On Thu, 22 Jun 2023 at 19:13, mayagabitzsch wrote:

Hello, I have been trying to make an error bar of my metacyc abundance, but even with the guide I cannot fix this issue:

```r
metacyc_abunance <- read.table("/Desktop/microbiome/path_abun_unstrat.tsv", sep = "\t", header = TRUE)
metadata <- read_delim("/Desktop/microbiome/meta2_OG.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE)

metacyc_daa_results_df <- pathway_daa(
  abundance = metacyc_abunance %>% column_to_rownames("pathway"),
  metadata = metadata,
  group = "SPECIES",
  daa_method = "ALDEx2"
)
metacyc_daa_annotated_results_df <- pathway_annotation(
  pathway = "MetaCyc",
  daa_results_df = metacyc_daa_results_df,
  ko_to_kegg = FALSE
)

p <- ggpicrust2::pathway_errorbar(
  abundance = metacyc_abunance,
  daa_results_df = metacyc_daa_annotated_results_df,
  Group = metadata$SPECIES,
  p_values_threshold = 0.05,
  order = "pathway_class",
  select = c(
    "P241-PWY", "PWY-5531", "PWY-6737", "PWY-7159", "GLYCOLYSIS-TCA-GLYOX-BYPASS",
    "ASPASN-PWY", "GLYCOLYSIS-E-D", "PWY-5484", "PWY-5505", "PWY490-3",
    "TCA-GLYOX-BYPASS", "ANAEROFRUCAT-PWY", "COBALSYN-PWY", "P124-PWY", "PWY-5913",
    "ARG+POLYAMINE-SYN", "GLYCOLYSIS", "PWY-5529", "SO4ASSIM-PWY", "P105-PWY"
  ),
  ko_to_kegg = FALSE,
  p_value_bar = FALSE,
  colors = NULL,
  x_lab = "description"
)
```

I tried both of the solutions listed, taking only the top 20 pathways with the smallest adjusted p-values, and it did not work. I then exported the table to find the top 20 myself and entered them manually in `select`, and I still receive this error.

There are more than one method in daa_results_df$method, please filter it. Error in ggpicrust2::pathway_errorbar(abundance = metacyc_abunance, daa_results_df = metacyc_daa_annotated_results_df, : The feature with statistically significance are more than 30, the visualization will be terrible. Please use select to reduce the number. Now you have "ANAEROFRUCAT-PWY", "ARG+POLYAMINE-SYN", "ASPASN-PWY", "COBALSYN-PWY", "GLYCOLYSIS", "GLYCOLYSIS-E-D", "GLYCOLYSIS-TCA-GLYOX-BYPASS", "P105-PWY", "P124-PWY", "P241-PWY", "PWY-5484", "PWY-5505", "PWY-5529", "PWY-5531", "PWY-5913", "PWY-6737", "PWY-7159", "PWY490-3", "SO4ASSIM-PWY", "TCA-GLYOX-BYPASS", "ANAEROFRUCAT-PWY", "ARG+POLYAMINE-SYN", "ASPASN-PWY", "COBALSYN-PWY", "GLYCOLYSIS", "GLYCOLYSIS-E-D", "GLYCOLYSIS-TCA-GLYOX-BYPASS", "P105-PWY", "P124-PWY", "P241-PWY", "PWY-5484", "PWY-5505", "PWY-5529", "PWY-5531", "PWY-5913", "PWY-6737", "PWY-7159", "PWY490-3", "SO4ASSIM-PWY", "TCA-GLYOX-BYPASS"
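
The first line of this message says that `daa_results_df$method` holds more than one method, and every pathway in the list appears twice (20 selected pathways, 40 entries), which is consistent with each pathway being counted once per method. A possible way forward, sketched on a toy data frame, is to keep a single method before plotting. The column names `feature`, `method` and `p_adjust` and the labels "method_A" and "method_B" are placeholders; the real labels should be read from `unique()` on the actual results.

```r
# Toy results table; column names and method labels are placeholders.
daa_results <- data.frame(
  feature  = rep(c("PWY-A", "PWY-B", "PWY-C"), times = 2),
  method   = rep(c("method_A", "method_B"), each = 3),
  p_adjust = c(0.01, 0.20, 0.03, 0.02, 0.04, 0.50),
  stringsAsFactors = FALSE
)

# See which methods are present in the results.
unique(daa_results$method)

# Keep a single method so that each feature appears only once.
one_method <- daa_results[daa_results$method == "method_A", ]

# Features below the threshold for that method, smallest adjusted p first.
significant <- one_method[one_method$p_adjust < 0.05, ]
significant <- significant[order(significant$p_adjust), ]
significant$feature
```

The filtered table, rather than the full one, would then be the input for annotation and plotting, so that the feature count is no longer doubled.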

I have also attached the files that I am using. Thank you again.
meta2_OG.txt: https://github.com/cafferychen777/ggpicrust2/files/11833248/meta2_OG.txt
path_abun_unstrat copy.txt: https://github.com/cafferychen777/ggpicrust2/files/11833259/path_abun_unstrat.copy.txt


cafferychen777/ggpicrust2, issue #35