pathway_daa(): Error in metadata[, matching_columns]: ! Can't subset columns with matching_columns. ✖ Subscript matching_columns can't contain missing values. ✖ It has a missing value at location 1.

Hello. I receive an error when loading my metadata for the analysis. My metadata is perfectly okay.

Here is my metadata
sample_name TB_history P01-A01-HLM069 0 P01-A02-HLM002 0 P01-A03-HLM038 0 P01-A04-HLM091 0 P01-A05-HLM061 0 P01-A06-HLM094 0 P01-A07-HLM102 0 P01-A08-HLM056 0 P01-A09-HLM053 0 P01-A10-HLM019 0 P01-A11-HLM137 0 P01-A12-HLM131 0 P01-B01-HLM062 0 P01-B02-HLM064 0 P01-B03-HLM037 0 P01-B04-HLM089 0 P01-B05-HLM107 0 P01-B06-HLM097 0 P01-B07-HLM098 0 P01-B08-HLM106 0 P01-B09-HLM054 0 P01-B10-HLM020 0 P01-B11-HLM136 0 P01-B12-HLM122 0 P01-C01-HLM080 0 P01-C02-HLM044 0 P01-C03-HLM045 0 P01-C04-HLM074 0 P01-C05-HLM114 0 P01-C06-HLM084 0 P01-C07-HLM104 0 P01-C08-HLM096 0 P01-C09-HLM035 0 P01-C10-HLM021 0 P01-C11-HLM117 0 P01-C12-HLM128 0 P01-D01-HLM079 0 P01-D02-HLM043 0 P01-D03-HLM025 0 P01-D04-HLM073 0 P01-D05-HLM090 0 P01-D06-HLM093 0 P01-D07-HLM105 0 P01-D08-HLM086 0 P01-D09-HLM055 0 P01-D10-HLM022 0 P01-D11-HLM101 0 P01-D12-HLM135 0 P01-E01-HLM065 0 P01-E02-HLM042 0 P01-E03-HLM049 0 P01-E05-HLM095 0 P01-E06-HLM100 0 P01-E07-HLM103 0 P01-E08-HLM112 0 P01-E09-HLM057 0 P01-E10-HLM023 0 P01-E11-HLM120 0 P01-E12-HLM138 0 P01-F01-HLM066 0 P01-F02-HLM041 0 P01-F03-HLM048 0 P01-F05-HLM092 0 P01-F06-HLM081 0 P01-F07-HLM087 0 P01-F08-HLM050 0 P01-F09-HLM058 0 P01-F10-HLM024 0 P01-F11-HLM116 0 P01-F12-HLM140 0 P01-G01-HLM068 0 P01-G03-HLM067 0 P01-G04-HLM109 0 P01-G05-HLM083 0 P01-G06-HLM075 0 P01-G07-HLM072 0 P01-G08-HLM051 0 P01-G09-HLM059 0 P01-G10-HLM018 0 P01-G11-HLM123 0 P01-G12-HLM178 0 P01-H01-HLM063 0 P01-H02-HLM039 0 P01-H03-HLM070 0 P01-H04-HLM110 0 P01-H05-HLM082 0 P01-H06-HLM078 0 P01-H07-HLM099 0 P01-H08-HLM052 0 P01-H09-HLM029 0 P01-H10-HLM017 0 P01-H11-HLM124 0 P01-H12-HLM148 0 P02-A01-HLM146 0 P02-A02-HLM016 0 P02-A03-HLM171 0 P02-A04-HLM004 0 P02-A05-HLM167 0 P02-A06-HLM172 0 P02-A07-HLM154 0 P02-A08-HLM085 0 P02-A09-HLM134 0 P02-A10-HLM405 1 P02-A11-HLM413 1 P02-A12-HLM421 1 P02-B01-HLM145 0 P02-B02-HLM177 0 P02-B03-HLM032 0 P02-B04-HLM001 0 P02-B05-HLM165 0 P02-B06-HLM013 0 P02-B07-HLM155 0 P02-B08-HLM003 0 P02-B09-HLM130 0 P02-B10-HLM406 1 P02-B11-HLM414 1 
P02-B12-HLM422 1 P02-C01-HLM143 0 P02-C02-HLM182 0 P02-C03-HLM033 0 P02-C04-HLM006 0 P02-C05-HLM129 0 P02-C06-HLM010 0 P02-C07-HLM157 0 P02-C08-HLM111 0 P02-C09-HLM127 0 P02-C10-HLM407 1 P02-C11-HLM415 1 P02-C12-HLM423 1 P02-D01-HLM115 0 P02-D02-HLM186 0 P02-D03-HLM030 0 P02-D04-HLM173 0 P02-D05-HLM166 0 P02-D06-HLM189 0 P02-D07-HLM158 0 P02-D08-HLM005 0 P02-D09-HLM400 1 P02-D10-HLM408 1 P02-D11-HLM416 1 P02-D12-HLM424 1 P02-E01-HLM144 0 P02-E02-HLM184 0 P02-E03-HLM014 0 P02-E04-HLM174 0 P02-E05-HLM149 0 P02-E06-HLM190 0 P02-E07-HLM159 0 P02-E08-HLM132 0 P02-E09-HLM401 1 P02-E10-HLM409 1 P02-E11-HLM417 1 P02-E12-HLM425 1 P02-F01-HLM179 0 P02-F02-HLM187 0 P02-F03-HLM027 0 P02-F04-HLM175 0 P02-F05-HLM163 0 P02-F06-HLM126 0 P02-F07-HLM161 0 P02-F08-HLM007 0 P02-F09-HLM402 1 P02-F10-HLM410 1 P02-F11-HLM418 1 P02-F12-HLM426 1 P02-G01-HLM142 0 P02-G02-HLM188 0 P02-G03-HLM036 0 P02-G04-HLM147 0 P02-G05-HLM164 0 P02-G06-HLM141 0 P02-G07-HLM118 0 P02-G08-HLM008 0 P02-G09-HLM403 1 P02-G10-HLM411 1 P02-G11-HLM419 1 P02-G12-HLM427 1 P02-H01-HLM180 0 P02-H02-HLM176 0 P02-H03-HLM031 0 P02-H04-HLM162 0 P02-H05-HLM009 0 P02-H06-HLM152 0 P02-H07-HLM119 0 P02-H08-HLM011 0 P02-H09-HLM404 1 P02-H10-HLM412 1 P02-H11-HLM420 1 P02-H12-HLM428 1 P03-A01-HLM429 1 P03-B01-HLM430 1 P03-C01-HLM431 1 P03-D01-NEC 0

Here is the error I get:

Error in metadata[, matching_columns]:
! Can't subset columns with matching_columns.
✖ Subscript matching_columns can't contain missing values.
✖ It has a missing value at location 1.
Backtrace:

ggpicrust2::pathway_daa(...)
1. vctrs (local) <fn>()
2. vctrs:::stop_subscript_missing(i = i, call = call)

Hello @GeoffreyOlweny ,

Thank you for reaching out with details about the issue you've encountered when loading your metadata for analysis with ggpicrust2.

Based on the error message you've provided, the issue seems to stem from a mismatch between the sample names in your metadata and those in your abundance data. The error indicates that some sample names present in your metadata aren't found in the abundance data, which can lead to this kind of problem.
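A quick way to check for such a mismatch is to compare the two sets of sample names directly. The snippet below is a minimal sketch on made-up data: the object names `abundance` and `metadata`, the toy sample IDs, and the assumption that samples are the columns of the abundance table and sit in a `sample_name` metadata column are all illustrative, so adjust them to your own files.

```r
# Toy data standing in for the real inputs (all values are made up).
abundance <- data.frame(
  S1 = c(10, 0), S2 = c(3, 7), S3 = c(5, 5),
  row.names = c("ko00010", "ko00020")
)
metadata <- data.frame(
  sample_name = c("S1", "S2", "S4"),
  TB_history = c(0, 0, 1)
)

abundance_samples <- colnames(abundance)
metadata_samples <- as.character(metadata$sample_name)

# Samples listed in the metadata but absent from the abundance table
missing_in_abundance <- setdiff(metadata_samples, abundance_samples)
# Samples in the abundance table that have no metadata row
missing_in_metadata <- setdiff(abundance_samples, metadata_samples)

print(missing_in_abundance)
print(missing_in_metadata)
```

If both vectors come back empty, the two name sets agree and the cause probably lies elsewhere.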

A similar issue was reported by another user some time ago, and I provided a detailed response on how to address it there. I recommend you refer to the solution in this GitHub issue: https://github.com/cafferychen777/ggpicrust2/issues/35. It should help you resolve the problem you're currently facing.

If you still encounter issues after trying the suggested solution, please let me know, and I'd be more than happy to assist you further.

Best regards,

Chen YANG

Thank you very much. Just one more question: the package doesn't seem to allow any manipulation of the kegg_abundance table. Any manipulation of the KO abundance table gives an error, even when the sample names still match those in the metadata.


Hi @GeoffreyOlweny ,

Thank you for your question. If I understand correctly, any modification of the kegg_abundance table (the KO, or KEGG Orthology, abundance table) results in an error, even when the sample names still match those in the metadata.

To provide a clearer understanding and potentially address this issue, it would be helpful if you could provide some example code or specific steps that you've taken that resulted in the error. This way, I can offer more targeted assistance and suggestions for resolving the problem.
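In the meantime, one possible cause (an assumption, not something confirmed in this thread) is a manipulation that changes the table's columns, for example by moving the pathway IDs from row names into a regular column. The sketch below uses made-up data and a hypothetical helper, `columns_are_samples`, to check whether a manipulated table still has exactly one column per sample.

```r
# Toy abundance table: features as row names, one column per sample.
abundance <- data.frame(
  S1 = c(10, 0, 4), S2 = c(3, 7, 1), S3 = c(5, 5, 0),
  row.names = c("ko00010", "ko00020", "ko00030")
)
sample_names <- c("S1", "S2", "S3")

# Hypothetical helper: TRUE only if the columns are exactly the samples.
columns_are_samples <- function(tbl, samples) {
  setequal(colnames(tbl), samples)
}

# Row filtering with base R leaves the columns untouched.
filtered <- abundance[rowSums(abundance) > 5, , drop = FALSE]

# Moving the IDs into a column adds a non-sample column.
with_id_column <- cbind(pathway = rownames(abundance), abundance)

print(columns_are_samples(filtered, sample_names))
print(columns_are_samples(with_id_column, sample_names))
```

Running a check like this right after each manipulation step can narrow down which operation changes the layout.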

Please feel free to share any additional details you have, and I'll do my best to assist you further.

Best regards, Chen YANG

Hei,

I have experienced the same problem as described and cannot get it resolved with any of the tips that have been given here or in the related issues. I get the same error for the pathway_daa():

Sample names extracted.
Identifying matching columns in metadata...
Matching columns identified: NA . This is important for ensuring data consistency.
Using all columns in abundance.
Converting abundance to a matrix...
Reordering metadata...
Error in metadata[, matching_columns]:
! Can't subset columns with matching_columns.
✖ Subscript matching_columns can't contain missing values.
✖ It has a missing value at location 1.
Run rlang::last_trace() to see where the error occurred.

And using rlang::last_trace():

<error/vctrs_error_subscript_type>
Error in `metadata[, matching_columns]`:
! Can't subset columns with `matching_columns`.
✖ Subscript `matching_columns` can't contain missing values.
✖ It has a missing value at location 1.

Backtrace: ▆

├─ggpicrust2::pathway_daa(...)
│ ├─base::as.matrix(metadata[, matching_columns])
│ ├─metadata[, matching_columns]
│ └─tibble:::`[.tbl_df`(metadata, , matching_columns)
│ └─tibble:::vectbl_as_col_location(...)
│ ├─tibble:::subclass_col_index_errors(...)
│ │ └─base::withCallingHandlers(...)
│ └─vctrs::vec_as_location(j, n, names, missing = "error", call = call)
└─vctrs (local) <fn>()
1. └─vctrs:::stop_subscript_missing(i = i, call = call)

Run rlang::last_trace(drop = FALSE) to see 1 hidden frame.

I have made sure that the metadata is a tibble, the sample names are identical in the ko-abundance file and the metadata file and there are no extra or missing sample names in either. I have attached a subset of the data if you want to take a look. I appreciate any help you could give me with this.

BR, Katharina

Attachments: metadata.txt, pred_metagenome_unstrat.txt
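For readers who see the same "Matching columns identified: NA" message, the log suggests that no metadata column was recognised as holding the sample names. Names that look identical can still differ invisibly, for instance through trailing whitespace. The sketch below uses made-up data and a hypothetical helper, `find_sample_columns`; it is not the package's own matching code, only an illustration of the kind of check involved.

```r
abundance_samples <- c("P01-A01", "P01-A02", "P01-A03")

# Made-up metadata; the second ID carries a trailing space.
metadata <- data.frame(
  sample_name = c("P01-A01", "P01-A02 ", "P01-A03"),
  Peatland = c("A", "A", "B")
)

# Hypothetical helper: names of metadata columns containing every sample ID
find_sample_columns <- function(metadata, samples) {
  hits <- vapply(
    metadata,
    function(col) all(samples %in% as.character(col)),
    logical(1)
  )
  names(metadata)[hits]
}

before <- find_sample_columns(metadata, abundance_samples)

# Strip leading and trailing whitespace, then check again.
metadata$sample_name <- trimws(metadata$sample_name)
after <- find_sample_columns(metadata, abundance_samples)

print(before)
print(after)
```

Passing `trim_ws = TRUE` to `read_delim` when loading the metadata serves the same purpose as the `trimws` call here.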

Dear @katharinakujala,

Thank you for reaching out with the issue you've encountered. I have gone through the process with the provided data and did not face any problems. Below is a sequence of code that I have used successfully. Please ensure you replace the file paths with the actual locations of your files on your system. Also, note that if no statistically significant differences are found between groups in the pathway_daa results, you won't be able to visualize them using pathway_errorbar. However, you can still utilize pathway_pca and pathway_heatmap for visualization purposes. Here is the code for you to try:

library(readr)
library(ggpicrust2)
library(tibble)
library(tidyverse)
library(ggprism)
library(patchwork)
library(ggh4x)

# Load metadata as a tibble
metadata <- read_delim("path/to/your/metadata.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE)

# Load KEGG pathway abundance
kegg_abundance <- ko2kegg_abundance("path/to/your/pred_metagenome_unstrat.txt")

# Perform pathway differential abundance analysis (DAA) using the LinDA method
# Replace "Peatland" with your actual group column name if not using the example dataset
daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "Peatland", daa_method = "LinDA", select = NULL, reference = NULL)

# Generate pathway PCA plot
# Replace "Peatland" with your actual group column name if not using the example dataset
pca_plot <- pathway_pca(abundance = kegg_abundance, metadata = metadata, group = "Peatland")

# Generate pathway heatmap
heatmap_plot <- pathway_heatmap(abundance = kegg_abundance, metadata = metadata, group = "Peatland")

Before running this code, please ensure that the sample names are consistent across your metadata and abundance files, with no extra or missing sample names in either. If you encounter any issues or need further assistance, feel free to reach out.
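To decide up front whether pathway_errorbar has anything to plot, the results can be filtered for significant features first. This is a minimal sketch on a made-up results table: the `p_adjust` column name and the 0.05 threshold are assumptions, so check them against the columns of your own `daa_results_df`.

```r
# Made-up stand-in for the table returned by the DAA step.
daa_results_df <- data.frame(
  feature = c("ko00010", "ko00020", "ko00030"),
  p_adjust = c(0.20, 0.03, 0.75)
)

# Keep rows whose adjusted p-value is present and below the threshold.
keep <- !is.na(daa_results_df$p_adjust) & daa_results_df$p_adjust < 0.05
significant <- daa_results_df[keep, , drop = FALSE]

if (nrow(significant) > 0) {
  message(nrow(significant), " feature(s) pass the threshold; ",
          "pathway_errorbar is worth trying.")
} else {
  message("No significant features; consider pathway_pca or pathway_heatmap.")
}
```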

Best Regards, Chen YANG

Dear @cafferychen777 ,

Thank you for getting back to me so quickly. The code you provided was exactly what I had been using, apart from the library(ggh4x) command. After restarting RStudio, copying everything into a new script, and adding that command, it is now working. I'm not exactly sure why (could it really have been the package?), but I won't complain as long as it works.

BR, Katharina
