pathway_heatmap can't extract columns past the end

luigallucci commented 2 days ago

Describe the Bug

Hi, I'm trying to make the heatmap directly from a PICRUSt2 output file. I tried renaming the sample column to sample_name, along with other variations, but nothing worked.

Error in pull():
! Can't extract columns past the end.
ℹ Location 1 doesn't exist.
ℹ There are only 0 columns.

Reproducible Example

annotated_kegg <- pathway_annotation(file = abundance_file, pathway = "KO", ko_to_kegg = TRUE)

heat <- pathway_heatmap(annotated_kegg, metadata, "Type")

Environment Information:

Operating System: MAC OS - osx-arm64
R Version: 4.4.0
Package Version: latest

cafferychen777 commented 2 days ago

Dear l.gallucci,

Thank you for reporting this issue with the pathway_heatmap function in the ggpicrust2 package. To better assist you, I'll need some additional information:

Could you please share the first few lines of your abundance_file and metadata file? This will help me understand the structure of your data.
What are the dimensions (number of rows and columns) of your annotated_kegg and metadata dataframes?
Can you provide the full error message and traceback you're receiving?
To facilitate debugging, it would be extremely helpful if you could send your abundance_file and metadata file to cafferychen777@tamu.edu. Please make sure to remove any sensitive information before sharing.
Could you also share the output of sessionInfo() to provide more details about your R environment?
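The items requested above can be collected with a few base R calls. This is a minimal sketch on toy data: the two data frames below are made-up stand-ins (their column names and values are assumptions, not the reporter's data) and should be replaced with the real annotated_kegg and metadata objects.

```r
# Toy stand-ins for the real objects; replace with your own data frames.
annotated_kegg <- data.frame(
  pathway = c("ko00010", "ko00020"),
  `1` = c(10, 3),
  `10` = c(7, 0),
  check.names = FALSE  # keep numeric-looking column names as they are
)
metadata <- data.frame(
  sample_id = c("Ex2", "Ex4"),
  Type = c("A", "B")
)

# First few rows of each table
print(head(annotated_kegg, 3))
print(head(metadata, 3))

# Dimensions as c(rows, columns)
dims <- list(
  annotated_kegg = dim(annotated_kegg),
  metadata = dim(metadata)
)
print(dims)

# R version, platform, and attached/loaded package versions
print(sessionInfo())

# Right after an error from a tidyverse function, rlang::last_trace()
# prints the full backtrace for that error.
```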

Once I have this information, I'll be able to reproduce the issue and work on a solution more effectively.

Thank you for your patience and cooperation in resolving this issue.

Best regards, Chen Yang

luigallucci commented 2 days ago

sure.

<error/vctrs_error_subscript_oob>
Error in `pull()`:
! Can't extract columns past the end.
ℹ Location 1 doesn't exist.
ℹ There are only 0 columns.
---
Backtrace:
     ▆
  1. ├─ggpicrust2::pathway_heatmap(annotated_kegg, metadata, "Type")
  2. │ └─metadata %>% select(all_of(c(sample_name_col))) %>% pull()
  3. ├─dplyr::pull(.)
  4. ├─dplyr:::pull.data.frame(.)
  5. │ └─tidyselect::vars_pull(names(.data), !!enquo(var))
  6. │   └─tidyselect:::pull_as_location2(...)
  7. │     ├─tidyselect:::with_subscript_errors(...)
  8. │     │ └─base::withCallingHandlers(...)
  9. │     └─vctrs::num_as_location2(...)
 10. │       ├─vctrs:::result_get(...)
 11. │       └─vctrs:::vec_as_location2_result(...)
 12. │         ├─base::tryCatch(...)
 13. │         │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
 14. │         │   └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 15. │         │     └─base (local) doTryCatch(return(expr), name, parentenv, handler)
 16. │         └─vctrs::vec_as_location(i, n, names = names, arg = arg, call = call)
 17. └─vctrs (local) `<fn>`()
 18.   └─vctrs:::stop_subscript_oob(...)
 19.     └─vctrs:::stop_subscript(...)
 20.       └─rlang::abort(...)

annotated_kegg: 2,173 entries, 41 columns in total
metadata: 39 entries, 19 columns

sessionInfo:

R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ALDEx2_1.28.0         zCompositions_1.5.0-4 truncnorm_1.0-9       NADA_1.6-1.1          survival_3.7-0       
 [6] MASS_7.3-60.0.1       patchwork_1.3.0       ggprism_1.0.5         lubridate_1.9.3       forcats_1.0.0        
[11] stringr_1.5.1         dplyr_1.1.4           purrr_1.0.2           tidyr_1.3.1           tidyverse_2.0.0      
[16] tibble_3.2.1          readr_2.1.5           ggpicrust2_1.7.3      ggthemes_5.1.0        ggplot2_3.5.1        

loaded via a namespace (and not attached):
  [1] splines_4.3.2               later_1.3.2                 bitops_1.0-8                lifecycle_1.0.4            
  [5] edgeR_4.0.16                doParallel_1.0.17           vroom_1.6.5                 lattice_0.22-6             
  [9] magrittr_2.0.3              limma_3.58.1                remotes_2.5.0               httpuv_1.6.15              
 [13] Wrench_1.20.0               sessioninfo_1.2.2           pkgbuild_1.4.4              metagenomeSeq_1.43.0       
 [17] DBI_1.2.3                   RColorBrewer_1.1-3          ade4_1.7-22                 multcomp_1.4-26            
 [21] abind_1.4-8                 pkgload_1.4.0               zlibbioc_1.48.2             quadprog_1.5-8             
 [25] GenomicRanges_1.54.1        BiocGenerics_0.48.1         RCurl_1.98-1.16             TH.data_1.1-2              
 [29] phyloseq_1.48.0             sandwich_3.1-1              circlize_0.4.16             GenomeInfoDbData_1.2.11    
 [33] IRanges_2.36.0              S4Vectors_0.40.2            vegan_2.6-8                 permute_0.9-7              
 [37] codetools_0.2-20            getopt_1.20.4               coin_1.4-3                  DelayedArray_0.28.0        
 [41] tidyselect_1.2.1            shape_1.4.6.1               farver_2.1.2                matrixStats_1.4.1          
 [45] stats4_4.3.2                jsonlite_1.8.8              GetoptLong_1.0.5            multtest_2.58.0            
 [49] ellipsis_0.3.2              iterators_1.0.14            foreach_1.5.2               tools_4.3.2                
 [53] Rcpp_1.0.13                 glue_1.7.0                  SparseArray_1.2.4           DESeq2_1.42.1              
 [57] mgcv_1.9-1                  MatrixGenerics_1.14.0       usethis_3.0.0               GenomeInfoDb_1.38.8        
 [61] withr_3.0.1                 BiocManager_1.30.25         fastmap_1.2.0               GGally_2.2.1               
 [65] latticeExtra_0.6-30         rhdf5filters_1.14.1         fansi_1.0.6                 Maaslin2_1.16.0            
 [69] caTools_1.18.3              digest_0.6.37               timechange_0.3.0            R6_2.5.1                   
 [73] mime_0.12                   colorspace_2.1-1            gtools_3.9.5                jpeg_0.1-10                
 [77] utf8_1.2.4                  generics_0.1.3              data.table_1.16.0           robustbase_0.99-4          
 [81] httr_1.4.7                  htmlwidgets_1.6.4           S4Arrays_1.2.1              ggstats_0.6.0              
 [85] pkgconfig_2.0.3             gtable_0.3.5                modeltools_0.2-23           ComplexHeatmap_2.18.0      
 [89] XVector_0.42.0              pcaPP_2.0-5                 htmltools_0.5.8.1           profvis_0.3.8              
 [93] biomformat_1.30.0           clue_0.3-65                 scales_1.3.0                Biobase_2.62.0             
 [97] png_0.1-8                   optparse_1.7.5              rstudioapi_0.16.0           tzdb_0.4.0                 
[101] reshape2_1.4.4              rjson_0.2.23                curl_5.2.2                  nlme_3.1-166               
[105] zoo_1.8-12                  cachem_1.1.0                rhdf5_2.46.1                GlobalOptions_0.1.2        
[109] KernSmooth_2.23-24          parallel_4.3.2              miniUI_0.1.1.1              libcoin_1.0-10             
[113] RcppZiggurat_0.1.6          pillar_1.9.0                grid_4.3.2                  vctrs_0.6.5                
[117] gplots_3.1.3.1              urlchecker_1.0.1            promises_1.3.0              xtable_1.8-4               
[121] cluster_2.1.6               mvtnorm_1.3-1               cli_3.6.3                   locfit_1.5-9.10            
[125] compiler_4.3.2              rlang_1.1.4                 crayon_1.5.3                lefser_1.12.1              
[129] labeling_0.4.3              interp_1.1-6                plyr_1.8.9                  fs_1.6.4                   
[133] stringi_1.8.4               deldir_2.0-4                BiocParallel_1.36.0         munsell_0.5.1              
[137] Biostrings_2.70.3           devtools_2.4.5              glmnet_4.1-8                Matrix_1.6-5               
[141] hms_1.1.3                   bit64_4.0.5                 Rhdf5lib_1.24.2             KEGGREST_1.42.0            
[145] statmod_1.5.0               shiny_1.9.1                 SummarizedExperiment_1.32.0 Rfast_2.1.0                
[149] igraph_2.0.3                memoise_2.0.1               RcppParallel_5.1.9          biglm_0.9-3                
[153] bit_4.0.5                   DEoptimR_1.1-3              directlabels_2024.1.21      ape_5.8

cafferychen777 commented 2 days ago

Dear l.gallucci,

Thank you for reporting this issue with the pathway_heatmap function in the ggpicrust2 package. I believe I understand the problem now:

The column names in your abundance_file don't match the sample IDs in your metadata file. Specifically:

Your metadata file has sample IDs like "sample_id", "Ex2", "Ex4", "Ex_6", "Ex_7", etc.
Your abundance_file has column names like "1", "10", "11", "12", "13", "15", "16", "17", etc.

This mismatch is likely causing the error you're seeing. To resolve this, you need to modify the column names in your abundance_file to match the sample IDs in your metadata file.
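A quick way to confirm a mismatch like this is to compare the two sets of names directly. The sketch below uses toy data that mirrors the names quoted above; abundance_df, metadata_df, function_id, and the sample_id column are assumptions for illustration and may differ from the real files.

```r
# Toy data mirroring the names quoted above; replace with your own objects.
abundance_df <- data.frame(
  function_id = c("K00001", "K00002"),
  `1` = c(5, 0),
  `10` = c(2, 8),
  `11` = c(1, 4),
  check.names = FALSE
)
metadata_df <- data.frame(
  sample_id = c("Ex2", "Ex4", "Ex_6"),
  Type = c("A", "A", "B")
)

# Sample names as seen by each table
abundance_samples <- colnames(abundance_df)[-1]  # drop the feature ID column
metadata_samples <- as.character(metadata_df$sample_id)

# Names present in both tables, and names present in only one of them
shared <- intersect(abundance_samples, metadata_samples)
only_in_abundance <- setdiff(abundance_samples, metadata_samples)
only_in_metadata <- setdiff(metadata_samples, abundance_samples)

cat("Shared sample names:", length(shared), "\n")
cat("Only in abundance table:", paste(only_in_abundance, collapse = ", "), "\n")
cat("Only in metadata:", paste(only_in_metadata, collapse = ", "), "\n")
```

If the shared set is empty, no metadata column lines up with the abundance columns, which is consistent with the mismatch described above.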

Here's a suggested solution:

First, check your metadata file to confirm the exact sample IDs.
Then, modify your abundance_file column names to match these sample IDs.

You can do this using the colnames() function in R. Here's an example of how you might do this:

# Assuming your abundance_file is loaded into a dataframe called 'abundance_df'
# and your metadata is loaded into a dataframe called 'metadata_df'

# Get the sample IDs from your metadata
sample_ids <- metadata_df$sample_id  # or whatever column contains your sample IDs

# Make sure the number of samples matches
if(length(sample_ids) == ncol(abundance_df) - 1) {  # -1 because the first column is likely feature IDs
  # Set the column names of abundance_df
  colnames(abundance_df)[-1] <- sample_ids
} else {
  stop("The number of samples in metadata doesn't match the number of columns in abundance file")
}
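Note that assigning names by position only works if the metadata rows are in the same order as the abundance columns. If the metadata also records the original column names, a lookup by name avoids that dependence on order. The sketch below assumes a hypothetical metadata column called old_name holding the original abundance column names; that column is not confirmed to exist in the real data.

```r
# Toy data; 'old_name' is a hypothetical column mapping old names to sample IDs.
abundance_df <- data.frame(
  function_id = c("K00001", "K00002"),
  `10` = c(2, 8),
  `1` = c(5, 0),
  check.names = FALSE
)
metadata_df <- data.frame(
  sample_id = c("Ex2", "Ex4"),
  old_name = c("1", "10"),
  Type = c("A", "B")
)

# Named lookup vector: old column name -> sample ID
lookup <- setNames(as.character(metadata_df$sample_id), metadata_df$old_name)

# Stop early if any abundance column has no matching sample ID
old_names <- colnames(abundance_df)[-1]
missing <- setdiff(old_names, names(lookup))
if (length(missing) > 0) {
  stop("No sample ID found for columns: ", paste(missing, collapse = ", "))
}

# Rename by name, so the column order in abundance_df does not matter
colnames(abundance_df)[-1] <- unname(lookup[old_names])
print(colnames(abundance_df))
```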

After making this change, try running your original code again:

annotated_kegg <- pathway_annotation(file = abundance_df, pathway = "KO", ko_to_kegg = TRUE)
heat <- pathway_heatmap(annotated_kegg, metadata_df, "Type")

If you're still encountering issues after making these changes, please let me know and provide:

The first few lines of your abundance_file and metadata file (after making the changes).
The dimensions of your annotated_kegg and metadata_df dataframes.
Any error messages you're still seeing.

This should help resolve the "Can't extract columns past the end" error you were experiencing. Let me know if you need any further assistance!

Best regards, Chen Yang

luigallucci commented 2 days ago

Dear @cafferychen777 , thank you for the reply.

This is what I did. Sorry, I forgot to mention that I'm using dada_id as the sample ID column.

Unfortunately, even after changing this, the result is still the same.

The problem seems to be related to this line:

metadata %>% select(all_of(c(sample_name_col))) %>% pull()
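One way this line can produce the reported error is when sample_name_col is empty, so that select() returns a data frame with zero columns and pull() has nothing to extract. The sketch below reproduces that with dplyr alone on toy data. It assumes sample_name_col ended up as an empty character vector; how pathway_heatmap actually sets it is not shown in this thread.

```r
library(dplyr)

# Toy metadata; replace with your own data frame.
metadata <- data.frame(
  dada_id = c("s1", "s2"),
  Type = c("A", "B")
)

# Assumption: no metadata column was identified as the sample-name column.
sample_name_col <- character(0)

# Selecting zero names gives a data frame with rows but no columns
selected <- metadata %>% select(all_of(c(sample_name_col)))
print(ncol(selected))

# pull() on a zero-column data frame fails; capture the message instead of stopping
result <- tryCatch(
  selected %>% pull(),
  error = function(e) conditionMessage(e)
)
print(result)

# With an existing column name the same pipeline returns the sample IDs
ok <- metadata %>% select(all_of("dada_id")) %>% pull()
print(ok)
```

If this is what is happening, checking which metadata column (if any) contains values matching the abundance column names would be the next step.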

cafferychen777 commented 2 days ago

Hi @luigallucci ,

Could you send the data file to cafferychen777@tamu.edu?

Best,

cafferychen777 / ggpicrust2

pathway_heatmap can't extract columns past the end #118