drieslab / Giotto

Spatial omics analysis toolbox
https://drieslab.github.io/Giotto_website/

createGiottoXeniumObject `path_list` not found error #888

Open hchoiHiLung opened 5 months ago

hchoiHiLung commented 5 months ago

Describe the Error

Whenever `data_to_use` is set to "aggregate", `createGiottoXeniumObject` fails with the error below. The breast cancer dataset from the vignette fails in the same way. "subcellular" works without problems.

(A side note: `data_to_use` does not accept "all", even though the documentation lists it as an option.)

Error Message

```r
> aggregate = createGiottoXeniumObject(xenium_dir = "outs",
+                                      data_to_use = 'aggregate',
+                                      h5_expression = F,
+                                      instructions = instrs,
+                                      cores = NA) # set number of cores to use
A structured Xenium directory will be used
Checking directory contents...
> analysis info found
 └──analysis
 └──analysis.zarr.zip
 └──analysis_summary.html
> boundary info found
 └──cell_boundaries.csv.gz
 └──cell_boundaries.parquet
 └──nucleus_boundaries.csv.gz
 └──nucleus_boundaries.parquet
> cell feature matrix found
 └──cell_feature_matrix
 └──cell_feature_matrix.h5
 └──cell_feature_matrix.zarr.zip
> cell metadata found
 └──cells.csv.gz
 └──cells.parquet
 └──cells.zarr.zip
> image info found
 └──morphology.ome.tif
 └──morphology_focus.ome.tif
 └──morphology_mip.ome.tif
> panel metadata found
 └──gene_panel.json
> raw transcript info found
 └──transcripts.csv.gz
 └──transcripts.parquet
 └──transcripts.zarr.zip
> experiment info (.xenium) found
 └──experiment.xenium
Error in .read_xenium_folder(xenium_dir = xenium_dir, data_to_use = data_to_use, :
  object 'path_list' not found
```

System Information

DomenicoSkyWalker89 commented 2 weeks ago

Dear all,

Thanks in advance for the support.

I got almost the same error when attempting to load the lymph node dataset (10x Xenium; https://www.10xgenomics.com/datasets/preview-data-xenium-prime-gene-expression):

A structured Xenium directory will be used
Checking directory contents...

analysis info found
 └──analysis.tar.gz
 └──analysis.zarr.zip
 └──analysis_summary.html
boundary info found
 └──cell_boundaries.csv.gz
 └──cell_boundaries.parquet
 └──nucleus_boundaries.csv.gz
 └──nucleus_boundaries.parquet
cell feature matrix found
 └──cell_feature_matrix
 └──cell_feature_matrix.h5
 └──cell_feature_matrix.zarr.zip
cell metadata found
 └──cells.csv.gz
 └──cells.parquet
 └──cells.zarr.zip
image info found
 └──morphology.ome.tif
panel metadata found
 └──gene_panel.json
raw transcript info found
 └──transcripts.parquet
 └──transcripts.zarr.zip
experiment info (.xenium) found
 └──experiment.xenium
Directory check done
Loading feature metadata...
Loading transcript level info...
Error in path_list$tx_path[[1]] : subscript out of bounds

Do you have any suggestions to overcome this error?

Best,

Domenico

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=Italian_Italy.utf8 LC_CTYPE=Italian_Italy.utf8 LC_MONETARY=Italian_Italy.utf8 LC_NUMERIC=C LC_TIME=Italian_Italy.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] future_1.33.2 reticulate_1.35.0 Giotto_4.0.5 GiottoClass_0.2.3

loaded via a namespace (and not attached):
[1] Rcpp_1.0.11 locfit_1.5-9.8 lattice_0.20-45 listenv_0.9.1 png_0.1-8 gtools_3.9.5 digest_0.6.35 SingleCellExperiment_1.20.1
[9] utf8_1.2.4 parallelly_1.37.1 R6_2.5.1 GenomeInfoDb_1.34.9 backports_1.4.1 stats4_4.2.2 ggplot2_3.5.0 pillar_1.9.0
[17] sparseMatrixStats_1.10.0 GiottoVisuals_0.1.6 zlibbioc_1.44.0 rlang_1.1.2 rstudioapi_0.16.0 data.table_1.15.4 magick_2.8.3 S4Vectors_0.36.2
[25] R.utils_2.12.3 R.oo_1.26.0 Matrix_1.6-5 checkmate_2.3.1 BiocParallel_1.32.6 RCurl_1.98-1.14 munsell_0.5.1 beachmat_2.14.2
[33] DelayedArray_0.24.0 HDF5Array_1.26.0 compiler_4.2.2 DropletUtils_1.18.1 pkgconfig_2.0.3 BiocGenerics_0.44.0 globals_0.16.3 tidyselect_1.2.1
[41] SummarizedExperiment_1.28.0 tibble_3.2.1 GenomeInfoDbData_1.2.9 edgeR_3.40.2 IRanges_2.32.0 codetools_0.2-18 matrixStats_1.1.0 fansi_1.0.6
[49] withr_3.0.0 dplyr_1.1.4 bitops_1.0-7 rhdf5filters_1.10.1 R.methodsS3_1.8.2 grid_4.2.2 jsonlite_1.8.8 gtable_0.3.5
[57] lifecycle_1.0.4 magrittr_2.0.3 scales_1.3.0 dqrng_0.3.2 cli_3.6.2 scuttle_1.8.4 XVector_0.38.0 SpatialExperiment_1.8.1
[65] limma_3.54.2 generics_0.1.3 DelayedMatrixStats_1.20.0 vctrs_0.6.5 colorRamp2_0.1.0 Rhdf5lib_1.20.0 rjson_0.2.21 tools_4.2.2
[73] Biobase_2.58.0 glue_1.6.2 purrr_1.0.2 GiottoUtils_0.1.6 MatrixGenerics_1.10.0 parallel_4.2.2 colorspace_2.1-0 rhdf5_2.42.1
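
The `subscript out of bounds` message in the log above is the error base R raises when `[[1]]` is applied to an empty vector or list. The snippet below is a minimal sketch of that behavior only. It assumes, without confirmation from the Giotto source, that `tx_path` ends up empty when no matching transcript file is detected; the `path_list` built here is a hypothetical stand-in, not Giotto's internal object.

```r
# Hypothetical stand-in for the reader's internal path list (assumption):
# tx_path is empty because no matching transcript file was detected.
path_list <- list(tx_path = character(0))

# Taking the first element of an empty vector with [[ ]] raises an error;
# capture its message instead of stopping the script.
msg <- tryCatch(
  path_list$tx_path[[1]],
  error = function(e) conditionMessage(e)
)
print(msg)
```

If that assumption holds, the error points at a missing input file rather than at a problem with the installation.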

DomenicoSkyWalker89 commented 1 week ago

For anyone who encounters this error: I have found the problem. It was my mistake and is not related to Giotto Suite. The file 'transcripts.csv.gz' was missing from the 10x folder; only 'transcripts.parquet' was present, which caused the error.

However, you can create this file using the following code:

    ###################################################
    # 1. Optional: convert transcripts.parquet to .csv
    ###################################################

    # Import R package
    library(arrow)

    # Path to your parquet file; edit it to where the parquet file is saved
    PATH <- 'add your path'

    # Output path: same file name with a .csv extension
    OUTPUT <- gsub('\\.parquet$', '.csv', PATH)

    # Number of rows to write per chunk
    CHUNK_SIZE <- 1e6

    # Read the parquet file as an Arrow Table (not a data frame)
    parquet_file <- arrow::read_parquet(PATH, as_data_frame = FALSE)
    start <- 0

    # Convert the table to CSV one chunk at a time
    while (start < parquet_file$num_rows) {
      end <- min(start + CHUNK_SIZE, parquet_file$num_rows)
      chunk <- as.data.frame(parquet_file$Slice(start, end - start))
      data.table::fwrite(chunk, OUTPUT, append = start != 0)
      start <- end
    }

    # Compress the result to .csv.gz if R.utils is available
    if (require('R.utils', quietly = TRUE)) {
      R.utils::gzip(OUTPUT)
    }
    ###################################################

Best,

Domenico
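
A quick way to catch the missing-file situation described above before calling `createGiottoXeniumObject` is to check which transcript files are actually present in the Xenium output folder. The sketch below uses only base R. `check_transcript_files` is a hypothetical helper name, not part of Giotto, and the file names are the ones shown in the directory listings earlier in this thread.

```r
# Hypothetical helper (not part of Giotto): report which transcript files
# exist in a Xenium output directory. File names are taken from the
# directory listings earlier in this thread.
check_transcript_files <- function(xenium_dir) {
  candidates <- c("transcripts.csv.gz",
                  "transcripts.parquet",
                  "transcripts.zarr.zip")
  present <- file.exists(file.path(xenium_dir, candidates))
  names(present) <- candidates
  present
}

# Demo on a temporary folder that contains only the parquet file
demo_dir <- file.path(tempdir(), "xenium_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "transcripts.parquet"))

status <- check_transcript_files(demo_dir)
print(status)

if (!status[["transcripts.csv.gz"]]) {
  message("transcripts.csv.gz is missing; convert transcripts.parquet first")
}
```

For a real dataset, replace `demo_dir` with the path passed as `xenium_dir`.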