cellgeni / sceasy

A package to help convert different single-cell data formats to each other
GNU General Public License v3.0

[import of anndata] Error in `match(x, table, nomatch = 0L)`: 'match' requires vector arguments #61

Closed siberianisaev closed 2 years ago

siberianisaev commented 2 years ago

Hello, I found a case where sceasy does NOT work (the specific test h5ad file is not important; any file will do):

library(tools)
library(here)

library(reticulate)
use_virtualenv(here::here(".venv"))

library(Seurat)
library(sceasy)
library(anndata)

data_folder <- here::here("tests/data")
anndata_file_name = "plateletOutput.h5ad"

test_that("AnnData to Seurat conversion", {
  anndata_path <- paste(data_folder, "/", anndata_file_name, sep = "")
  seurat_path <- paste(data_folder, "/", file_path_sans_ext(anndata_file_name), ".rds", sep = "")

  sceasy::convertFormat(anndata_path, from = "anndata", to = "seurat", outFile = seurat_path)

  seurat_object <- readRDS(seurat_path)
  expect_false(is.null(seurat_object))

  # library(anndata)
  ad_object <- read_h5ad(anndata_path, backed="r")
  expect_false(is.null(ad_object))
})

Output:

> devtools::test()
ℹ Testing scripts
✔ | F W S  OK | Context
⠏ |         0 | converter
Attaching SeuratObject
Attaching sp
/Users/andrey_isaev/Documents/GitHub/…/scripts/.venv/lib/python3.9/site-packages/anndata/_core/anndata.py:121: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
X -> counts
✖ | 1       0 | converter [11.3s]                                                                                     
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Error (test-converter.R:18:3): AnnData to Seurat conversion
Error in `match(x, table, nomatch = 0L)`: 'match' requires vector arguments
Backtrace:
 1. sceasy::convertFormat(...)
      at test-converter.R:18:2
 2. sceasy (local) func(obj, outFile = outFile, main_layer = main_layer, ...)
 3. base::sapply(...)
 4. base::lapply(X = X, FUN = FUN, ...)
 5. sceasy (local) FUN(X[[i]], ...)
 8. anndata:::`[.collections.abc.Mapping`(ad$obsm, x)
 9. name %in% x$keys()

══ Results 
Duration: 11.5 s

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]

Here is nearly identical code for which sceasy does work:

library(tools)
library(here)

library(reticulate)
use_virtualenv(here::here(".venv"))

library(Seurat)
library(sceasy)
# library(anndata)

data_folder <- here::here("tests/data")
anndata_file_name = "plateletOutput.h5ad"

test_that("AnnData to Seurat conversion", {
  anndata_path <- paste(data_folder, "/", anndata_file_name, sep = "")
  seurat_path <- paste(data_folder, "/", file_path_sans_ext(anndata_file_name), ".rds", sep = "")

  sceasy::convertFormat(anndata_path, from = "anndata", to = "seurat", outFile = seurat_path)

  seurat_object <- readRDS(seurat_path)
  expect_false(is.null(seurat_object))

  library(anndata)
  ad_object <- read_h5ad(anndata_path, backed="r")
  expect_false(is.null(ad_object))
})

Output:

> devtools::test()
ℹ Testing scripts
✔ | F W S  OK | Context
⠏ |         0 | converter
Attaching SeuratObject
Attaching sp
/Users/andrey_isaev/Documents/GitHub/…/scripts/.venv/lib/python3.9/site-packages/anndata/_core/anndata.py:121: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
X -> counts
⠋ |         1 | converter                                                                                             /Users/andrey_isaev/Documents/GitHub/.../scripts/.venv/lib/python3.9/site-packages/anndata/_core/anndata.py:121: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
✔ |         2 | converter [18.2s]                                                                                     

══ Results 
Duration: 18.3 s

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 2 ]

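Difference: the only change is where library(anndata) is called. In the failing script it is attached at the top, before sceasy::convertFormat() runs; in the working script it is attached inside the test, after the conversion.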

Could you please take a look?

siberianisaev commented 2 years ago

I found a solution: import the Python anndata module through reticulate instead of attaching the R anndata package.

library(tools)
library(here)

library(reticulate)
use_virtualenv(here::here(".venv"))

library(Seurat)
library(sceasy)
ad <- reticulate::import("anndata", convert = FALSE) # ADDED

data_folder <- here::here("tests/data")
anndata_file_name = "plateletOutput.h5ad"

test_that("AnnData to Seurat conversion", {
  anndata_path <- paste(data_folder, "/", anndata_file_name, sep = "")
  seurat_path <- paste(data_folder, "/", file_path_sans_ext(anndata_file_name), ".rds", sep = "")

  sceasy::convertFormat(anndata_path, from = "anndata", to = "seurat", outFile = seurat_path)

  seurat_object <- readRDS(seurat_path)
  expect_false(is.null(seurat_object))

  ad_object <- ad$read_h5ad(anndata_path, backed="r") # UPDATED
  expect_false(is.null(ad_object))
})

evanbiederstedt commented 2 years ago

This is interesting

I didn't know that the creators of ScanPy made an R package called anndata: https://cran.r-project.org/web/packages/anndata/index.html

I think that's the source of the confusion here. The package is developed at https://github.com/dynverse/anndata

The Python package must be loaded via reticulate:

anndata <- reticulate::import('anndata')

Calling library(anndata) beforehand breaks conversions in the sceasy workflow.
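
A possible reading of the backtrace in the original report: its last frames are anndata:::`[.collections.abc.Mapping` and name %in% x$keys(), which suggests that once the R anndata package is loaded, its S3 method for `[` is dispatched on the Python mapping that sceasy indexes. The toy sketch below imitates that mechanism in base R only. The classes toy.Mapping and toy.Object and the helper make_toy_mapping are hypothetical stand-ins, not reticulate or anndata code, and the environment returned by keys() is just an assumed substitute for a value that is not an R vector.

```r
# Toy stand-in for a mapping object whose keys() does not return an R vector.
# (Hypothetical: an environment is used because match() rejects it.)
make_toy_mapping <- function() {
  key_store <- new.env()
  structure(
    list(keys = function() key_store, data = list(X_pca = "pca matrix")),
    class = c("toy.Mapping", "toy.Object")
  )
}

# General method, standing in for the indexing behaviour a workflow relies on.
`[.toy.Object` <- function(x, name) unclass(x)$data[[name]]

obj <- make_toy_mapping()
before <- obj["X_pca"]  # dispatches to `[.toy.Object`

# More specific method, standing in for one registered by a package that is
# loaded later. It assumes keys() yields an R vector.
`[.toy.Mapping` <- function(x, name) {
  if (!(name %in% x$keys())) stop("unknown key")
  unclass(x)$data[[name]]
}

# The same call now dispatches to `[.toy.Mapping` and fails inside %in%.
after <- tryCatch(obj["X_pca"], error = function(e) conditionMessage(e))
print(before)
print(after)
```

In this sketch the first lookup succeeds and the second one fails inside match(), the same function named in the reported error. Whether sceasy fails for exactly this reason is an assumption based on the backtrace alone.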

It's possible we should add a clarifying remark in the README for new users trying to figure out this whole Python/R mess in single-cell genomics.
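
Alongside a README note, a script could also check for the conflict before converting. This is a minimal sketch in base R; r_anndata_loaded and warn_if_r_anndata_loaded are hypothetical helpers, not part of sceasy, and the check assumes that having the R anndata namespace loaded (not only attached) is what matters.

```r
# Hypothetical helper (not part of sceasy): TRUE when the R package 'anndata'
# appears among the loaded namespaces. The argument exists so the check can be
# exercised without actually loading the package.
r_anndata_loaded <- function(loaded = loadedNamespaces()) {
  "anndata" %in% loaded
}

# Hypothetical guard to call before sceasy::convertFormat(..., from = "anndata").
# Returns TRUE invisibly when no conflict is detected, FALSE otherwise.
warn_if_r_anndata_loaded <- function(loaded = loadedNamespaces()) {
  if (r_anndata_loaded(loaded)) {
    warning(
      "The R package 'anndata' is loaded; AnnData conversions with sceasy ",
      "may fail. Consider running the conversion in a fresh R session and ",
      "using reticulate::import('anndata') instead of library(anndata).",
      call. = FALSE
    )
    return(invisible(FALSE))
  }
  invisible(TRUE)
}
```

Calling warn_if_r_anndata_loaded() at the top of a conversion script would surface the problem as a readable warning instead of the match() error.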

CC @nh3 @wikiselev @mckinsel