Open meghanaturner opened 1 year ago
Thanks for reporting this error!
Starting with the latest scrattch-mapping docker (bicore/scrattch_mapping:latest). I wasn't able to recreate this issue from our test cases, so you've found some fun edge case. To be complete, I also tried loading AIT17.0.logCPM.sampled100_MERSCOPE_BRAIN3_GENES_dense_for_mapping.h5ad
directly with anndata::read_h5ad()
and found the column names available for AIT.anndata$uns$clusterInfo
.
I'll need some additional info to figure out what's going on. A few questions:
anndata::read_h5ad()
or scrattch_mapping::loadTaxonomy()
to load the taxonomy into R?scrattch_mapping::buildTaxonomy()
?FYI @scseeman
anndata::read_h5ad()
to load the taxonomyscrattch_mapping::buildTaxonomy()
/allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/meghanturner/brain3_mapping/scrattch-mapping_batch.R
Error in taxonomy_mapping(AIT.anndata = taxonomy_anndata, query.data = query_data, : Not all label.cols exists in AIT.anndata$uns$clusterInfo
That's interesting that you can find the column names when you load with anndata::read_h5ad()
in the latest. For me, there's different behavior in how the column names are accessible between the two versions.
In the latest version:
colnames(anndata$uns$clusterInfo)
, which is called in line 22 of taxonomy_mapping, returns NULL
anndata$uns$clusterInfo$columns
returns the expected column names:
[1] "sample_id" "cl" "cluster_label"
[4] "Level2_id_label" "Level1_id_label" "supertype_id_label" [7] "class_id_label" "nt_type_label" "cluster_id.AIT16"
[10] "library_prep" "gene.counts.0" "doublet_score"
[13] "roi" "umi.counts" "qc.score"
[16] "method" "region_label" "region_id"
[19] "sex" "external_donor_name" "age"
[22] "platform" "knn.dist" "knn.dist.z"
[25] "medical_conditions" "broad_region" "cluster_id"
[28] "neighborhood" "batch"
class(taxonomy_anndata$uns$clusterInfo)
returns:
[1] "pandas.core.frame.DataFrame" "pandas.core.generic.NDFrame"
[3] "pandas.core.base.PandasObject" "pandas.core.accessor.DirNamesMixin" [5] "pandas.core.indexing.IndexingMixin" "pandas.core.arraylike.OpsMixin"
[7] "python.builtin.object"
In 0.16 the opposite is true:
anndata$uns$clusterInfo$columns
returns NULL
colnames(anndata$uns$clusterInfo)
, which is called in line 22 of taxonomy_mapping, returns the expected column names:
[1] "sample_id" "cl" "cluster_label"
[4] "Level2_id_label" "Level1_id_label" "supertype_id_label" [7] "class_id_label" "nt_type_label" "cluster_id.AIT16"
[10] "library_prep" "gene.counts.0" "doublet_score"
[13] "roi" "umi.counts" "qc.score"
[16] "method" "region_label" "region_id"
[19] "sex" "external_donor_name" "age"
[22] "platform" "knn.dist" "knn.dist.z"
[25] "medical_conditions" "broad_region" "cluster_id"
[28] "neighborhood" "batch"
class(taxonomy_anndata$uns$clusterInfo)
returns:
[1] "data.frame"
I was able to consistently retrieve anndata$uns$clusterInfo as a data.frame
under both scrattch_mapping versions. This has to be an environment issue thats leading to you seeing pandas.core.frame.DataFrame under the latest docker. One last ask: Can you return the sessionInfo() for each scrattch_mapping docker when you are running it.
Somehow the version of anndata (R library) was downgraded in the latest scrattch mapping docker. I suspect this is the culprit:
bicore/scrattch_mapping:latest -- anndata_0.7.5.3 bicore/scrattch_mapping:0.16 -- anndata_0.7.5.6
@meghanaturner @UCDNJJ a couple of weeks ago I was having issues with :latest
docker loading at all. I talked with Anish about it and realize that I never actually heard if it got fixed. I've been using the singularity file /allen/programs/celltypes/workgroups/rnaseqanalysis/bicore/singularity/scrattch_mapping_0.2.sif
directly instead of the docker and that has worked fine
Indeed, 0.16 has anndata="0.7.5.6" and latest has anndata="0.7.5.3"
It looks like R was also downgraded:
0.16 sessionInfo():
R version 4.2.2 (2022-10-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 22.04.2 LTS
Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=Cattached base packages: [1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached): [1] compiler_4.2.2
latest sessionInfo():
R version 4.2.0 (2022-04-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.6 LTS
Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=Cattached base packages: [1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached): [1] compiler_4.2.0
Hi @meghanaturner, when you have time can you check that this issues is resolved when using this docker image: docker://njjai/scrattch_mapping:0.4
. This docker image has the most up to date versions of the anndata and R packages that the previous latest
image was supposed to contain.
singularity shell --cleanenv docker://njjai/scrattch_mapping:0.4
Forewarning, quite a few changes exist in this new update. So if you hit an error let us know.
Hi @UCDNJJ, this docker image seems to have fixed the original issue I reported where the column names weren't found
@UCDNJJ However, read_h5ad()
no longer reads in sparse matrices as the data type that scrattch-mapping
is expecting to find. This problem spontaneously showed up in docker://bicore/scrattch_mapping:0.16
and docker://bicore/scrattch_mapping:latest
a couple weeks ago, and does not appear to be fixed by docker://njjai/scrattch_mapping:0.4
.
"Error caught for Correlation mapping." <simpleError in validObject(.Object): invalid class “dgCMatrix” object: 'Dim' slot does not have length 2> Error in
rownames<-
(*tmp*
, value = colnames(query.data)) : attempt to set 'rownames' on an object with no dimensions Calls: taxonomy_mapping -> rownames<-
The same error is thrown for a dgCMatrix. The workaround is to only use taxonomy and spatial anndata objects where X is a dense matrix.
As an alternative to read_h5ad, I tried using
loadTaxonomy(taxonomyDir = "//allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/meghanturner/brain3_mapping/AIT17.0.logCPM.sampled100_MERSCOPE_BRAIN3_GENES_cscSparseX.h5ad", anndata_file = "AIT17.0.logCPM.sampled100_MERSCOPE_BRAIN3_GENES_cscSparseX.h5ad")
but despite the documentation for the taxonomyDir argument suggesting that it supports direct h5ad files that aren't part of a shiny taxonomy folder, it errors out with: Required files to load Allen Institute taxonomy are missing.
I saw that you split off scrattch-taxonomy, including loadTaxonomy(), from scrattch-mapping into it's own repo. Should I raise this issue over there?
Interesting, we definitely don't want to be using dense matrices all the time! Let's leave this issue here for now.
We need to do a better job with documentation but you should always use loadTaxonomy()
since we do some work in that function to make sure the anndata object is initialized for mapping. The anndata_file argument assumes an .h5ad file that was generated with buildTaxonomy()
which is why you are seeing that error about missing required files.
I took a quick look and AIT17.0.logCPM.sampled100_MERSCOPE_BRAIN3_GENES_cscSparseX.h5ad
doesn't appear to have been setup with buildTaxonomy()
so this .h5ad will not work with scrattch.mapping. I would suggest running buildTaxonomy()
using the count matrix and metadata from that object, you can follow the steps in this tutorial: build_taxonomy
Also, can see if you can run the tutorial without error: mapping
In attempting to follow the build_taxonomy tutorial, I am unable to load the counts matrix from the taxonomy I'm using into R.
I am not familiar with R, so I'm not sure what R's anndata package is expecting to find in an ad.X stored as a CSR sparse matrix. And the tutorial does not provide any suggestions of how to read in counts matrices from other h5ad files (it just does library(tasic2016data); taxonomy.counts = tasic_2016_counts
)
# import libraries
library(scrattch.mapping)
library(umap)
# taxonomy I want to use for mapping my spatial data
taxonomy_h5ad_path = "//allen/programs/celltypes/workgroups/rnaseqanalysis/shiny/Taxonomies/AIT17.0_mouse/Prepare/AIT17.0.logCPM.sampled100.h5ad"
# Load taxonomy anndata file
taxonomy_anndata = read_h5ad(taxonomy_h5ad_path )
# Load the count data
taxonomy.counts = taxonomy_anndata$X # ***this line fails***
Error:
Error in py_ref_to_r(x) : negative length vectors are not allowed
Calls: <Anonymous> ... py_to_r.numpy.ndarray -> NextMethod -> py_to_r.default -> py_ref_to_r
So I also tried running through your code both in a separate R environment and within the scrattch.mapping docker. Both produced the same error.
Could this error be arising due to the dataset size or some change in the .h5ad file that happened a few weeks ago.
I can use the same approach you shared with a dataset or ~340k cells and ~22k genes: /allen/programs/celltypes/workgroups/rnaseqanalysis/shiny/10x_seq/NHP_BG_AIT_115/NHP_BG_AIT115_complete.h5ad. R successfully reads in the anndata$X as a dgR sparse matrix.
Using the latest scrattch-mapping docker image release leads to an error on line 22 of
R/taxonomy_mapping()
becausecolnames(AIT.anndata$uns$clusterInfo)
returns NULL.This issue can be fixed by switching back to the 0.16 version of the docker image. Using 0.16 and the exact same taxonomy h5ad file (//allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/meghanturner/brain3_mapping/taxonomies/AIT17.0.logCPM.sampled100_MERSCOPE_BRAIN3_GENES_dense_for_mapping.h5ad),
colnames(AIT.anndata$uns$clusterInfo)
returns the expected list of column names and the code runs as it should.@berl @egelfan2