Closed sunhuaiyu closed 3 months ago
Thanks for filing this ticket @sunhuaiyu. This week we rolled out a change to encode certain Census cell metadata as categorical, and you are observing the effects of that change.
You can find more details here:
https://chanzuckerberg.github.io/cellxgene-census/articles/2024/20240404-categoricals.html
Describe the bug
The data type of some columns in "obs" used to be "large_string" but is now "dictionary".
To Reproduce
Expected behavior
pyarrow.Table
soma_joinid: int64
dataset_id: dictionary
assay: dictionary
assay_ontology_term_id: dictionary
cell_type: dictionary
cell_type_ontology_term_id: dictionary
development_stage: dictionary
development_stage_ontology_term_id: dictionary
disease: dictionary
disease_ontology_term_id: dictionary
donor_id: dictionary
is_primary_data: bool
observation_joinid: large_string
self_reported_ethnicity: dictionary
self_reported_ethnicity_ontology_term_id: dictionary
sex: dictionary
sex_ontology_term_id: dictionary
suspension_type: dictionary
tissue: dictionary
tissue_ontology_term_id: dictionary
tissue_type: dictionary
tissue_general: dictionary
tissue_general_ontology_term_id: dictionary
raw_sum: double
nnz: int64
raw_mean_nnz: double
raw_variance_nnz: double
n_measured_vars: int64
Environment
ubuntu20.04 python=3.11 cellxgene-census==1.12.0 tiledbsoma==1.9.3 pyarrow==15.0.2
Additional context
This must have changed within the past month. The "latest" version as of 2024-02-21 did not have this change.