chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License

cellxgene-schema CLI must validate ndarray size in obsm, obsp, varm, and varp #610

Closed. brianraymor closed this issue 1 year ago

brianraymor commented 1 year ago

obsm (Embeddings)

The size of the ndarray stored for a key in obsm MUST NOT be zero.

obsp

The size of the ndarray stored for a key in obsp MUST NOT be zero.

varm

The size of the ndarray stored for a key in varm MUST NOT be zero.

varp

The size of the ndarray stored for a key in varp MUST NOT be zero.
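
The four rules above can be sketched as a single check. This is a minimal illustration, not the cellxgene-schema implementation: the helper name `find_zero_size_arrays`, the plain-dict stand-in for an AnnData object, and the error wording are all assumptions made for the example.

```python
import numpy as np

# The four AnnData mapping attributes named in this issue.
MAPPING_ATTRS = ("obsm", "obsp", "varm", "varp")


def find_zero_size_arrays(mappings):
    """Return one error string per zero-size ndarray.

    `mappings` is a hypothetical stand-in for an AnnData object:
    a dict such as {"obsm": {key: ndarray, ...}, "obsp": {...}, ...}.
    """
    errors = []
    for attr in MAPPING_ATTRS:
        for key, value in mappings.get(attr, {}).items():
            # ndarray.size is the total element count, so it is zero
            # whenever any dimension has length zero.
            if isinstance(value, np.ndarray) and value.size == 0:
                errors.append(
                    f"The size of the ndarray stored for adata.{attr}['{key}'] MUST NOT be zero."
                )
    return errors


n_obs, n_var = 5, 3
mappings = {
    "obsm": {"X_umap": np.ones((n_obs, 2)), "test": np.zeros((n_obs, 0))},
    "obsp": {"distances": np.ones((n_obs, n_obs))},
    "varm": {"empty": np.zeros((n_var, 0))},
    "varp": {},
}

errors = find_zero_size_arrays(mappings)
for error in errors:
    print(error)
```

Here only `obsm['test']` and `varm['empty']` are reported. Values that are not ndarrays (for example sparse matrices) are skipped in this sketch, since the rules as written mention only ndarrays.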

Bento007 commented 1 year ago

I assume "a key" means any key?

brianraymor commented 1 year ago

> I assume "a key" means any key?

Yes, since this causes RDS conversion failures: https://github.com/chanzuckerberg/single-cell-curation/issues/597#issue-1842192464

Bento007 commented 1 year ago

Do we have an existing test case dataset for any of these?

brianraymor commented 1 year ago

Not to my knowledge. When I was playing around with it, I realized that a test array also has to meet the other anndata requirements. For example, an obsm embedding must have the same length as obs:

import numpy
adata.obsm['test'] = numpy.zeros((adata.n_obs, 0))  # n_obs rows, zero columns
adata.obsm['test'].size  # 0

Bento007 commented 1 year ago

@jahilton thank you for these notebooks. They have been invaluable in reproducing the errors.

Bento007 commented 1 year ago

@jahilton @corismall ready for QA

jahilton commented 1 year ago

LGTM - QA notebook