chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

feature_dataset_presence_matrix should include genes with 0 counts #1284

Open ebezzi opened 1 month ago

ebezzi commented 1 month ago

Describe the bug

The current feature_dataset_presence_matrix only reports genes that have a total expression count > 0. Instead, it should report all genes that were listed in the original dataset.

To Reproduce

For dataset 0895c838-e550-48a3-a777-dbcd35d30272, the presence matrix row has 13696 nonzero entries (nnz), i.e. 13696 genes are marked as present.

Expected behavior

For dataset 0895c838-e550-48a3-a777-dbcd35d30272, the row should have 33363 nonzero entries, one for every gene listed in the original dataset.
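
To make the difference between the two behaviors concrete, here is a minimal, self-contained sketch on toy data. It is not the Census builder's implementation: the gene names, the counts, and the helper functions `presence_by_expression` and `presence_by_listing` are all hypothetical and exist only to illustrate the two definitions of "present".

```python
# Hypothetical toy data: one dataset that lists four genes,
# two of which have no counts in any cell.
listed_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

# Rows are cells, columns follow the order of listed_genes.
counts = [
    [3, 0, 0, 1],
    [0, 0, 0, 2],
    [5, 0, 0, 0],
]

# Hypothetical census-wide feature axis; GENE_E is not in this dataset.
census_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]


def presence_by_expression(census_genes, listed_genes, counts):
    """Mark a gene as present only if its total count in the dataset is > 0."""
    totals = {
        gene: sum(row[j] for row in counts)
        for j, gene in enumerate(listed_genes)
    }
    return [1 if totals.get(gene, 0) > 0 else 0 for gene in census_genes]


def presence_by_listing(census_genes, listed_genes):
    """Mark a gene as present if the dataset lists it, regardless of counts."""
    listed = set(listed_genes)
    return [1 if gene in listed else 0 for gene in census_genes]


expressed_row = presence_by_expression(census_genes, listed_genes, counts)
listed_row = presence_by_listing(census_genes, listed_genes)

print("nnz by expression:", sum(expressed_row))
print("nnz by listing:   ", sum(listed_row))
```

In this toy case the expression-based row marks only the two genes with counts, while the listing-based row marks all four genes the dataset declares. That is the same kind of gap as 13696 versus 33363 in the report above.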

ivirshup commented 1 month ago

How big of a bug is this? Does it need to be fixed in the LTS?

ebezzi commented 1 month ago

Don't think it's worth an errata.

giovp commented 1 week ago

> The current feature_dataset_presence_matrix only reports genes that have a total expression count > 0. Instead, it should report all genes that were listed in the original dataset.

I'm not sure if this is the issue we were discussing, @ivirshup, but I wonder whether both pieces of information should be present.

On the latter: is there a check during submission that verifies that every submitted gene has at least one count in at least one cell of the dataset? Just wondering.