chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

Some datasets appear to be missing from census_info in API call. #1022

Closed kylekimler closed 5 months ago

kylekimler commented 6 months ago

Describe the bug

Information for many datasets isn't available through the cellxgene_census API.

To Reproduce

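library(cellxgene.census)

# Open the Census and read the dataset table into a data frame
census <- open_soma()
census_datasets <- census$get("census_info")$get("datasets")
census_datasets <- census_datasets$read()$concat()
datasetsdf <- as.data.frame(census_datasets)

# Number of datasets (rows) and metadata columns
print(dim(datasetsdf))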

There appear to be only 651 datasets in Census, whereas the portal lists 1237. For example, I'm looking for an HTAN dataset that is available on the Discover portal: https://cellxgene.cziscience.com/collections/a48f5033-3438-4550-8574-cdff3263fdfd

I'd like to be able to access all of these datasets using their DOIs for meta-analysis - is this a bug?

Additional context

The same is true in the python API.
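
For reference, a minimal Python sketch of the same check. It assumes the `cellxgene_census` package is installed and that the dataset table exposes a `collection_id` column. The helper names `datasets_in_collection`, `load_census_datasets`, and `report` are hypothetical and only for illustration.

```python
def datasets_in_collection(records, collection_id):
    """Return the dataset records whose collection_id matches."""
    return [r for r in records if r["collection_id"] == collection_id]


def load_census_datasets():
    """Read the Census dataset table as a list of dicts (needs network access)."""
    # Imported lazily so the filtering helper above has no dependencies.
    import cellxgene_census

    with cellxgene_census.open_soma() as census:
        df = census["census_info"]["datasets"].read().concat().to_pandas()
    return df.to_dict("records")


def report(collection_id):
    """Print how many datasets Census holds overall and for one collection."""
    records = load_census_datasets()
    hits = datasets_in_collection(records, collection_id)
    print(len(records), "datasets in this Census release")
    print(len(hits), "of them belong to collection", collection_id)
```

Calling `report("a48f5033-3438-4550-8574-cdff3263fdfd")` would show whether any dataset from the HTAN collection linked above is present in the release being queried.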

Thank you! Kyle Kimler

pablo-gar commented 6 months ago

Hi @kylekimler,

Thanks for reaching out and flagging this. There are two reasons for the discrepancy between the datasets shown in the CELLxGENE Discover portal and Census.

  1. Census does not support certain assays, for example ATAC-seq and DNA methylation technologies, or data such as Patch-seq that require special metadata not included in Census.
  2. New data has been added to CELLxGENE Discover that is not yet on the acceptance list for Census.

The data you refer to falls under reason 2, and we have been actively working on updating the list of Census assays to include the latest accepted data. We are in the final stages of updating the list, so these data will shortly be included in Census.
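
One way to see which assays a given Census release contains is to read the `summary_cell_counts` table under `census_info`. The sketch below is an illustration, not an official recipe. It assumes that table has `organism`, `category`, and `label` columns, with assay rows marked by `category == "assay"`. The helper names `assay_labels` and `load_summary_cell_counts` are hypothetical.

```python
def assay_labels(rows, organism=None):
    """Return the sorted, de-duplicated assay labels found in summary rows.

    rows: iterable of dicts with "category", "label" and "organism" keys.
    organism: optional filter, e.g. "Homo sapiens".
    """
    labels = {
        r["label"]
        for r in rows
        if r["category"] == "assay"
        and (organism is None or r["organism"] == organism)
    }
    return sorted(labels)


def load_summary_cell_counts():
    """Read the Census summary table as a list of dicts (needs network access)."""
    # Imported lazily so assay_labels can be used without the package installed.
    import cellxgene_census

    with cellxgene_census.open_soma() as census:
        df = census["census_info"]["summary_cell_counts"].read().concat().to_pandas()
    return df.to_dict("records")
```

Running `assay_labels(load_summary_cell_counts())` and comparing the result with the assay shown for a dataset on the Discover portal gives a quick indication of whether that dataset's assay is on the accepted list yet.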

kylekimler commented 6 months ago

Hi @pablo-gar, thank you for the info! So, if I'm understanding correctly, there is a list of accepted assays that filters which Discover datasets reach the API, and it is limited to scRNA-seq rather than other modalities. And some scRNA-seq assays, for example the latest 10x 5' v3, haven't been added to that list yet? Thanks again

bkmartinjr commented 5 months ago

@kylekimler - there were assays missing. A fix is in progress and should land this month, as part of a broader schema update. See PR #1024 for details.

kylekimler commented 5 months ago

Thank you @bkmartinjr!!