KeyError on download_source_h5ad with Valid Dataset ID in cellxgene_census #1100

Describe the bug

When attempting to download a dataset using the cellxgene_census Python library, the function download_source_h5ad fails with a KeyError indicating an 'Unknown dataset_id'. The dataset ID used does exist as per the URL provided, suggesting a possible issue with the dataset ID mapping or retrieval process within the library.

To Reproduce

Steps to reproduce the behavior:

  1. Import the cellxgene_census library in Python.
  2. Attempt to download the dataset with the following code:
    import cellxgene_census
    cellxgene_census.download_source_h5ad("0500e103-38db-456d-9c3f-b96b8a693ab2", to_path="0500e103-38db-456d-9c3f-b96b8a693ab2_.h5ad")
  3. Observe the KeyError on execution.
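
When several datasets are downloaded in a loop, the failure in step 3 can be caught so that one rejected ID does not abort the whole run. The following is only a minimal sketch: `try_download` and its `download` parameter are hypothetical names introduced here for illustration, and the only library call it assumes is `cellxgene_census.download_source_h5ad(dataset_id, to_path=...)`.

```python
from typing import Callable, Optional


def try_download(
    dataset_id: str,
    to_path: str,
    download: Optional[Callable[..., None]] = None,
) -> bool:
    """Return True if the download succeeded, False if the ID was rejected.

    `download` is a hypothetical hook so the helper can be tried without
    network access; by default the real library function is used.
    """
    if download is None:
        # Imported lazily so the helper can be exercised without the library.
        import cellxgene_census

        download = cellxgene_census.download_source_h5ad
    try:
        download(dataset_id, to_path=to_path)
    except KeyError as err:
        # The error reported in this issue ('Unknown dataset_id').
        print(f"Skipping {dataset_id}: {err}")
        return False
    return True
```

Calling `try_download("0500e103-38db-456d-9c3f-b96b8a693ab2", "0500e103-38db-456d-9c3f-b96b8a693ab2_.h5ad")` would then report the problem and return `False` instead of raising.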

Expected behavior

The dataset with ID 0500e103-38db-456d-9c3f-b96b8a693ab2 should download without errors and be saved to the specified output path, 0500e103-38db-456d-9c3f-b96b8a693ab2_.h5ad.


Provide a description of your system and the software versions.

Additional context

This issue blocks data download tasks which are critical for downstream analysis. The dataset appears to be available and accessible directly via browser, which indicates a potential issue in the library's URI handling or dataset ID validation logic.

ebezzi commented 3 months ago

Hey @ubyndr,

the dataset is not part of the Census as its assay (snmC-Seq2, ontology term EFO:0030027) is not among the accepted assays.
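
One way to check this up front is to look the ID up in the Census dataset table before calling `download_source_h5ad`. The sketch below assumes the table at `census["census_info"]["datasets"]` exposes a `dataset_id` column; the helper names `load_census_dataset_ids` and `is_in_census` are hypothetical and not part of the library.

```python
from typing import Iterable, Set


def load_census_dataset_ids(census_version: str = "stable") -> Set[str]:
    """Read the dataset IDs included in a Census release (needs network access)."""
    import cellxgene_census  # imported lazily; only needed for the real lookup

    with cellxgene_census.open_soma(census_version=census_version) as census:
        # Assumption: the datasets table has a `dataset_id` column.
        datasets = census["census_info"]["datasets"].read().concat().to_pandas()
    return set(datasets["dataset_id"])


def is_in_census(dataset_id: str, census_dataset_ids: Iterable[str]) -> bool:
    """True if the dataset is part of the Census release the IDs came from."""
    return dataset_id in set(census_dataset_ids)
```

With network access, `is_in_census("0500e103-38db-456d-9c3f-b96b8a693ab2", load_census_dataset_ids())` indicates whether the download call can be expected to find the dataset.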

If you're only interested in downloading its h5ad, you can generate a download link from its collection page in the CELLxGENE Discover portal. For this dataset, you can use this link:

Let me know if you have any other questions.

ubyndr commented 3 months ago

Thank you for clarifying this. I appreciate it. I don't have any further questions.