chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

Draft schema for `collection_doi_label` of `census["census_info"]["datasets"]` #1179

Open pablo-gar opened 1 month ago

pablo-gar commented 1 month ago

See https://czi-sci.slack.com/archives/C023Q1APASK/p1716245770988479 for reference.

Schema changes

[...]

Census table of CELLxGENE Discover datasets – census_obj["census_info"]["datasets"] SOMADataFrame

All datasets used to build the Census MUST be included in a table modeled as a SOMADataFrame. Each row MUST correspond to an individual dataset with the following columns:

| Column | Encoding | Description |
| --- | --- | --- |
| citation | string | As defined in the CELLxGENE Discover API data schema. |
| collection_id | string | |
| collection_name | string | |
| collection_doi | string | |
| collection_doi_label | string | |
| dataset_id | string | |
| dataset_title | string | |
| dataset_h5ad_path | string | Relative path to the source h5ad file in the Census storage bucket. |
| dataset_total_cell_count | int | Total number of cells from the dataset included in the Census. |
| dataset_version_id | string | As defined in the CELLxGENE Discover API data schema. |
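
As a rough illustration of the columns listed above, the following sketch checks that a single row, represented as a plain Python dict, carries every column with the stated encoding. It is not part of the Census codebase: the names `DATASETS_COLUMNS` and `validate_dataset_row` are hypothetical, the mapping of `string`/`int` to Python's `str`/`int` is an assumption, and all values in `example_row` are placeholders rather than real Census data.

```python
# Column names and encodings copied from the draft table above.
# Mapping "string" -> str and "int" -> int is an assumption made for this sketch.
DATASETS_COLUMNS = {
    "citation": str,
    "collection_id": str,
    "collection_name": str,
    "collection_doi": str,
    "collection_doi_label": str,
    "dataset_id": str,
    "dataset_title": str,
    "dataset_h5ad_path": str,
    "dataset_total_cell_count": int,
    "dataset_version_id": str,
}


def validate_dataset_row(row):
    """Return a list of problems; an empty list means every listed column is present and typed as expected."""
    problems = []
    for name, expected in DATASETS_COLUMNS.items():
        if name not in row:
            problems.append(f"missing column: {name}")
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


# Hypothetical row; every value here is a placeholder, not real Census data.
example_row = {
    "citation": "placeholder citation",
    "collection_id": "placeholder-collection-id",
    "collection_name": "Placeholder collection",
    "collection_doi": "placeholder-doi",
    "collection_doi_label": "Placeholder et al. (placeholder label)",
    "dataset_id": "placeholder-dataset-id",
    "dataset_title": "Placeholder dataset",
    "dataset_h5ad_path": "placeholder.h5ad",
    "dataset_total_cell_count": 1000,
    "dataset_version_id": "placeholder-version-id",
}

print(validate_dataset_row(example_row))
```

A row that omits `collection_doi_label`, or stores `dataset_total_cell_count` as a string, would come back with one problem message per offending column.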

[...]

Changelog

Version 2.1.0

[...]

brianraymor commented 1 month ago

Nit. I would not use "crossref" in the name. It's just the mechanism used to acquire publication metadata.

The citation is an internal design based on that metadata.

How about `collection_doi_label`, to differentiate it from the dataset citation and to self-document that it's related to `collection_doi`?

pablo-gar commented 1 month ago

That's a good name. Updated.

pablo-gar commented 1 month ago

Updated, since the string will be fetched directly from the Discover API.