Cellular-Semantics / CL_KG

Building a Cell Ontology Knowledge-Base from data, and LLMs
Apache License 2.0

Review curation SoP #40

Open dosumis opened 2 weeks ago

dosumis commented 2 weeks ago

Ugur needs only:

  1. Something to identify dataset - currently h5ad link
  2. Author cell type fields

In future: author cell type field present: T/F (update SOP: there should be a row with blank author cell type field(s)). To deal with version changes, need CxG Link.
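A minimal sketch of the reduced record described above, assuming each curation row is available as a dict of strings. The column name `author_cell_type_fields`, the comma-separated encoding, and the helper `minimal_record` are hypothetical and not taken from the SOP; only the `h5ad link` field name comes from this issue.

```python
from typing import Dict, List, TypedDict


class MinimalRecord(TypedDict):
    h5ad_link: str
    author_cell_type_fields: List[str]
    has_author_cell_type_field: bool


def minimal_record(row: Dict[str, str]) -> MinimalRecord:
    """Reduce a curation row to a dataset identifier plus author cell type fields."""
    # Assumption: author cell type fields are stored comma-separated in one column.
    raw = row.get("author_cell_type_fields", "")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return {
        "h5ad_link": row["h5ad link"].strip(),
        "author_cell_type_fields": names,
        # The proposed T/F flag: False when the row has blank author cell type field(s).
        "has_author_cell_type_field": bool(names),
    }
```

A row with a blank author cell type column then still yields a record, with an empty list and the flag set to `False`.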

Editors need:

Details https://github.com/Cellular-Semantics/CL_KG/blob/main/docs/dataset_curation_guidelines.md

1. Dataset identification:

We have 7 fields:

Dataset (individual datasets within larger group): Description: The specific name of the dataset being curated within a larger dataset group. Example: "Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney - ATACseq"

Full name dataset (top of page): Description: The full descriptive name of the dataset that should be used for documentation and display. Example: "Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney"

CxG Link: Description: The CellxGene link to access the dataset. Example: "https://cellxgene.cziscience.com/e/13a027de-ea3e-432b-9a5e-6bc7048498fc.cxg/"

h5ad link: Description: The direct link to the .h5ad data file of the dataset. Example: "https://datasets.cellxgene.cziscience.com/dabd979f-cc50-4526-81f3-8bc6c673ca36.h5ad"

Reference_DOI: Description: The DOI reference for the associated publication(s) for the dataset. Example: "DOI: 10.1038/s41467-021-22368-w"

Study Short Name: Description: The shortened name or acronym of the study associated with the dataset. Example: "Muto et al. (2021) Nat Commun"

CxG Dataset Collection X: Description: The CellxGene link to the collection where the dataset is stored. Example: "https://cellxgene.cziscience.com/collections/9b02383a-9358-4f0f-9795-a891ec523bcc"
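To make the "do we need them all?" question easier to reason about, the seven fields can be written down as one record. This is only a sketch: the attribute names are hypothetical snake_case renderings of the column names, and the values are the examples quoted above, not a verified curation row.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DatasetIdentification:
    """The seven dataset identification fields (attribute names are hypothetical)."""
    dataset: str
    full_name_dataset: str
    cxg_link: str
    h5ad_link: str
    reference_doi: str
    study_short_name: str
    cxg_dataset_collection: str


# Populated with the example values listed in the field descriptions.
example = DatasetIdentification(
    dataset=(
        "Single cell transcriptional and chromatin accessibility profiling "
        "redefine cellular heterogeneity in the adult human kidney - ATACseq"
    ),
    full_name_dataset=(
        "Single cell transcriptional and chromatin accessibility profiling "
        "redefine cellular heterogeneity in the adult human kidney"
    ),
    cxg_link="https://cellxgene.cziscience.com/e/13a027de-ea3e-432b-9a5e-6bc7048498fc.cxg/",
    h5ad_link="https://datasets.cellxgene.cziscience.com/dabd979f-cc50-4526-81f3-8bc6c673ca36.h5ad",
    reference_doi="DOI: 10.1038/s41467-021-22368-w",
    study_short_name="Muto et al. (2021) Nat Commun",
    cxg_dataset_collection="https://cellxgene.cziscience.com/collections/9b02383a-9358-4f0f-9795-a891ec523bcc",
)

# Listing the field names gives a checklist for deciding which ones to keep.
field_names = [f.name for f in fields(DatasetIdentification)]
```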

Do we need them all?

Use cases:

Not needed:

The h5ad link should point to the latest version, which may not be the same as the version behind the CxG dataset link.
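The two example links in this issue embed different UUIDs, so one cheap check is to pull the UUID out of each URL and compare them. The sketch below assumes both links contain a standard 8-4-4-4-12 UUID; the helper names are hypothetical, and what a mismatch means (for example a newer version) is not something the code can decide on its own.

```python
import re
from typing import Optional

# Standard 8-4-4-4-12 hexadecimal UUID pattern.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def embedded_uuid(url: str) -> Optional[str]:
    """Return the first UUID found in the URL, lower-cased, or None."""
    match = UUID_RE.search(url)
    return match.group(0).lower() if match else None


def links_share_id(cxg_link: str, h5ad_link: str) -> bool:
    """True only if both links carry a UUID and the two UUIDs are equal."""
    cxg_id = embedded_uuid(cxg_link)
    h5ad_id = embedded_uuid(h5ad_link)
    return cxg_id is not None and cxg_id == h5ad_id
```

For the example CxG Link and h5ad link quoted in this issue, `links_share_id` returns `False`, which is consistent with the point that the two links cannot be treated as interchangeable identifiers.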

content

Suggestion: