Cellular-Semantics / CL_KG

Building a Cell Ontology Knowledge-Base from data, and LLMs
Apache License 2.0

Curate table of dataset:author fields for loading KB - for brain datasets #5

Closed: dosumis closed 1 month ago

dosumis commented 6 months ago

Spec

TSV file.

Columns:

The order of fields for a single dataset determines which author cell type field becomes the primary label in cases where more than one annotation defines the same cell set. Fields should be ordered by the author's intended specificity, from most specific to most general. Where a full name field and an abbreviation field are at the same level, the full name field should come first.
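The ordering rule above can be illustrated with a minimal sketch. It is only an assumption about how a loader might apply the rule, not the project's actual implementation: the function name `primary_labels`, the field names, and the toy annotations are all hypothetical. It groups annotations by the exact set of cells they cover and, where several annotations cover the same set, keeps the one from the field listed earliest.

```python
from collections import defaultdict

def primary_labels(obs, ordered_fields):
    """Return {cell set: (field, label)} using the earliest-listed field as primary.

    obs: dict mapping a field name to a list of per-cell labels (hypothetical layout).
    ordered_fields: field names ordered from most specific to most general.
    """
    # frozenset of cell indices -> [(field, label), ...] in field order
    cell_sets = defaultdict(list)
    for field in ordered_fields:
        members = defaultdict(set)
        for idx, label in enumerate(obs[field]):
            members[label].add(idx)
        for label, cells in members.items():
            cell_sets[frozenset(cells)].append((field, label))
    # Fields were visited in order, so the first entry comes from the earliest field.
    return {cells: annotations[0] for cells, annotations in cell_sets.items()}

# Toy data: four cells annotated in three hypothetical author fields.
obs = {
    "cell_type_full": ["alveolar type 2 cell", "alveolar type 2 cell", "club cell", "basal cell"],
    "cell_type_abbr": ["AT2", "AT2", "Club", "Basal"],
    "broad_class": ["epithelial", "epithelial", "epithelial", "epithelial"],
}
ordered_fields = ["cell_type_full", "cell_type_abbr", "broad_class"]

result = primary_labels(obs, ordered_fields)
```

Here "alveolar type 2 cell" and "AT2" define the same two cells, so the full name wins because its field is listed first; "epithelial" defines a different, larger cell set and keeps its own label.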

ubyndr commented 6 months ago

In the lung dataset, https://cellxgene.cziscience.com/e/066943a2-fdac-4b29-b348-40cede398e4e.cxg/, I've used the following order for author_cell_type:

For the kidney dataset, https://cellxgene.cziscience.com/e/0b75c598-0893-4216-afe8-5414cab7739d.cxg/, the order for author_cell_type:

dosumis commented 6 months ago

The previous iteration recorded content (type/state) and value type (abbreviation/full name) as follows. It might be best to stick with this.

| Content | Value Type(s) |
|---|---|
| cell types | full names, abbreviations |
| cell types, cell states | full names, abbreviations |
| cell types | abbreviations |

https://docs.google.com/spreadsheets/d/15XFXC7G80wBU2m2lrqahYFJmgriYvJ1Iy79f4JbJonU/edit#gid=0

To reuse this, we would need to map to datasets (not just collections).

AvolaAmg commented 6 months ago

Hi @dosumis, thanks for adding the Google Sheet. It links to another sheet under the Bionetwork reference column, and I was wondering whether we need that extra information. Happy to add it anyway; I just wanted to double-check.

AvolaAmg commented 6 months ago

Work in Progress: the first page of this spreadsheet has the same structure as the sample data; the second page is an extended version with some more columns.

dosumis commented 1 month ago

Done - final version: https://docs.google.com/spreadsheets/d/1pDIzlrM23DN-J7Pi1JgtJuHe9tjDixKpUqYbd1fx0Ac/edit?gid=1893135105#gid=1893135105