cellannotation / cell-annotation-schema

General, open-standard schema for cell annotations

BICAN representation of hierarchy #47

Open dosumis opened 11 months ago

dosumis commented 11 months ago

The BICAN extension allows for a simple representation of subsumption hierarchies of cell sets:

[Image: BICAN extension schema excerpt]

This assumes a system for assigning accessions to cell sets. As currently specified, the value of parent_cell_set_accession is a list, which allows multi-inheritance hierarchies.
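The sketch below shows what a list-valued parent field makes possible. It assumes annotations are held as Python dicts keyed by hypothetical cell set accessions (CS:0001 and so on), each with a list-valued parent_cell_set_accession. The function collects every ancestor of a cell set and follows all parents, not a single chain. The accessions, labels and dict layout are illustrative only and are not taken from the schema.

```python
from collections import deque

# Hypothetical annotations: accession -> label and list of parent accessions.
# Because parent_cell_set_accession is a list, a cell set may have several parents.
annotations = {
    "CS:0001": {"label": "class A", "parent_cell_set_accession": []},
    "CS:0002": {"label": "subclass B", "parent_cell_set_accession": ["CS:0001"]},
    "CS:0003": {"label": "NT: glutamate", "parent_cell_set_accession": []},
    "CS:0004": {"label": "cluster 7", "parent_cell_set_accession": ["CS:0002", "CS:0003"]},
}

def ancestors(accession, annotations):
    """Return every cell set that transitively subsumes `accession`."""
    seen = set()
    queue = deque(annotations[accession]["parent_cell_set_accession"])
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # guard against revisiting shared ancestors
            seen.add(parent)
            queue.extend(annotations[parent]["parent_cell_set_accession"])
    return seen

print(sorted(ancestors("CS:0004", annotations)))  # ['CS:0001', 'CS:0002', 'CS:0003']
```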

The ABC atlas uses a similar schema:

[Image: ABC atlas schema excerpt]

- cluster_annotation_term_set_label = CAS labelset
- parent_term_label = accession of the parent cell set (a separate field in ABC records its name)

Note that the value is a single entry (not a list), so multiple parents are not supported.
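As a rough illustration of the field correspondence above, here is a minimal sketch that maps an ABC-style term record onto a CAS-style annotation dict. Only cluster_annotation_term_set_label and parent_term_label come from the thread. The "label" and "name" keys on the ABC side, and all keys on the CAS side, are assumptions made for the example. It also shows that a single-valued ABC parent fills a list-valued field with at most one entry.

```python
def abc_term_to_cas(term):
    """Map an ABC-style term record to a CAS-style annotation (sketch).

    Correspondences taken from the thread:
      cluster_annotation_term_set_label -> labelset
      parent_term_label                 -> parent cell set accession
    The 'label' and 'name' input keys are assumed for illustration.
    """
    parent = term.get("parent_term_label")
    return {
        "cell_set_accession": term["label"],
        "cell_label": term["name"],
        "labelset": term["cluster_annotation_term_set_label"],
        # ABC allows at most one parent, so a list-valued field
        # populated from ABC holds zero or one entries.
        "parent_cell_set_accession": [parent] if parent else [],
    }

record = {
    "label": "CS:0002",
    "name": "subclass B",
    "cluster_annotation_term_set_label": "subclass",
    "parent_term_label": "CS:0001",
}
print(abc_term_to_cas(record))
```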

A separate, semi-redundant mechanism supports multiple parents for clusters only:

[Image: ABC atlas cluster membership example]

This relies on a separate accession ID system for clusters (cluster alias) and records each cluster's membership of the cell sets that encompass it. In the example shown, 'supertype', 'subtype' and 'class' are labelsets that make up a fixed-level, single-inheritance subsumption hierarchy. The membership information recorded for these duplicates what parent_term_label already records (as shown above). Neurotransmitter is a cross-cutting annotation: it doesn't fit the single hierarchy, and a single cluster could have multiple parents.
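To see why a cross-cutting labelset needs multiple parents, the sketch below groups membership rows by (cluster, labelset). It reports the labelsets in which some cluster belongs to more than one term, because a single-valued parent link cannot represent those. The row layout (cluster alias, labelset, term accession) and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical membership rows: (cluster_alias, labelset, term accession).
membership = [
    ("1", "class", "CS:0001"),
    ("1", "subclass", "CS:0002"),
    ("1", "neurotransmitter", "CS:0010"),
    ("1", "neurotransmitter", "CS:0011"),  # same cluster, second NT
    ("2", "class", "CS:0001"),
    ("2", "neurotransmitter", "CS:0010"),
]

def multi_parent_labelsets(rows):
    """Return labelsets in which some cluster belongs to more than one term.

    Such labelsets cannot be captured by a single-valued parent link.
    """
    terms = defaultdict(set)
    for cluster, labelset, term in rows:
        terms[(cluster, labelset)].add(term)
    return {labelset for (_, labelset), ts in terms.items() if len(ts) > 1}

print(multi_parent_labelsets(membership))  # {'neurotransmitter'}
```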

Taxonomy Development Tools (TDT) currently support a single entry for parent_cell_set_accession:

[Image: TDT parent_cell_set_accession field]

Question: Should we stick with the current CAS solution and harmonise? What challenges does this pose for tool (TDT) and product (ABC atlas) development? If there are significant challenges, what other approaches might we take?

CC @hkir-dev @lydiang

dosumis commented 11 months ago

Note: when constructing ontologies from taxonomies, it is highly preferable to keep multi-inheritance assertions at the most granular level (as the ABC atlas does), because this avoids unsafe inheritance of properties. For example, a user might assert a neurotransmitter at a higher level in the hierarchy based on some statistical test, but assuming that it applies to every subsumed cell set would be unsafe or incorrect.
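The toy example below contrasts the unsafe and safe directions of inference. It uses a hypothetical single-parent hierarchy with made-up cluster-level neurotransmitter assertions. Pushing a value asserted on a higher node down to every subsumed set relabels cl2 as GABA, which contradicts its own cluster-level assertion. Summarising cluster-level assertions upward does not over-claim in this way. This is a sketch of the reasoning, not of any TDT or ABC code.

```python
# Hypothetical single-parent hierarchy: child -> parent.
parent_of = {"cl1": "sub1", "cl2": "sub1", "sub1": "class1"}
# Neurotransmitter assertions made only at the most granular (cluster) level.
cluster_nt = {"cl1": {"GABA"}, "cl2": {"glutamate"}}

def descendants(node):
    """All nodes subsumed by `node` in the child -> parent map."""
    kids = [child for child, parent in parent_of.items() if parent == node]
    found = set(kids)
    for kid in kids:
        found |= descendants(kid)
    return found

def inherit_down(node, value):
    """Naive inference: assume a value asserted on `node` holds for every
    subsumed cell set. This is the unsafe step."""
    return {d: {value} for d in descendants(node)}

def summarise_up(node):
    """Safer direction: collect the values asserted on clusters under `node`."""
    found = set()
    for d in descendants(node) | {node}:
        found |= cluster_nt.get(d, set())
    return found

print(inherit_down("sub1", "GABA"))  # cl2 wrongly gets GABA
print(summarise_up("sub1"))          # both GABA and glutamate
```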

hkir-dev commented 11 months ago

TDT challenges:

dosumis commented 11 months ago

> Users will want to see parent_cell_set_accessions along with their labels. It will be hard to maintain two synchronised lists.

Names will be stored in association with the accession. TDT should show this denormalisation as a view.
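Below is a minimal sketch of "denormalisation as a view", assuming each annotation row stores only accessions and its own label. Parent labels are resolved from the annotation table when the view is built, so there is no second list of names to keep in sync. The row layout and key names are assumptions made for illustration.

```python
# Hypothetical annotation table: the single place where names are stored.
annotations = [
    {"cell_set_accession": "CS:0001", "cell_label": "class A",
     "parent_cell_set_accession": []},
    {"cell_set_accession": "CS:0002", "cell_label": "subclass B",
     "parent_cell_set_accession": ["CS:0001"]},
]

def with_parent_labels(rows):
    """Build a read-only view that adds each parent's label.

    Only accessions are stored; labels are looked up at view time from the
    same table, so names and IDs cannot drift apart.
    """
    label_of = {r["cell_set_accession"]: r["cell_label"] for r in rows}
    return [
        {**r, "parent_cell_labels": [label_of.get(a) for a in r["parent_cell_set_accession"]]}
        for r in rows
    ]

for row in with_parent_labels(annotations):
    print(row["cell_label"], row["parent_cell_labels"])
```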

dosumis commented 7 months ago

Progress: #96 makes parent_cell_set_accession singular. Proposal: associations with non-ranked labelsets are maintained in a separate table, with rows for each cluster. A member_of relation links each cluster to the relevant non-ranked labelsets. This allows for multiple inheritance (MI). It assumes that non-ranked annotations will only be applied at the cluster level (but see Siletti, where NT is applied at rank 1 in the ABC representation).

Example: as in CAS, but without the redundant links to parent classes. In this case only NT (neurotransmitter) would be left, but multiple NTs would be supported for a single cluster.

[Image: example cluster member_of table]
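The proposal can be sketched with two structures. The ranked hierarchy keeps a single-valued parent per cell set, and a separate member_of table has one row per (cluster, non-ranked cell set) pair, so a cluster may appear on several rows. A small check also flags any member_of row that points at a ranked cell set, which would be one of the redundant parent links the example drops. All accessions and the tuple layout are hypothetical.

```python
from collections import defaultdict

# Ranked hierarchy with a single-valued parent (child -> parent).
ranked = {
    "CS:0004": "CS:0002",  # cluster 7 -> subclass B
    "CS:0005": "CS:0002",  # cluster 8 -> subclass B
    "CS:0002": "CS:0001",  # subclass B -> class A
}

# Separate member_of table for non-ranked labelsets: (cluster, cell set) rows.
member_of = [
    ("CS:0004", "CS:0010"),  # cluster 7 member_of NT glutamate
    ("CS:0004", "CS:0011"),  # cluster 7 member_of NT GABA (multiple NTs allowed)
    ("CS:0005", "CS:0010"),
]

def non_ranked_memberships(rows):
    """Group member_of rows by cluster, so one cluster can have several sets."""
    out = defaultdict(set)
    for cluster, cell_set in rows:
        out[cluster].add(cell_set)
    return dict(out)

def redundant_rows(rows, ranked):
    """Rows linking a cluster to a ranked cell set duplicate the hierarchy."""
    ranked_sets = set(ranked) | set(ranked.values())
    return [row for row in rows if row[1] in ranked_sets]

print(non_ranked_memberships(member_of))
```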

There will also be a set of annotation objects for non-ranked labelsets (in this case, for all NTs). However, parent_cell_set assertions should be forbidden for these.
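Below is a minimal validation sketch of the rule in the previous paragraph. It assumes which labelsets count as non-ranked is known up front, shown here as a hypothetical constant containing only "neurotransmitter". It also assumes each annotation dict carries labelset and parent_cell_set_accession keys. The function lists non-ranked annotations that nevertheless assert a parent.

```python
# Hypothetical configuration: labelsets treated as non-ranked.
NON_RANKED_LABELSETS = {"neurotransmitter"}

def parents_on_non_ranked(annotations):
    """Return accessions of non-ranked annotations that assert a parent."""
    return [
        a["cell_set_accession"]
        for a in annotations
        if a["labelset"] in NON_RANKED_LABELSETS and a.get("parent_cell_set_accession")
    ]

example = [
    {"cell_set_accession": "CS:0010", "labelset": "neurotransmitter",
     "parent_cell_set_accession": "CS:0001"},  # violates the rule
    {"cell_set_accession": "CS:0011", "labelset": "neurotransmitter",
     "parent_cell_set_accession": None},
    {"cell_set_accession": "CS:0002", "labelset": "subclass",
     "parent_cell_set_accession": "CS:0001"},  # ranked, parent allowed
]
print(parents_on_non_ranked(example))
```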

dosumis commented 7 months ago

> TDT challenges:
>
> Users will want to see parent_cell_set_accessions along with their labels. It will be hard to maintain two synchronised lists.

There is no need to maintain synchronised lists: the annotation table will be the master for the name-ID association. We need a general solution for looking up labels from IDs across all content.

> The Nanobot UI doesn't have a string-list property visualisation yet. We would need a complex UI component to keep the name and ID lists synchronised while also supporting autocomplete.

The solution does not require lists.

> The taxonomy tree view will be more complex. Maybe we should reuse the OLS tree view in this case.

The taxonomy tree view will show only ranked classes. We may support other visualisations for non-ranked labelsets in future.
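Because the tree view is limited to ranked classes with a singular parent, building it reduces to a plain children map. The sketch below filters out non-ranked labelsets and walks from the roots, which it assumes are the ranked annotations with no parent. It returns indented lines. The labelset names, the RANKED set and the dict keys are assumptions made for illustration, and a real UI would render a widget instead of text.

```python
from collections import defaultdict

# Hypothetical annotations; the NT entry belongs to a non-ranked labelset.
annotations = [
    {"cell_set_accession": "CS:0001", "cell_label": "class A", "labelset": "class", "parent_cell_set_accession": None},
    {"cell_set_accession": "CS:0002", "cell_label": "subclass B", "labelset": "subclass", "parent_cell_set_accession": "CS:0001"},
    {"cell_set_accession": "CS:0004", "cell_label": "cluster 7", "labelset": "cluster", "parent_cell_set_accession": "CS:0002"},
    {"cell_set_accession": "CS:0010", "cell_label": "glutamate", "labelset": "neurotransmitter", "parent_cell_set_accession": None},
]
RANKED = {"class", "subclass", "cluster"}  # assumed ranked labelsets

def ranked_tree_lines(annotations, ranked=RANKED):
    """Return an indented text rendering of the ranked hierarchy only."""
    children = defaultdict(list)
    for node in annotations:
        if node["labelset"] in ranked:  # non-ranked labelsets are excluded
            children[node["parent_cell_set_accession"]].append(node)
    lines = []

    def walk(parent, depth):
        for node in sorted(children[parent], key=lambda n: n["cell_set_accession"]):
            lines.append("  " * depth + node["cell_label"])
            walk(node["cell_set_accession"], depth + 1)

    walk(None, 0)  # roots are ranked nodes without a parent
    return lines

print("\n".join(ranked_tree_lines(annotations)))
```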