dosumis closed this 1 month ago
CC @evanbiederstedt
Some notes:
TBD: Do we need separate review objects that can be attached to annotations (flattened as needed), or should we just add more fields to annotation objects? I favour the former, as separate review objects make it easy to support comments from multiple reviewers.
@evanbiederstedt - are these fields present so that reviewers can suggest an alternative, or were they just intended to record the annotation being reviewed?
Field Name | Value |
---|---|
cell_label | Directly from CAS schema. |
cell_fullname | Directly from CAS schema. |
cell_ontology_term | Directly from CAS schema. |
cell_ontology_term_id | Directly from CAS schema. |
> Do we need separate review objects that can be attached to annotations (flattened as needed)
That's my proposal. These could be TSV files that users can download.
It's simple and quick.
Sketching out the two options more explicitly.
Option 1: a list of review objects nested inside each annotation object:
annotation:
  cell_label:
  labelset:
  cell_fullname:
  cell_ontology_term:
  cell_ontology_term_id:
  # ...
  reviews:
    type: array
    items:
      $ref: review
review:
  time:
  name:
  review: "String. Either “Agree” or “Disagree”. Records whether the user agreed or disagreed."
  explanation: "Free-text message explaining why the user disagreed. If “Agree”, put in NA."
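A minimal Python sketch of Option 1, assuming plain dataclasses stand in for the schema (the class names, the ISO 8601 time format and the example values are illustrative, not part of CAS). Each annotation owns a list of reviews, so a second reviewer is just one more entry in that list.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

ALLOWED_VERDICTS = ("Agree", "Disagree")


@dataclass
class Review:
    """One reviewer's verdict on an annotation (Option 1: nested inside the annotation)."""
    time: str                 # assumed ISO 8601 timestamp string
    name: str                 # reviewer name
    review: str               # "Agree" or "Disagree"
    explanation: str = "NA"   # free text; "NA" when the reviewer agrees

    def __post_init__(self) -> None:
        if self.review not in ALLOWED_VERDICTS:
            raise ValueError(f"review must be one of {ALLOWED_VERDICTS}, got {self.review!r}")
        if self.review == "Agree":
            # Per the sketch above, an "Agree" review records NA as its explanation.
            self.explanation = "NA"


@dataclass
class Annotation:
    cell_label: str
    labelset: str
    cell_fullname: str = ""
    cell_ontology_term: str = ""
    cell_ontology_term_id: str = ""
    reviews: List[Review] = field(default_factory=list)


# Illustrative values only.
ann = Annotation(cell_label="cluster_1", labelset="level_1")
ann.reviews.append(Review("2023-05-01T10:00:00Z", "reviewer_a", "Agree"))
ann.reviews.append(
    Review("2023-05-02T09:30:00Z", "reviewer_b", "Disagree", "Markers suggest a different type.")
)
# asdict() converts the nested Review objects too, giving a JSON-ready structure.
print(json.dumps(asdict(ann), indent=2))
```

No key-based linking is needed here: a review cannot drift away from the annotation it belongs to, because it is stored inside it.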
Option 2: an array of reviews at the top level:
reviews:
  type: array
  items:
    $ref: review
review:
  time:
  name:
  cell_label:
  labelset:
  cell_fullname:
  cell_ontology_term:
  cell_ontology_term_id:
  review:
  explanation:
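A minimal sketch of the linking step Option 2 requires, using plain dicts with illustrative values. It assumes, as stated in the thread, that `cell_label` + `labelset` form the composite key. A review whose key no longer matches any annotation (for example after a label is renamed) ends up orphaned, which is the practical cost of linking on keys.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (cell_label, labelset) composite key

# Illustrative data; only cell_label + labelset are used for linking.
annotations = [
    {"cell_label": "cluster_1", "labelset": "level_1"},
    {"cell_label": "cluster_2", "labelset": "level_1"},
]
reviews = [
    {"cell_label": "cluster_1", "labelset": "level_1", "name": "reviewer_a",
     "time": "2023-05-01T10:00:00Z", "review": "Agree", "explanation": "NA"},
    # Refers to a label that no longer exists, e.g. after a rename.
    {"cell_label": "cluster_3", "labelset": "level_1", "name": "reviewer_b",
     "time": "2023-05-02T09:30:00Z", "review": "Disagree", "explanation": "Check markers."},
]


def link_reviews(annotations: List[dict],
                 reviews: List[dict]) -> Tuple[Dict[Key, List[dict]], List[dict]]:
    """Group top-level reviews under their annotation key; collect reviews that match nothing."""
    known: Set[Key] = {(a["cell_label"], a["labelset"]) for a in annotations}
    linked: Dict[Key, List[dict]] = defaultdict(list)
    orphans: List[dict] = []
    for r in reviews:
        key = (r["cell_label"], r["labelset"])
        if key in known:
            linked[key].append(r)
        else:
            orphans.append(r)
    return dict(linked), orphans


linked, orphans = link_reviews(annotations, reviews)
print(list(linked), len(orphans))
```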
In this second case, much of the information duplicates what is already in the annotation objects, and reviews are linked to annotations using cell_label + labelset as keys.
In both cases, flattening into a table for reporting purposes is easy. I prefer Option 1, as it doesn't involve redundancy or linking on keys.
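A sketch of flattening the Option 1 structure into a TSV report with the standard `csv` module: one row per review, with the parent annotation's fields repeated on each row. The column selection and example values are assumptions for illustration; real CAS documents carry more fields.

```python
import csv
import io

# Illustrative Option 1 document: reviews nested inside each annotation.
annotations = [
    {"cell_label": "cluster_1", "labelset": "level_1",
     "reviews": [
         {"time": "2023-05-01T10:00:00Z", "name": "reviewer_a",
          "review": "Agree", "explanation": "NA"},
         {"time": "2023-05-02T09:30:00Z", "name": "reviewer_b",
          "review": "Disagree", "explanation": "Wrong subclass"},
     ]},
    {"cell_label": "cluster_2", "labelset": "level_1", "reviews": []},
]

ANNOTATION_FIELDS = ["cell_label", "labelset"]
REVIEW_FIELDS = ["time", "name", "review", "explanation"]


def flatten_reviews(annotations):
    """Return one row per review, repeating the parent annotation's fields."""
    rows = []
    for ann in annotations:
        parent = {k: ann.get(k, "") for k in ANNOTATION_FIELDS}
        for rev in ann.get("reviews", []):
            rows.append({**parent, **{k: rev.get(k, "") for k in REVIEW_FIELDS}})
    return rows


def to_tsv(rows):
    """Serialise flattened rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=ANNOTATION_FIELDS + REVIEW_FIELDS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = flatten_reviews(annotations)
print(to_tsv(rows))
```

In this sketch, annotations with no reviews produce no rows; a report that should list unreviewed annotations could instead emit one row with empty review columns.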
Works for me
> Do we need separate review objects that can be attached to annotations (flattened as needed)

> In both cases, flattening into a table for reporting purposes is easy. I prefer 1 as it doesn't involve redundancy or linking on keys.
Agreed. I think we should ask users to download the TSV files and provide tooling to map them into the AnnData files.
You simply need to map the cell label names AFAIK.
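A minimal sketch of that mapping step, assuming a downloaded review TSV with `cell_label` and `labelset` columns and a per-cell label column. Here the obs column is a plain dict of hypothetical cell IDs to labels; with AnnData, a column such as `adata.obs["level_1"]` could be passed instead, since a pandas Series also provides `.items()`. The function name and data are illustrative.

```python
import csv
import io

# Hypothetical review TSV (as a string here; in practice read from the downloaded file).
REVIEW_TSV = (
    "cell_label\tlabelset\tname\treview\n"
    "cluster_1\tlevel_1\treviewer_a\tAgree\n"
    "cluster_2\tlevel_1\treviewer_b\tDisagree\n"
)
# Stand-in for one labelset column of obs: cell ID -> cell label.
obs_level_1 = {"cell_001": "cluster_1", "cell_002": "cluster_2", "cell_003": "cluster_1"}


def map_reviews_to_cells(tsv_text, obs_column, labelset):
    """Return {cell_id: [review rows]} by matching cell label names within one labelset."""
    by_label = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["labelset"] == labelset:
            by_label.setdefault(row["cell_label"], []).append(row)
    # Cells whose label has no review get an empty list.
    return {cell: by_label.get(label, []) for cell, label in obs_column.items()}


per_cell = map_reviews_to_cells(REVIEW_TSV, obs_level_1, "level_1")
print(per_cell["cell_003"])
```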
From my experience with Brain Initiative, separate TSV files pretty much always get split up and fall out of sync. It may be possible to get around this with a CAP API that bundles them, but outside of CAP my preference is that they stay together in one file, hence CAS JSON. Best of all is CAS JSON embedded in the header (uns), so we don't have the problem of keeping annotation and matrix files in sync.
I also agree that bioinformaticians hate JSON, which is why we have CAS-tools, which can flatten CAS JSON or generate reports as dataframes. Writing CAS JSON to the header is low overhead. Only changes to obs (annotating new cells or changing a name) are high overhead, but without flattening we keep those to a minimum. Best of all, once the annotations are written to the header, other platforms will accept the files and users are unlikely to strip the annotations out, so they persist.
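A sketch of the "one file" idea: the whole CAS document (annotations plus nested reviews) is serialised to a single JSON string and stored under one key of the header mapping. A plain dict stands in for AnnData's `uns` here, and the key name `cas` is hypothetical, not a settled convention. Storing one string is a deliberate simplification that sidesteps questions about which nested structures the h5ad writer accepts in `uns`.

```python
import json

# Illustrative CAS-like document with annotations and nested reviews kept together.
cas = {
    "annotations": [
        {"cell_label": "cluster_1", "labelset": "level_1",
         "reviews": [{"name": "reviewer_a", "review": "Agree", "explanation": "NA"}]},
    ]
}

# Stand-in for AnnData's `uns` mapping; with anndata this would be `adata.uns`.
uns = {}
CAS_KEY = "cas"  # hypothetical key name


def embed_cas(uns, cas_doc, key=CAS_KEY):
    """Store the whole CAS document as one JSON string so it travels with the matrix."""
    uns[key] = json.dumps(cas_doc)


def read_cas(uns, key=CAS_KEY):
    """Parse the embedded CAS document back into Python objects."""
    return json.loads(uns[key])


embed_cas(uns, cas)
restored = read_cas(uns)
print(restored["annotations"][0]["reviews"][0]["review"])
```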
Suggested fields from CAP:
This will be supported by CAP - see https://capdevelopment.atlassian.net/wiki/spaces/CAP/pages/473464834/Feedback+on+Cell+Annotations (Link for CAP members only)