Support review of someone else's annotation

dosumis commented 4 months ago

Suggested fields from CAP:

Field Name	Value
time	Timestamp when the user saved/published/submitted the reviews. (We’re still figuring this out; cf Question #1)
cap_username	CAP username of the Reviewer who provided the feedback
name	Full name of the person who provided the review, [FIRSTNAME LASTNAME]
cellannotation_setname	The name of the labelset. Directly from the CAS schema. https://github.com/cellannotation/cell-annotation-schema/blob/main/docs/cap_anndata_schema.md https://github.com/cellannotation/cell-annotation-schema/blob/main/build/CAP_schema.md
review	String. Either “Agree” or “Disagree”. This records whether the user agreed or disagreed.
cell_label	Directly from CAS schema.
cell_fullname	Directly from CAS schema.
cell_ontology_term	Directly from CAS schema.
cell_ontology_term_id	Directly from CAS schema.
explanation	Free-text of the message explaining the reasoning why the user disagreed. If “Agree”, then put in NA.
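
As a rough illustration of the fields above, a single review record could be modelled as below. This is only a sketch, not part of CAS or CAP: the class name `Review` is hypothetical, and forcing `explanation` to "NA" for an "Agree" review is an assumption based on the table.

```python
from dataclasses import dataclass

# The two values the table allows for the `review` field.
VALID_REVIEWS = {"Agree", "Disagree"}


@dataclass
class Review:
    """Hypothetical container mirroring the suggested CAP review fields."""

    time: str
    cap_username: str
    name: str
    cellannotation_setname: str
    review: str
    cell_label: str
    cell_fullname: str
    cell_ontology_term: str
    cell_ontology_term_id: str
    explanation: str = "NA"

    def __post_init__(self):
        # Reject anything other than "Agree" or "Disagree".
        if self.review not in VALID_REVIEWS:
            raise ValueError(f"review must be one of {sorted(VALID_REVIEWS)}")
        # Assumption: an "Agree" review always carries "NA" as its explanation.
        if self.review == "Agree":
            self.explanation = "NA"
```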

This will be supported by CAP - see https://capdevelopment.atlassian.net/wiki/spaces/CAP/pages/473464834/Feedback+on+Cell+Annotations (Link for CAP members only)

dosumis commented 4 months ago

CC @evanbiederstedt

Some notes:

TBD: Do we need separate review objects that can be attached to annotations (flattened as needed) or should we just add more fields to annotation objects? I favour the former as with these we can easily support comments from multiple reviewers.

@evanbiederstedt - are these fields present so that reviewers can suggest an alternative, or were they just intended to record the annotation being reviewed?

Field Name	Value
cell_label	Directly from CAS schema.
cell_fullname	Directly from CAS schema.
cell_ontology_term	Directly from CAS schema.
cell_ontology_term_id	Directly from CAS schema.

evanbiederstedt commented 4 months ago

Do we need separate review objects that can be attached to annotations (flattened as needed)

That's my proposal. These could be *.tsv files users could download.

It's simple + quick.

dosumis commented 4 months ago

Sketching out the 2 options more explicitly.

Option 1. review object list nested inside each annotation object:

annotation:
  cell_label:
  labelset:
  cell_fullname:
  cell_ontology_term:
  cell_ontology_term_id:
  #...
  reviews:
    type: array
    items:
      $ref: review

review:
  time: 
  name: 
  review: "String. Either “Agree” or “Disagree”. This records whether the user agreed or disagreed."
  explanation: "Free-text of the message explaining the reasoning why the user disagreed. If “Agree”, then put in NA."

Option 2. Array of reviews at top level:

reviews:
  type: array
  items:
    $ref: review
review: 
  time: 
  name: 
  cell_label:
  labelset:  
  cell_fullname: 
  cell_ontology_term:
  cell_ontology_term_id: 
  review: 
  explanation:

In this second case, each review duplicates a lot of information already held in the annotation objects. Reviews link back to annotations using cell_label + labelset as keys.

In both cases, flattening into a table for reporting purposes is easy. I prefer 1 as it doesn't involve redundancy or linking on keys.
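
To make the flattening claim concrete for Option 1, here is a minimal sketch using only the standard library. The function names `flatten_reviews` and `to_tsv` are hypothetical and are not taken from CAS-tools. It assumes each annotation is a dict with an optional `reviews` list, as in the Option 1 sketch.

```python
import csv
import io


def flatten_reviews(annotations):
    """Return one flat row per (annotation, review) pair."""
    rows = []
    for ann in annotations:
        # Annotation-level fields are repeated on every review row.
        base = {k: v for k, v in ann.items() if k != "reviews"}
        for rev in ann.get("reviews", []):
            rows.append({**base, **rev})
    return rows


def to_tsv(rows):
    """Serialise flat rows as tab-separated text with a header line."""
    # Union of keys, in first-seen order, so rows may differ in their fields.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Note that in this sketch annotations without any reviews produce no rows, which suits a review report but would not suit a full annotation export.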

evanbiederstedt commented 4 months ago

Works for me

Do we need separate review objects that can be attached to annotations (flattened as needed)

In both cases, flattening into a table for reporting purposes is easy. I prefer 1 as it doesn't involve redundancy or linking on keys.

Agreed. I think we should ask users to download the *.tsv files, and provide tooling to map into the AnnData files.

You simply need to map the cell label names AFAIK.
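
A rough sketch of that mapping, assuming the review rows carry `labelset` and `cell_label` columns (the names used in the option sketches above) and that the per-cell labels for one labelset are available as a plain list. The function names are hypothetical.

```python
from collections import defaultdict


def index_reviews(review_rows):
    """Group review rows by the (labelset, cell_label) key pair."""
    by_key = defaultdict(list)
    for row in review_rows:
        by_key[(row["labelset"], row["cell_label"])].append(row)
    return by_key


def reviews_per_cell(cell_labels, labelset, review_rows):
    """Return, for each cell, the list of reviews attached to its label."""
    by_key = index_reviews(review_rows)
    # Cells whose label has no reviews get an empty list.
    return [by_key.get((labelset, label), []) for label in cell_labels]
```

With AnnData, `cell_labels` would presumably come from the relevant obs column, but that wiring is left out here.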

dosumis commented 4 months ago

From my experience with the Brain Initiative, separate TSV files pretty much always get split up and fall out of sync. It may be possible to get around this with a CAP API that bundles them, but outside of CAP my preference is that they stay together in one file, hence CAS JSON. Best of all is CAS JSON embedded in the header (uns), so we don't have the issue of keeping annotation and matrix files in sync.

I also agree that bioinformaticians hate JSON, which is why we have CAS-tools, which can flatten it or generate reports as dataframes. Writing CAS-JSON to the header is low overhead. Only changes to obs (annotating new cells or changing a name) are high overhead, but without flattening we keep those to a minimum. Best of all, when the JSON is written to the header, other platforms will accept the files and users are unlikely to strip it out, so it persists.
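
A minimal sketch of the header idea, using a plain dict as a stand-in for the AnnData `uns` mapping. The key name `"cas"` and the choice to store the JSON as a single string are assumptions for illustration, not confirmed CAS-tools behaviour.

```python
import json

# Hypothetical key under which the CAS JSON is stored in the header.
CAS_KEY = "cas"


def embed_cas(uns, cas):
    """Store the CAS document as a JSON string in the header mapping."""
    uns[CAS_KEY] = json.dumps(cas)


def extract_cas(uns):
    """Read the CAS document back out of the header mapping."""
    return json.loads(uns[CAS_KEY])
```

Because only the header entry changes, adding or editing a review in this sketch never touches the per-cell obs columns.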

cellannotation / cell-annotation-schema

Support review of someone else's annotation #100