cellannotation / cell-annotation-schema

General, open-standard schema for cell annotations

BICAN use case - track multiple annotation transfers onto a single cell set #8

Closed dosumis closed 11 months ago

dosumis commented 1 year ago

Use case: As a BRAIN Initiative taxonomy developer, I want to be able to keep track of annotation transfer from multiple sources onto individual nodes in my taxonomy, tracking the transferred label, the algorithm, and the reference taxonomy and dataset. I want this to be stable to changes in nomenclature in the reference taxonomy. See https://github.com/cellannotation/cell-annotation-schema/blob/main/user_stories.md#bican

Option 1:

Challenges

A not very satisfactory sketch:

 { "cell_annotation_key": "Subclass",
   "label": "D1 matrix",
   "cell_ontology_term": "matrix D1 medium spiny neuron",
   "cell_ontology_term_id": "CL:4030043",
   "Accession": "CCN20230822_204"
 },
 { "cell_annotation_key": "CN20190214",  // Name of taxonomy transferred from
   "Accession": "CCN20230822_510",  // Is this confusing?
   "label": "D1m",
   "provenance": {
     "automated_annotation": {
       "algorithm_name": "scrattch_v1",
       "transferred_to": "CCN20230822_204",  // relationship to another cell set in this taxonomy
       "reference_dataset": ""
     }
   }
 }

Option 2: Schema extension

Sketch


 { "cell_annotation_key": "Subclass",
   "cell_label": "D1 matrix",
   "Annotation_transfers": [
     { "label": "D1m",
       "taxonomy": "http://bican.org/taxonomy/1.1/CN20190214",  // versioned PURL
       "transferred_from": "CN20190214_1",  // ID of cluster in
       "algorithm_name": "scrattch_v1"
     },
     { "label": "fubar",
       "taxonomy": "http://bican.org/taxonomy/1.1/CN20190214",  // versioned PURL
       "transferred_from": "CN20201225_42",
       "algorithm_name": "scMap_v3"
     }
   ]
 }

Conclusion: Option 2 is much more transparent and easier to implement. I suggest we add this extension as a separate object under an optional key in the Annotation object.
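To illustrate how the Option 2 shape might be consumed, here is a minimal Python sketch. It assumes the field names from the sketch above (`label`, `taxonomy`, `transferred_from`, `algorithm_name`, `Annotation_transfers`); the class names and the `transfers_by_algorithm` helper are hypothetical and not part of the schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AnnotationTransfer:
    label: str             # label transferred from the reference taxonomy
    taxonomy: str          # versioned PURL of the reference taxonomy
    transferred_from: str  # ID of the source cluster
    algorithm_name: str    # algorithm used for the transfer


@dataclass
class Annotation:
    cell_annotation_key: str
    cell_label: str
    # Attribute name mirrors the "Annotation_transfers" key in the sketch.
    Annotation_transfers: list = field(default_factory=list)


def transfers_by_algorithm(annotation, algorithm_name):
    """Return every transfer onto this cell set made by the named algorithm."""
    return [t for t in annotation.Annotation_transfers
            if t.algorithm_name == algorithm_name]


# Values copied from the Option 2 sketch.
subclass = Annotation("Subclass", "D1 matrix")
subclass.Annotation_transfers.append(AnnotationTransfer(
    "D1m", "http://bican.org/taxonomy/1.1/CN20190214",
    "CN20190214_1", "scrattch_v1"))
subclass.Annotation_transfers.append(AnnotationTransfer(
    "fubar", "http://bican.org/taxonomy/1.1/CN20190214",
    "CN20201225_42", "scMap_v3"))

# asdict() recurses into the list, so the result serialises directly to JSON.
print(json.dumps(asdict(subclass), indent=2))
```

Because each transfer is a self-contained entry in a list, adding a transfer from another source is just an append, and the primary annotation (`cell_label`) is never touched.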

dosumis commented 1 year ago

@UCDNJJ - comments?

dosumis commented 1 year ago

Thinking about this some more, a neater way to do this would be to modify the existing automated annotation object (in a BICAN extension) to fit the use case.

{ "definitions": {
    "automated_annotation": {
      "type": "object",
      "description": "A set of fields for recording the details of the automated annotation algorithm used.\n(Common 'automated annotation methods' would include PopV, Azimuth, CellTypist, scArches, etc.)",
      "properties": {
        "algorithm_name": {
          "type": "string",
          "description": "The name of the algorithm used. It MUST be a string of the algorithm's name."
        },
        "algorithm_version": {
          "type": "string",
          "description": "The version of the algorithm used (if applicable). It MUST be a string of the algorithm's version, which is typically in the format '[MAJOR].[MINOR]', but other versioning systems are permitted (based on the algorithm's versioning)."
        },
        "algorithm_repo_url": {
          "type": "string",
          "description": "This field denotes the URL of the version control repository associated with the algorithm used (if applicable). It MUST be a string of a valid URL."
        },
        "reference_location": {
          "type": "string",
          "description": "This field denotes a valid URL of the reference dataset used to do annotation transfer (if applicable). This should be the URL of data portal location or other repository. \nThis MUST be a string of a valid URL. The concept of a 'reference' specifically refers to 'annotation transfer' algorithms, whereby a 'reference' dataset is used to transfer cell annotations to the 'query' dataset.",
          "$comment": "Maybe make optional, as it is not clear whether this always makes sense"
        },
        "reference_cell_set_accession": {
          "type": "string",
          "description": "Accession/ID of cell set with transferred annotation in reference data (if applicable). This SHOULD be provided for annotation methods where this makes sense and a cell set accession exists."
        }
      },
      "required": [
        "algorithm_name",
        "algorithm_version",
        "algorithm_repo_url"
      ]
    }
  }
}

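As a rough illustration of what this definition asks of a record, the sketch below checks a record using only the Python standard library. It is not a JSON Schema validator: it covers only the `required` list and the string types from the fragment above, and it assumes `reference_cell_set_accession` is meant to be an optional string property. The function name and the example values are hypothetical.

```python
# Required keys as listed under "required" in the schema fragment.
REQUIRED = ("algorithm_name", "algorithm_version", "algorithm_repo_url")
# Optional keys from the same fragment (assumed to be plain strings).
OPTIONAL = ("reference_location", "reference_cell_set_accession")


def check_automated_annotation(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing required field: {key}"
                for key in REQUIRED if key not in record]
    for key in REQUIRED + OPTIONAL:
        if key in record and not isinstance(record[key], str):
            problems.append(f"{key} must be a string")
    return problems


# Hypothetical record: the version and repository URL are placeholders.
record = {
    "algorithm_name": "scrattch",
    "algorithm_version": "1.0",
    "algorithm_repo_url": "https://example.org/repo",
    "reference_cell_set_accession": "CN20190214_1",
}
print(check_automated_annotation(record))
```

For real use, a full JSON Schema validator would be the better choice, since it also handles nested definitions and format checks.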
UCDNJJ commented 1 year ago

I like the solution David has here, especially recording the reference_cell_set_accession from each individual mapping. Quick question: assuming we use the same algorithm (scANVI, for example) to map multiple taxonomies onto our data, will this schema support that?

We definitely need to record the taxonomy ID, which should be present in a taxonomy database at the Allen Institute.
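To make the scenario concrete, here is a minimal Python sketch of one cell set carrying two transfers made with the same algorithm from different reference taxonomies. The `taxonomy_id` field name is an assumption (it is not in the schema fragment above), the second taxonomy ID is inferred from the accession prefix purely for illustration, and `index_transfers` is a hypothetical helper.

```python
def index_transfers(records):
    """Index one cell set's transfers by (algorithm_name, taxonomy_id).

    Raises ValueError if the same algorithm/taxonomy pair appears twice,
    since the pair is what tells the records apart in this sketch.
    """
    index = {}
    for record in records:
        key = (record["algorithm_name"], record["taxonomy_id"])
        if key in index:
            raise ValueError(f"duplicate transfer for {key}")
        index[key] = record
    return index


# Illustrative records for a single cell set; "taxonomy_id" is assumed.
transfers = [
    {"algorithm_name": "scANVI", "taxonomy_id": "CN20190214",
     "reference_cell_set_accession": "CN20190214_1"},
    {"algorithm_name": "scANVI", "taxonomy_id": "CN20201225",
     "reference_cell_set_accession": "CN20201225_42"},
]

index = index_transfers(transfers)
print(sorted(index))
```

Without a taxonomy identifier on each record, the two entries would differ only by accession, which is why recording the taxonomy ID matters here.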

ubyndr commented 11 months ago

@dosumis to review

dosumis commented 11 months ago

Fixed - see https://github.com/cellannotation/cell-annotation-schema/blob/main/BICAN_extension.json#L9 & https://github.com/cellannotation/cell-annotation-schema/blob/main/examples/BICAN_schema_specific_examples/Silletti_annotation_transfer.json