Question: How can we efficiently retrieve existing annotation data by searching based on key and value?


Stand-off Text Annotation Model (STAM) is a data model for stand-off-text annotation where any information on a text is represented as an annotation. This repository contains the model's full specification, extensions, schemas, examples and documentation.

Creative Commons Attribution Share Alike 4.0 International


# If the annotation data already exists, use it. Otherwise create a new one with a new id.
prepared_ann_data = []
for k, v in ann_data.items():
    try:
        ann_datas = list(ann_store.data(set=ann_dataset.id(), key=k, value=v))
        prepared_ann_data.append(ann_datas[0])
    except:  # noqa
        prepared_ann_data.append(
            {"id": get_uuid(), "set": ann_dataset.id(), "key": k, "value": v}
        )
ann_store.annotate(target=text_selector, data=prepared_ann_data, id=get_uuid())

STAM will already do something similar internally, assigning a new random ID for the annotation data if it is new, and reusing the existing one if not, so you can just pass something like:

ann_store.annotate(target=text_selector, data=[
  {
     "set": ann_dataset.id(), "key": k, "value": v
  },
  {
     "set": ann_dataset.id(), "key": k2, "value": v2
  },
], id=get_uuid())

Note that I omitted the AnnotationData ID here, which means an ID will be assigned automatically. STAM assigns a random 21-character nanoid rather than a UUID because it takes less space; see https://crates.io/crates/nanoid .
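
To make the reuse behaviour described above concrete, here is a toy sketch in plain Python. It does not use the STAM library and is not how STAM is implemented; `ToyDataStore`, `get_or_create` and `random_id` are hypothetical names invented for illustration. It only models the idea that data with the same set, key and value is stored once, and that a random 21-character ID is generated when none is supplied.

```python
import secrets
import string

# Assumption: a 64-symbol URL-safe alphabet, similar in spirit to nanoid's default.
ALPHABET = string.ascii_letters + string.digits + "_-"


def random_id(size: int = 21) -> str:
    """Return a random ID of `size` characters (hypothetical helper)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(size))


class ToyDataStore:
    """Toy stand-in for a store that deduplicates data by (set, key, value)."""

    def __init__(self):
        # Maps (set_id, key, value) to the single stored record for that content.
        self._by_content = {}

    def get_or_create(self, set_id, key, value, id=None):
        content = (set_id, key, value)
        existing = self._by_content.get(content)
        if existing is not None:
            # Same content seen before: reuse the existing record and its ID.
            return existing
        # New content: use the caller's ID if given, otherwise generate one.
        record = {"id": id or random_id(), "set": set_id, "key": key, "value": value}
        self._by_content[content] = record
        return record


store = ToyDataStore()
first = store.get_or_create("my_set", "pos", "noun")
second = store.get_or_create("my_set", "pos", "noun")  # reuses `first`
third = store.get_or_create("my_set", "pos", "verb")   # new record, new ID
```

In this toy model, `first` and `second` are the same record, which is the effect you get from leaving out the ID and letting the store handle deduplication.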

If you really do want to assign the AnnotationData ID explicitly, then the method you used is okay, but the lookup inside the try block can be improved slightly for performance:

prepared_ann_data.append( next(ann_store.data(set=ann_dataset.id(), key=k, value=v, limit=1)) )
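
The gain from `next(...)` over `list(...)[0]` comes from not materialising every match when only the first is needed. The sketch below shows this with an ordinary Python generator rather than the STAM API; `matching_items` and the `visited` counter are hypothetical and exist only to make the difference visible. Keep in mind that `next()` on an empty iterator raises `StopIteration` (where `list(...)[0]` raises `IndexError`), so the surrounding `except` still has to cover the not-found case, or you can pass a default as the second argument.

```python
from typing import Dict, Iterator, List


def matching_items(items: List[dict], key: str, value: str,
                   counter: Dict[str, int]) -> Iterator[dict]:
    """Lazily yield items matching key and value, counting items inspected."""
    for item in items:
        counter["visited"] += 1
        if item["key"] == key and item["value"] == value:
            yield item


items = [
    {"key": "pos", "value": "noun"},
    {"key": "pos", "value": "verb"},
    {"key": "pos", "value": "noun"},
    {"key": "pos", "value": "adj"},
]

# list(...)[0] consumes the whole iterator before taking the first match.
eager_counter = {"visited": 0}
eager_first = list(matching_items(items, "pos", "noun", eager_counter))[0]

# next(...) stops as soon as the first match is produced.
lazy_counter = {"visited": 0}
lazy_first = next(matching_items(items, "pos", "noun", lazy_counter))

# With a default, a missing match returns None instead of raising StopIteration.
missing_counter = {"visited": 0}
missing = next(matching_items(items, "pos", "adverb", missing_counter), None)
```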
