kyleclo opened this pull request 2 years ago (status: Open).
@soldni
> The overall design seems good to me! I don't quite understand why we need AnnotationName classes though. What does the extra overhead of this class get us?

Without the class, the rule for how IDs are constructed would still have to live somewhere in the library. For now it's `field_name-integer_id`, but this may need to be extended in the future.
We also need some way to parse this ID so we can look up that specific element within a Document. I don't want `field, id = obj.split('-')` scattered throughout the code, because that gets hard to maintain if we ever change the format. The class gives us `.field` and `.id` to use here instead.
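To make the argument concrete, here is a minimal sketch of such a class, assuming the `field_name-integer_id` format described above. The `AnnotationName.from_str` helper and the `SEPARATOR` constant are hypothetical illustrations, not the library's actual API. The point is that the format is defined in exactly one place.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class AnnotationName:
    """Hypothetical document-level name such as 'bib_entry-5'.

    The string format is defined only here, so changing it later means
    editing this class rather than every `obj.split('-')` call site.
    """

    SEPARATOR: ClassVar[str] = "-"  # class constant, not a dataclass field

    field: str
    id: int

    def __str__(self) -> str:
        return f"{self.field}{self.SEPARATOR}{self.id}"

    @classmethod
    def from_str(cls, name: str) -> "AnnotationName":
        # Split on the last separator only, so a field name containing '-'
        # would still parse; the trailing part must be an integer id.
        field, id_ = name.rsplit(cls.SEPARATOR, 1)
        return cls(field=field, id=int(id_))


name = AnnotationName.from_str("bib_entry-5")
print(name.field, name.id)  # bib_entry 5
print(name)                 # bib_entry-5
```

Call sites use `name.field` and `name.id` and never touch the raw string, so extending the format later only changes `__str__` and `from_str`.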
@kyleclo Sounds good! I added two small suggestions to improve it, but otherwise it's OK to merge!
This PR substantially extends the library's functionality by adding a new Annotation type called `Relation`. A Relation is a link between two annotations (e.g., a Citation linked to its Bib Entry). The two input Annotations are called `key` and `value`.

A few things needed to change to support Relations:
Annotation Names
Relations store references to Annotation objects, but we didn't want `Relation.to_json()` to also call `.to_json()` on those objects. We only want to store minimal identifiers for the `key` and `value`: something short like `bib_entry-5` or `sentence-13`. We call these short strings names.
To do this, we added an optional attribute to the Annotation class called `field: str`, which stores the name of the field the annotation belongs to. It's automatically populated when you run `Document.annotate(new_field=list_of_annotations)`; each of those input annotations will have the new field name stored under `.field`.

We also added a method `name()` that returns the name of a particular Annotation object, which is unique at the document level. A Name is a minimal class that basically stores `.field` and `.id`.

In short, after you annotate a Document with annotations, you can now do things like:
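The following is a minimal, self-contained sketch of this workflow. `Name`, `Annotation`, `Document` and its `fields` dict are simplified, hypothetical stand-ins rather than the library's real classes. The sketch also assumes that each annotation already carries an integer `id` that is unique within its field, which is what makes the resulting name unique at the document level.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Name:
    """Hypothetical minimal name: just a field plus an id."""

    field: str
    id: int

    def __str__(self) -> str:
        return f"{self.field}-{self.id}"


@dataclass
class Annotation:
    id: int
    field: Optional[str] = None  # filled in by Document.annotate()

    def name(self) -> Name:
        if self.field is None:
            raise ValueError("annotation is not attached to a Document field yet")
        return Name(field=self.field, id=self.id)


class Document:
    def __init__(self) -> None:
        self.fields: Dict[str, List[Annotation]] = {}

    def annotate(self, **kwargs: List[Annotation]) -> None:
        # Each keyword is a new field; stamp that field name onto its annotations.
        for field_name, annotations in kwargs.items():
            for annotation in annotations:
                annotation.field = field_name
            self.fields[field_name] = annotations


doc = Document()
doc.annotate(bib_entry=[Annotation(id=4), Annotation(id=5)])
entry = doc.fields["bib_entry"][1]
print(entry.field)   # bib_entry
print(entry.name())  # bib_entry-5
```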
Lookups based on names
To support reconstructing a Relation object from the names of `key` and `value`, we need the ability to look up the Annotations involved. We introduce a new method to enable this:

to and from JSON
Finally, we need some way to serialize to JSON and reconstruct from JSON. For serialization, Names keep the JSON quite minimal:
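Here is a minimal sketch of what name-only serialization looks like, assuming a hypothetical JSON shape with one `key` and one `value` entry. The library's actual output keys may differ. For brevity, `name()` returns a plain string here rather than a Name object.

```python
import json
from dataclasses import dataclass


@dataclass
class Annotation:
    field: str
    id: int

    def name(self) -> str:
        # Document-level name, e.g. "bib_entry-5"
        return f"{self.field}-{self.id}"


@dataclass
class Relation:
    key: Annotation
    value: Annotation

    def to_json(self) -> dict:
        # Serialize only the names, never the full annotations.
        return {"key": self.key.name(), "value": self.value.name()}


relation = Relation(key=Annotation("citation", 2), value=Annotation("bib_entry", 5))
print(json.dumps(relation.to_json()))
# {"key": "citation-2", "value": "bib_entry-5"}
```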
Reconstructing a Relation from JSON is trickier, because a Relation is meaningless without its Document. The Document must also store the referenced Annotations under the correct fields so that the lookup based on these Names resolves to the right objects.
The API for this is similar, but you must also pass in the Document object:
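The sketch below shows a round trip under stated assumptions. `Document.lookup` is a hypothetical stand-in for the name-based lookup method described above, whose real name is not given here. The signature `Relation.from_json(relation_json, doc)` is likewise illustrative. The key point is that reconstruction resolves each stored name against the Document rather than rebuilding the annotations.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Annotation:
    id: int
    field: str = ""  # set by Document.annotate()

    def name(self) -> str:
        return f"{self.field}-{self.id}"


class Document:
    def __init__(self) -> None:
        self.fields: Dict[str, List[Annotation]] = {}

    def annotate(self, **kwargs: List[Annotation]) -> None:
        for field_name, annotations in kwargs.items():
            for annotation in annotations:
                annotation.field = field_name
            self.fields[field_name] = annotations

    def lookup(self, name: str) -> Annotation:
        # Hypothetical lookup: split off the trailing id, then search that field.
        field_name, id_ = name.rsplit("-", 1)
        for annotation in self.fields[field_name]:
            if annotation.id == int(id_):
                return annotation
        raise KeyError(name)


@dataclass
class Relation:
    key: Annotation
    value: Annotation

    def to_json(self) -> dict:
        return {"key": self.key.name(), "value": self.value.name()}

    @classmethod
    def from_json(cls, relation_json: dict, doc: Document) -> "Relation":
        # Names alone are meaningless; resolve them against the Document.
        return cls(key=doc.lookup(relation_json["key"]),
                   value=doc.lookup(relation_json["value"]))


doc = Document()
citation, bib_entry = Annotation(id=2), Annotation(id=5)
doc.annotate(citation=[citation], bib_entry=[bib_entry])

relation_json = Relation(key=citation, value=bib_entry).to_json()
restored = Relation.from_json(relation_json, doc)
print(restored.key is citation, restored.value is bib_entry)  # True True
```

Because `from_json` returns the Document's own Annotation objects instead of copies, the restored Relation stays consistent with the rest of the Document.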