allenai / mmda

multimodal document analysis
Apache License 2.0

New Feature: Adding Relations Support #153

Open soldni opened 1 year ago

soldni commented 1 year ago

Desired API functionality

Given a span group or box group sg, we define the following syntax:

T = TypeVar('T', SpanGroup, BoxGroup)

# dot notation ".?"
T -> List[T]

# dot underscore dot  "._.?"
T -> List[ RelGroup[T] ]

Data structures needed

An index that maps Annotation.id to the annotation itself

For the key, we are going to use <field_type, Annotation.id>. Users are no longer allowed to define id; ids are computed internally.
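A minimal sketch of such an index, assuming ids are per-field counters assigned at insertion time. The `Annotation` stand-in, the `AnnotationIndex` class, and its `add`/`get` methods are hypothetical names for illustration, not the existing mmda API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Annotation:
    """Hypothetical stand-in for an mmda annotation (only what the sketch needs)."""
    text: str
    id: Optional[int] = None          # assigned by the index, never by the user
    field_type: Optional[str] = None  # e.g. "tokens", "citations"


class AnnotationIndex:
    """Maps <field_type, Annotation.id> to the annotation itself."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, int], Annotation] = {}
        self._next_id: Dict[str, int] = {}  # assumption: one counter per field

    def add(self, field_type: str, annotation: Annotation) -> Tuple[str, int]:
        if annotation.id is not None:
            raise ValueError("ids are computed internally; do not set them")
        new_id = self._next_id.get(field_type, 0)
        self._next_id[field_type] = new_id + 1
        annotation.id = new_id
        annotation.field_type = field_type
        key = (field_type, new_id)
        self._by_key[key] = annotation
        return key

    def get(self, field_type: str, annotation_id: int) -> Annotation:
        return self._by_key[(field_type, annotation_id)]
```

Keying on the pair rather than on the id alone lets two fields reuse the same integer ids without colliding.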

A RelationGroup class

This is a subclass of Annotation. It can carry metadata about the relation, and it links exactly two annotations together.
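One way this class could look, as a sketch only. The `source` and `target` field names and the simplified `Annotation` base are assumptions made for the example; the issue does not specify them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Annotation:
    """Hypothetical, simplified base class standing in for mmda's Annotation."""
    id: Optional[int] = None
    field_type: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class RelationGroup(Annotation):
    """Links exactly two annotations; inherits id and metadata from Annotation."""
    source: Optional[Annotation] = None  # assumed name for the first endpoint
    target: Optional[Annotation] = None  # assumed name for the second endpoint

    def __post_init__(self) -> None:
        # Enforce the "exactly two annotations" constraint at construction time.
        if self.source is None or self.target is None:
            raise ValueError("a relation must link exactly two annotations")
```

For example, `RelationGroup(field_type="citation_to_bib", metadata={"score": 0.9}, source=cite, target=bib)` would describe a single citation-to-bibliography link.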

Indexer, but for relations

This is a mapping from <Annotation, relation_type> to a list of RelationGroup.

Maybe we want to rename the current indexer to intersection_indexer and then have a relations_indexer.
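A rough sketch of the relations indexer described above. All class and method names are hypothetical, and registering each relation under both of its endpoints is an assumption, not something the issue states.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical hashable stand-in, identified by <field_type, id>."""
    field_type: str
    id: int


@dataclass(frozen=True)
class RelationGroup:
    """Hypothetical relation between two annotations."""
    relation_type: str
    source: Annotation
    target: Annotation


class RelationsIndexer:
    """Maps <Annotation, relation_type> to a list of RelationGroup."""

    def __init__(self) -> None:
        self._index: DefaultDict[
            Tuple[Annotation, str], List[RelationGroup]
        ] = defaultdict(list)

    def add(self, relation: RelationGroup) -> None:
        # Assumption: index under both endpoints so lookup works from either
        # side. The set avoids a double entry when source and target coincide.
        for endpoint in {relation.source, relation.target}:
            self._index[(endpoint, relation.relation_type)].append(relation)

    def find(self, annotation: Annotation, relation_type: str) -> List[RelationGroup]:
        # Return a copy so callers cannot mutate the index by accident.
        return list(self._index.get((annotation, relation_type), []))
```

This would sit alongside the existing intersection-based indexer rather than replace it, which is why the rename to intersection_indexer and relations_indexer is suggested.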

Clean up

Optional nice to have

Consider being able to overload dot notation so that it can fast-track dot-underscore-dot without accessing relations explicitly.

In order to do that, we would need a table that keeps track of the preferred mode of relation between fields. So, for example, page.tokens might be an intersection, but citation.bib might be a relation.
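The preferred-mode table could be sketched roughly as follows. Everything here is hypothetical: the `PREFERRED_MODE` table, the `LookupMode` enum, the `Group` wrapper, and the field names are illustrative, and the two lookup callables stand in for the real intersection and relation indexers.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class LookupMode(Enum):
    INTERSECTION = "intersection"
    RELATION = "relation"


# Assumed table: (source field, target field) -> preferred lookup mode.
PREFERRED_MODE: Dict[Tuple[str, str], LookupMode] = {
    ("pages", "tokens"): LookupMode.INTERSECTION,
    ("citations", "bib"): LookupMode.RELATION,
}

Lookup = Callable[[str], List[str]]


class Group:
    """Toy wrapper whose dot notation dispatches on the preferred mode."""

    def __init__(self, field_type: str, intersect: Lookup, relate: Lookup) -> None:
        self._field_type = field_type
        self._lookups = {
            LookupMode.INTERSECTION: intersect,
            LookupMode.RELATION: relate,
        }

    def __getattr__(self, name: str) -> List[str]:
        # Only called when normal attribute lookup fails; private names are
        # rejected early to avoid recursion on partially built objects.
        if name.startswith("_"):
            raise AttributeError(name)
        mode = PREFERRED_MODE.get((self._field_type, name))
        if mode is None:
            raise AttributeError(
                f"no preferred lookup mode for {self._field_type}.{name}"
            )
        return self._lookups[mode](name)
```

With this in place, `page.tokens` would route to the intersection lookup and `citation.bib` to the relation lookup, while an unregistered pair raises `AttributeError` instead of guessing.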

soldni commented 1 year ago

ACTUALLY @josephcc hates the meh face operation ._.; so instead we are going to have three different ways to look things up, based on the type of lookup: