Maria-Liakata-NLP-Group / annotations-interface

New iteration of the Annotations Interface tool
MIT License

Improve annotation schema infrastructure #34

Open dsj976 opened 1 year ago

dsj976 commented 1 year ago

Summary

The original annotation schema infrastructure defined the possible annotation labels as enumerations. This approach is not sustainable: it amounts to hard-coding the annotation schema, which makes the schema hard to update. Additionally, Alembic does not automatically detect changes in Enum values, so it cannot auto-generate the corresponding migration scripts (see issue #23).

A better approach is to store the annotation schema in separate tables of a relational database. The user specifies the annotation schema in a JSON file, which is then parsed into the database. Because the schema is defined in a JSON file, the depth of the annotations (i.e. the number of annotation levels) can be specified flexibly.
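To illustrate the idea of parsing a JSON schema of arbitrary depth into the database, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration and not the project's actual code: the JSON layout (nested objects keyed by label name), the example labels, the table name `annotation_label`, and the helper `insert_labels` are all hypothetical. It also uses a single self-referencing table rather than one table per annotation level, simply to keep the example short.

```python
import json
import sqlite3

# Hypothetical schema file content: nested objects, one key per label.
# An empty object marks a label with no sub-labels.
SCHEMA_JSON = """
{
  "Mood": {
    "Positive": {"Joy": {}, "Calm": {}},
    "Negative": {"Anger": {}}
  },
  "Topic": {}
}
"""


def create_table(conn):
    # Each label points at its parent label, so the nesting depth
    # is not fixed by the table definition.
    conn.execute(
        """
        CREATE TABLE annotation_label (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            parent_id INTEGER REFERENCES annotation_label(id),
            depth INTEGER NOT NULL
        )
        """
    )


def insert_labels(conn, tree, parent_id=None, depth=0):
    # Walk the nested dict recursively, inserting one row per label.
    for name, children in tree.items():
        cur = conn.execute(
            "INSERT INTO annotation_label (name, parent_id, depth) "
            "VALUES (?, ?, ?)",
            (name, parent_id, depth),
        )
        insert_labels(conn, children, cur.lastrowid, depth + 1)


conn = sqlite3.connect(":memory:")
create_table(conn)
insert_labels(conn, json.loads(SCHEMA_JSON))
conn.commit()

label_count = conn.execute(
    "SELECT COUNT(*) FROM annotation_label"
).fetchone()[0]
max_depth = conn.execute(
    "SELECT MAX(depth) FROM annotation_label"
).fetchone()[0]
```

With this layout, adding a deeper level to the JSON file only adds rows; the table definition, and therefore the migration history, stays the same.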

What needs to be done?

Updates

dsj976 commented 1 year ago

The annotation schema managers currently support:

The managers should also have methods for updating the annotation schemas without clearing the whole database table and recreating it from scratch. Consider developing the following methods:

dsj976 commented 1 year ago

Work to update the annotation schema for the client is underway. Instead of one large table with many columns for the data associated with the annotations, the original annotation table has been broken up into smaller relational tables (see commit 67e6983). This provides greater flexibility: the columns of an SQL table are fixed when the table is created, whereas the number of rows is not constrained.
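The rows-versus-columns point can be sketched as follows. This is not the layout introduced in commit 67e6983; the table names (`label`, `annotation`, `annotation_label`) and the example data are hypothetical, chosen only to show that a new label becomes a new row rather than a new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables: labels and annotations are stored separately
    -- and linked through a third table, one row per (annotation, label).
    CREATE TABLE label (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE annotation (
        id INTEGER PRIMARY KEY,
        segment TEXT NOT NULL
    );
    CREATE TABLE annotation_label (
        annotation_id INTEGER NOT NULL REFERENCES annotation(id),
        label_id INTEGER NOT NULL REFERENCES label(id),
        PRIMARY KEY (annotation_id, label_id)
    );
    """
)


def column_count(conn, table):
    # PRAGMA table_info returns one row per column of the table.
    return len(conn.execute(f"PRAGMA table_info({table})").fetchall())


def add_label(conn, name):
    return conn.execute(
        "INSERT INTO label (name) VALUES (?)", (name,)
    ).lastrowid


def tag(conn, annotation_id, label_id):
    conn.execute(
        "INSERT INTO annotation_label (annotation_id, label_id) "
        "VALUES (?, ?)",
        (annotation_id, label_id),
    )


ann_id = conn.execute(
    "INSERT INTO annotation (segment) VALUES (?)", ("example segment",)
).lastrowid
tag(conn, ann_id, add_label(conn, "Mood"))
columns_before = column_count(conn, "annotation_label")

# A label introduced later needs no ALTER TABLE: it is just more rows.
tag(conn, ann_id, add_label(conn, "Topic"))
columns_after = column_count(conn, "annotation_label")
link_rows = conn.execute(
    "SELECT COUNT(*) FROM annotation_label"
).fetchone()[0]
```

In a wide-table design, the second label would have required a new column and a schema migration; here the column count stays the same while the link table gains a row.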