tenzin3 commented 7 months ago

RFC0154: STAM to MD serializer

Named Concepts

STAM: The Stand-off Text Annotation Model is a data model for stand-off text annotation in which any information about a text is represented as an annotation.

MD: The file extension for Markdown, a lightweight markup language used to create documents that consist of plain text only.

Summary

pecha.org takes Markdown text files as input to store in its database. Since we plan to store most of our data in STAM, we need a serializer from STAM to Markdown.

In this serialization, each annotation type should have its own standard structure and should also be visually distinct. Exceptional cases, such as empty segment strings and extra lines, should also be handled.
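
As a rough illustration of the exceptional cases mentioned above, the sketch below normalises one segment before it is written out. The function name `clean_segment` and the rule of collapsing blank lines into a single line break are assumptions made for this example; the RFC does not specify the exact behaviour.

```python
import re
from typing import Optional


def clean_segment(segment: str) -> Optional[str]:
    """Hypothetical helper: normalise a segment, or return None if it is empty."""
    # Drop leading/trailing whitespace, then collapse any run of blank
    # (or whitespace-only) lines into a single line break.
    cleaned = re.sub(r"\n\s*\n+", "\n", segment.strip())
    # An empty or whitespace-only segment yields None so the caller can skip it.
    return cleaned or None
```

A caller would skip any segment for which this returns `None` rather than writing an empty block to the Markdown file.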

Dependencies

No dependencies.

Infrastructures

PechaData GitHub page

Design Illustrations

As a low-level overview:

We read each annotation from the STAM object and categorize it by its value, such as title, subtitle, body, or conclusion, or, in Sabche, yigchung. The two values in brackets are the start and end of the annotation's span.
We convert each annotation into a standard structure using defined rules.
We sort all the annotations by their span values and write them to Markdown.
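
The three steps above can be sketched as follows. This is a minimal sketch, not the actual implementation: `Annotation` is a hypothetical stand-in for whatever the STAM library returns, and the `RULES` table (headings for title and subtitle, a block quote for conclusion) is an assumed set of formatting rules, since the RFC does not fix them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    """Hypothetical stand-in for a STAM annotation: a value plus a span."""
    label: str  # annotation value, e.g. "title", "subtitle", "body"
    start: int  # span start offset in the base text (inclusive)
    end: int    # span end offset in the base text (exclusive)


# Assumed formatting rules, one Markdown template per annotation value.
RULES = {
    "title": "# {text}",
    "subtitle": "## {text}",
    "body": "{text}",
    "conclusion": "> {text}",
}


def serialize(text: str, annotations: List[Annotation]) -> str:
    """Render annotations over `text` as Markdown, ordered by span."""
    blocks = []
    # Sort by span so the output follows the order of the base text.
    for ann in sorted(annotations, key=lambda a: (a.start, a.end)):
        segment = text[ann.start:ann.end].strip()
        if not segment:
            continue  # skip empty segment strings
        # Categorize by value and convert with the matching rule;
        # unknown values fall back to plain text.
        template = RULES.get(ann.label, "{text}")
        blocks.append(template.format(text=segment))
    return "\n\n".join(blocks) + "\n" if blocks else ""
```

Keeping the rules in a single table means a new annotation value only needs one new entry, and each value stays visually distinct in the output.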

Justification

The MD file type is chosen for the serialization because of its simplicity, which makes it much easier for people who want to correct our data.

Application developers don't need to interact with the STAM database, which has a steeper learning curve than a Markdown file.

Testing

Testing will be done on pecha ID O2FCA4A99 (chojuk) to check whether the output satisfies these needs.

Implementation Steps

List all the steps involved during implementation.

[ ] OpenPecha/stam_annotator#23 Estimated time: 5 min Actual time:
[ ] OpenPecha/stam_annotator#24 Estimated time: 2 hours Actual time:
[ ] OpenPecha/stam_annotator#25 Estimated time: 1 day Actual time:
[ ] OpenPecha/stam_annotator#26 Estimated time: 4 hours Actual time:
[ ] OpenPecha/stam_annotator#27 Estimated time: 3 hours Actual time:

Reviewed By

@kaldan007

kaldan007 commented 7 months ago

@tenzin3 how are you planning to handle the segment annotation for multilingual alignment?

tenzin3 commented 7 months ago

@kaldan007, the design illustration above is for a single OPF pecha, so for a multilingual alignment the same process would be run for each language present in that alignment.

kaldan007 commented 7 months ago

@tenzin3 I want to know how you are planning to annotate the segment annotations.
