OpenPecha / Requests

RFWs and RFCs for all OpenPecha repositories

[RFC0154]: STAM to MD serializer #455

Open tenzin3 opened 7 months ago

tenzin3 commented 7 months ago

RFC0154: STAM to MD serializer

Named Concepts

STAM: Stand-off Text Annotation Model, a data model for stand-off text annotation in which any information about a text is represented as an annotation.

MD: the file extension for Markdown, a lightweight markup language for writing formatted documents in plain text.

Summary

pecha.org takes Markdown text files as input to store in its database. Since we are planning to store most of our data in STAM, we need a serializer from STAM to Markdown files.

In this serialization, each annotation type should have its own standard structure and should also be visually distinct. Exception cases such as empty segment strings and extra lines should also be handled.
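As a rough illustration of the behaviour described above, here is a minimal sketch in Python. It does not use the STAM library; the `Annotation` class, the `MD_PREFIX` mapping, the annotation kinds ("chapter", "segment"), and the `serialize_to_md` function are all hypothetical names chosen for this example, and the actual mapping from annotation types to Markdown structures is still to be decided in this RFC.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical stand-off annotation: character offsets into a base text."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    kind: str   # assumed annotation type, e.g. "chapter" or "segment"


# Assumed mapping from annotation kind to a Markdown prefix, so that each
# kind gets its own visually distinct structure.
MD_PREFIX = {"chapter": "# ", "segment": ""}


def serialize_to_md(base_text: str, annotations: list[Annotation]) -> str:
    """Turn stand-off annotations into Markdown blocks separated by blank lines."""
    blocks = []
    for ann in sorted(annotations, key=lambda a: a.start):
        raw = base_text[ann.start:ann.end]
        # Drop extra lines inside the span and trim surrounding whitespace.
        span = " ".join(line.strip() for line in raw.splitlines() if line.strip())
        if not span:
            continue  # skip empty segment strings
        blocks.append(MD_PREFIX.get(ann.kind, "") + span)
    return "\n\n".join(blocks) + "\n"
```

In this sketch, a span that contains only whitespace or line breaks produces no output, and unknown annotation kinds fall back to plain paragraphs.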

Dependencies

No dependencies.

Infrastructures

PechaData GitHub page

Design Illustrations

image

image

In low level overview,

Justification

The MD file type is chosen for the serialization because of its simplicity, which makes things much easier for people who want to correct our data.

Application developers don't need to interact with the STAM database, which has a steeper learning curve than a Markdown file.

Testing

Testing will be done on pecha ID O2FCA4A99 (chojuk) to check whether the serializer satisfies these needs.

Implementation Steps

List all the steps involved during implementation.

Reviewed By

@kaldan007

kaldan007 commented 7 months ago

@tenzin3 how are you planning to handle the segment annotation for multilingual alignment?

tenzin3 commented 7 months ago

@kaldan007, the design illustrations above are for a single OPF pecha, so for a multilingual alignment the same process would be applied to each language present in that alignment.

kaldan007 commented 7 months ago

@tenzin3 I want to know how you are planning to annotate the segment annotations.