Open tenzin3 opened 7 months ago
@tenzin3 how are you planning to handle the segment annotation for multilingual alignment?
@kaldan007, the design illustration above covers a single OPF pecha. For a multilingual alignment, the same process would be repeated for each language present in that alignment.
@tenzin3 I want to know how you are planning to annotate the segment annotations.
RFC0154: STAM to MD serializer
Named Concepts
STAM: Stand-off Text Annotation Model, a data model for stand-off text annotation in which any information about a text is represented as an annotation.
MD: the file extension for Markdown, a lightweight markup language for writing formatted documents in plain text.
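To make the stand-off idea concrete, here is a minimal sketch in plain Python. It does not use the STAM library; the `Annotation` class, its fields, and the labels are hypothetical names chosen for illustration. The point is only that the base text stays untouched while annotations refer to it by character offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-off annotation: a label plus character offsets."""
    label: str
    start: int  # inclusive character offset into the base text
    end: int    # exclusive character offset into the base text


def annotated_text(base_text: str, ann: Annotation) -> str:
    """Return the span of the base text that the annotation points at."""
    return base_text[ann.start:ann.end]


# The base text is stored once and never modified.
base_text = "Title line\nFirst segment. Second segment."

# Annotations live separately and only reference offsets.
annotations = [
    Annotation("title", 0, 10),
    Annotation("segment", 11, 25),
    Annotation("segment", 26, 41),
]

for ann in annotations:
    print(ann.label, repr(annotated_text(base_text, ann)))
```

A serializer would walk such annotations in offset order and emit one Markdown block per annotation.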
Summary
pecha.org takes Markdown text files as input to store in its database. Since we plan to store most of our data in STAM, we need a serializer from STAM to Markdown files.
In this serialization, each annotation type should have its own standard structure and should be visually distinct. Exceptional cases, such as empty segment strings and extra lines, should also be handled.
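The requirements above could be sketched roughly as follows. This is an illustration under assumptions, not the proposed implementation: the `Span` class, the `title`/`note`/`segment` labels, and the Markdown form chosen for each label are all hypothetical, and the input is assumed to be already extracted from STAM as (label, text) pairs. Extra lines inside a segment are assumed to collapse into single spaces, and empty segments are assumed to be dropped.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical (label, text) pair already resolved from an annotation."""
    label: str
    text: str


def render_span(span: Span) -> Optional[str]:
    """Render one span as a Markdown block, or None if it is empty."""
    # Collapse runs of whitespace, including extra blank lines, to one space.
    text = " ".join(span.text.split())
    if not text:
        return None  # empty segment string: skip it
    if span.label == "title":
        return f"# {text}"   # assumed: titles become level-1 headings
    if span.label == "note":
        return f"> {text}"   # assumed: notes become block quotes
    return text              # assumed: plain segments become paragraphs


def serialize_to_markdown(spans: Iterable[Span]) -> str:
    """Join rendered blocks with exactly one blank line between them."""
    blocks = [b for b in (render_span(s) for s in spans) if b is not None]
    return "\n\n".join(blocks) + "\n" if blocks else ""


spans = [
    Span("title", "Chojuk"),
    Span("segment", ""),                        # empty segment
    Span("segment", "line one\n\n\nline two"),  # extra lines
    Span("note", "  a note "),
]
print(serialize_to_markdown(spans))
```

Giving each label its own block form keeps annotation types visually distinct, and filtering out `None` results before joining prevents empty segments from leaving stray blank lines in the output.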
Dependencies
No dependencies.
Infrastructures
PechaData GitHub page
Design Illustrations
In low level overview,
Justification
The MD file type is chosen for serialization because of its simplicity, which makes it much easier for people who want to correct our data.
Application developers don't need to interact with the STAM database, which has a steeper learning curve than a Markdown file.
Testing
Testing will be done on pecha ID O2FCA4A99 (chojuk) to check whether the serializer satisfies these needs.
Implementation Steps
List all the steps involved during implementation.
Reviewed By
@kaldan007