What is it?

Human-Assisted Translation for Tibetan (Dharma texts).

Named Concepts

Source Language is the language of the material to be translated (i.e. Tibetan). Source Language will be used interchangeably with Tibetan.

Target Language is the language into which the source material is to be translated (e.g. English).

Source Text is the body of text to be translated, in its Source Language form.

Target Text is the translated body of text, in its Target Language form.

Source Segment is the segment of the Source Text, in its Source Language form.

Target Segment is the segment of the Target Text, in its Target Language form.

Source Phrase is a phrase within the Source Segment, in its Source Language form.

Target Phrase is a phrase within the Target Segment, in its Target Language form.

Source Word is a word within the Source Segment, in its Source Language form.

Target Word is a word within the Target Segment, in its Target Language form.

Translation Memory is a custom dictionary where each entry consists of a key (a phrase in Tibetan) and a value (translation of the phrase in a given Target Language).

Term Base is a custom dictionary where each entry consists of a key (a word in Tibetan) and a value (translation of the word in a given Target Language).
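As a rough illustration, both dictionaries could share one key-value structure. The sketch below is an assumption, not the format Dictionary-Service will use: the class name `CustomDictionary`, the language code `"en"`, and the sample entries are all hypothetical. Values are kept as lists because a key may have more than one translation.

```python
from dataclasses import dataclass, field


@dataclass
class CustomDictionary:
    """Tibetan key -> Target Language code -> candidate translations."""

    entries: dict[str, dict[str, list[str]]] = field(default_factory=dict)

    def add(self, key: str, target_language: str, translation: str) -> None:
        """Register a translation for a key, ignoring exact duplicates."""
        candidates = self.entries.setdefault(key, {}).setdefault(target_language, [])
        if translation not in candidates:
            candidates.append(translation)

    def lookup(self, key: str, target_language: str) -> list[str]:
        """Return all known translations, or an empty list if none exist."""
        return list(self.entries.get(key, {}).get(target_language, []))


# Hypothetical sample entries, for illustration only.
SAMPLE_WORD = "ཆོས"
SAMPLE_PHRASE = "སངས་རྒྱས་ཀྱི་ཆོས"

term_base = CustomDictionary()           # keys are Source Words
term_base.add(SAMPLE_WORD, "en", "dharma")

translation_memory = CustomDictionary()  # keys are Source Phrases
translation_memory.add(SAMPLE_PHRASE, "en", "the teaching of the Buddha")
```

The only difference between the two instances is what the keys are: words in the Term Base, phrases in the Translation Memory.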

Project is the entity that constitutes the translation of a single Source Text.

Project Comment is a comment specific to a given project. Project Comment will be used interchangeably with Comment, except if Comment is the first word of the sentence.

How would it work?

The high-level workings of the system can be described in a sequence of events:

1) Start a new project or continue an already started one
2) System will break Source Text into Source Segments
3) System will use Source Segments to create Target Segments
4) System will attempt to replace Source Phrases with Target Phrases in Target Segments
5) System will attempt to replace Source Words with Target Words in Target Segments

In the case of starting a new project, at this stage, the user is left with a graphical user interface where the screen is split equally into three columns: Source column, Target column, and Meta column. The functionality is described below in terms of the View, and each column individually.

In the case of continuing an already started project, the user is directly led to the current state of the project.

Views

There is only one view, the Main View described below.

Main View

In addition to the three columns - Source, Target, and Meta - the main view will consist of several toggles. A very rough sketch of the interface is provided below.

[Image: rough sketch of the interface (Screenshot 2022-01-09 at 17 10 30)]

Each of the three columns can be minimized, until only one column is visible. Equally, columns can be resized in terms of their width. In both cases, the other columns will adjust in size automatically so that the whole screen is always filled horizontally.
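One way to keep the screen filled is to store column widths as fractions of the screen and rescale the other columns whenever one changes. This is only a sketch under that assumption; the function name `resize_column` and the proportional rule are hypothetical, and a width of `0.0` stands for a minimized column.

```python
def resize_column(widths: dict[str, float], column: str, new_width: float) -> dict[str, float]:
    """Return fractional widths that sum to 1.0 after resizing one column.

    The remaining space is shared among the other columns in proportion to
    their current widths, so a minimized column (0.0) stays minimized.
    """
    if column not in widths:
        raise KeyError(column)
    if not 0.0 <= new_width <= 1.0:
        raise ValueError("new_width must be a fraction between 0.0 and 1.0")

    others_total = sum(w for name, w in widths.items() if name != column)
    other_count = len(widths) - 1
    remaining = 1.0 - new_width

    resized = {}
    for name, width in widths.items():
        if name == column:
            resized[name] = new_width
        elif others_total > 0:
            resized[name] = remaining * width / others_total
        else:
            # Every other column was minimized: share the space evenly.
            resized[name] = remaining / other_count
    return resized


# Minimizing the Meta column leaves Source and Target with half the screen each.
columns = {"source": 1 / 3, "target": 1 / 3, "meta": 1 / 3}
columns = resize_column(columns, "meta", 0.0)
```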

Toggles

Toggle that alternates between segment and reader modes
Toggle that alternates between plain and styled
Toggle that alternates between three font sizes: small, medium, and large
Toggle that alternates between three line spacing modes: tight, normal, and loose
Toggle that alternates between three color schemes: light, retro, and dark
Toggle that turns showing dictionary results on hover on and off
Toggle that alternates between edit and write modes

Source Column

Source Column has a selection for switching between different versions of the Source Text when available.

Area

The area is defined for the purpose of being able to select one or more consecutive Source Segments in the Source Column. When more than one Source Segment is selected, additional options will be available in the Context menu. These are always visible in the Context menu but grayed out unless more than one Source Segment is selected. These are highlighted with * in the Context section below.

Content

The content of the Source Column consists of the Source Text broken down into Source Segments.

Context

The context menu can be activated once any part (or the whole) of the text of the Source Segment is selected.

definition
dictionary
add_to_dictionary
start_discussion
report_typo
merge_segments*

Hover

Show dictionary results when the toggle for showing dictionary results on hover is on.

Styling

Upon hovering the mouse on the leftmost part of the section, a circle icon will appear. Clicking it will show the available styling options.

H1 for titles
H2 for subtitles
H3 for section headings
H4 for section sub-headings
Body for body copy
Footer for footer texts (e.g. "this text was discovered while meditating in so and so cave")

Target Column

Target Column has a selection for switching between different Target Languages when available.

Area

The area is defined for the purpose of being able to select one or more consecutive Target Segments in the Target Column. When more than one Target Segment is selected, additional options will be available in the Context menu. These are always visible in the Context menu but grayed out unless more than one Target Segment is selected. These are highlighted with * in the Context section below.

Content

The content of the Target Column consists of any mix of Source Text and Target Text (depending on how far it has been translated), with each Target Segment corresponding to the Source Segment immediately to its left.

Context

The context menu can be activated once any part (or the whole) of the text of the Target Segment is selected.

start_discussion
dictionary (only works with Target Words, not with still-undecided words)
dispute_definition (only works with Target Words, not with still-undecided words)
report_typo
merge_segments*

Hover

Show dictionary results when the toggle for showing dictionary results on hover is on.

Styling

Upon hovering the mouse on the leftmost part of the section, a circle icon will appear. Clicking it will show the available styling options.

H1 for titles
H2 for subtitles
H3 for section headings
H4 for section sub-headings
Body for body copy
Footer for footer texts (e.g. "this text was discovered while meditating in so and so cave")

Meta Column

Meta Column has a selection for switching between Comments and Approvals.

Comments

The default is Comments, where all Project Comments connected with a given segment are shown. This will work similarly to how comments work in Google Docs, with the possibility to resolve a comment, which closes it.

Approvals

Approvals are for reviewing and managing any system-wide changes (e.g. the Target Word for a given Source Word changes in the underlying Term Base). This will work similarly to how comments work in Google Docs, with the possibility to resolve an item, which closes it.

Version Control?

Projects are version-controlled, with a new version created either at a time interval or on user prompt.

Version control can be handled via git. As long as the format is line-by-line text, as it is in the interface, the full power of git and github.com is immediately useful here. That way the version control part, including things like conflict resolution and comparing versions, can be handled entirely on github.com. Padma-Translate simply needs to have user-interface functionality which corresponds to:

git add --all
git commit (here a commit message will be given or not)
git push
git pull
git clone
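A minimal sketch of how user-interface actions might map onto these commands is shown below. The function names and the grouping of commands into a single "save version" action are assumptions for illustration; only the git commands themselves come from the list above, plus the standard `--allow-empty-message` flag for the case where no commit message is given.

```python
import subprocess
from pathlib import Path


def save_version_commands(message: str = "") -> list[list[str]]:
    """Commands for a hypothetical 'save version' button: stage, commit, push."""
    commit = ["git", "commit", "--message", message]
    if not message:
        # git rejects empty messages unless explicitly allowed.
        commit.insert(2, "--allow-empty-message")
    return [["git", "add", "--all"], commit, ["git", "push"]]


def load_latest_commands() -> list[list[str]]:
    """Commands for a hypothetical 'load latest version' button."""
    return [["git", "pull"]]


def clone_project_commands(repository_url: str) -> list[list[str]]:
    """Commands for opening a project that is not yet on the local machine."""
    return [["git", "clone", repository_url]]


def run_commands(commands: list[list[str]], project_dir: Path) -> None:
    """Run each command in the project directory, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=project_dir, check=True)
```

Keeping command construction separate from `run_commands` means the mapping can be inspected or tested without touching a real repository.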

Data Redundancy?

One of the critical features of the system is to never lose data. There are three layers of procedure:

~~Upon every change, data is stored in two decoupled SQL databases~~
Upon "committing changes" on GitHub
~~Upon "committing changes" as well as periodically automatically on S3~~
Saved locally as a file

Questions and Answers

What if there is more than one Target Language word in the Term Base for a given Source Language word?

Then the word will be highlighted and the context menu will offer the options to choose from.

How is the Target Segment Automatically Populated with Target Words and Target Phrases?

The end goal is for all the words in each Target Segment to be automatically completed based on the Translation Memory and Term Base. With that as the aspirational end state, the way this actually works can be understood through the outline below.

What the system will do first:

1) Copy the Source Segment into the Target Segment so both segments are in Source Language
2) Look for Source Phrases in Translation Memory within the Source Segment
3) In the case of a match, replace Source Phrase in Target Segment with the corresponding Target Phrase
4) Repeat steps 2 and 3 with Source Words and Term Base
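The four steps above can be sketched as follows. This is a simplified illustration, not the planned implementation: it assumes each key has exactly one translation, uses plain substring matching (which ignores Tibetan word boundaries), and tries longer keys first. The function name and the sample entries are hypothetical.

```python
def populate_target_segment(
    source_segment: str,
    translation_memory: dict[str, str],
    term_base: dict[str, str],
) -> str:
    """Pre-fill a Target Segment using exact matches only."""
    # Step 1: the Target Segment starts as a copy of the Source Segment.
    target_segment = source_segment

    # Steps 2-3 use the Translation Memory; step 4 repeats them with the Term Base.
    for dictionary in (translation_memory, term_base):
        # Longer keys first, so a long entry wins over a shorter one inside it.
        for source in sorted(dictionary, key=len, reverse=True):
            if source in source_segment:
                target_segment = target_segment.replace(source, dictionary[source])
    return target_segment


# Hypothetical entries, for illustration only.
sample_memory = {"སངས་རྒྱས་ཀྱི་ཆོས": "the teaching of the Buddha"}
sample_terms = {"ཆོས": "dharma", "བླ་མ": "teacher"}
draft = populate_target_segment("བླ་མ་དང་སངས་རྒྱས་ཀྱི་ཆོས།", sample_memory, sample_terms)
```

Because phrases are replaced before words, a word that only occurs inside an already translated phrase is left alone. Anything without a match stays in the Source Language for the human translator.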

This will result in a state where each Target Segment will be in one of three states:

Where all the words in the Target Segment are in the Target Language
Where all the words in the Target Segment, except particles and such, are in the Target Language
Where one or more words in the Target Segment are not in the Target Language

The first state will initially be very rare. The second will occur occasionally, and the last is the expected state.
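A rough way to tell the three states apart is to look at which Tibetan syllables are still left in a Target Segment. The sketch below is an assumption: it works on syllables rather than words, treats any run of Tibetan letters (U+0F40 to U+0FBC) as a leftover, and uses a short, incomplete particle list chosen only for illustration.

```python
import re
from enum import Enum

# Runs of Tibetan letters, vowel signs and subjoined letters; tsheg and shad are excluded.
TIBETAN_SYLLABLE = re.compile(r"[\u0F40-\u0FBC]+")

# Illustrative, incomplete particle list (an assumption, not part of this RFC).
PARTICLES = frozenset({"དང", "ནི", "ཀྱི", "གི", "གྱི"})


class SegmentState(Enum):
    TRANSLATED = 1       # everything is in the Target Language
    PARTICLES_ONLY = 2   # only particles remain in the Source Language
    INCOMPLETE = 3       # one or more other words remain untranslated


def classify_target_segment(target_segment: str, particles=PARTICLES) -> SegmentState:
    """Classify a Target Segment by the Tibetan syllables still present in it."""
    leftovers = TIBETAN_SYLLABLE.findall(target_segment)
    if not leftovers:
        return SegmentState.TRANSLATED
    if all(syllable in particles for syllable in leftovers):
        return SegmentState.PARTICLES_ONLY
    return SegmentState.INCOMPLETE
```

The state could then decide which of the human tasks are offered for a segment.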

What the human(s) will then have to do is up to four things, depending on the state:

Validate the proposed Target Phrases and Target Words
Work on deciding the Target Words for the Source Words still present in the Target Segment
Work on deciding on how particles and such will be handled and how they might affect the Target Words and words in Target Phrases
Re-arrange everything in the right order to complete the Target Segment

This process will be performed segment by segment at the beginning of each new project, and again for any Source Segment when it is edited (due to finding a typo or for another reason).

What Exactly is the Logic for Finding Matches?

There are three cases for matching:

Where there is a single word for the Target Language in the Term Base for a given Source Language word
Where there are multiple words available for the Target Language in the Term Base for a given Source Language word
Where there are no available words for the Target Language in the Term Base for a given Source Language word

In the first case, the Source Word will be automatically replaced by the Target Word.

In the second case, the word will be shown in the Source Language but highlighted, and the context menu will offer the available Target Language options from the Term Base.

In the third case, the word can be added to the Term Base.

The same logic applies to phrases.
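The three cases could be expressed as a single lookup that returns both the case and the candidates. This is a sketch that assumes the Term Base is available as a mapping from a Source Word to a list of Target Words; the names `MatchCase` and `find_match` are hypothetical.

```python
from enum import Enum


class MatchCase(Enum):
    SINGLE = "replace automatically"
    MULTIPLE = "highlight and offer the options in the context menu"
    NONE = "word can be added to the Term Base"


def find_match(source_word: str, term_base: dict[str, list[str]]) -> tuple[MatchCase, list[str]]:
    """Return the matching case and the candidate Target Words for a Source Word."""
    candidates = list(term_base.get(source_word, []))
    if len(candidates) == 1:
        return MatchCase.SINGLE, candidates
    if len(candidates) > 1:
        return MatchCase.MULTIPLE, candidates
    return MatchCase.NONE, []
```

Since the same logic applies to phrases, the same function could be called with a Source Phrase and the Translation Memory in place of the Term Base.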

How About Non-Exact Matching?

These are excluded from the initial scope. The possibilities here fall roughly under three buckets:

Partial match
Rule-based (e.g. inference from particles etc.)
Fuzzy

How are texts loaded into the system?

Through a text area into which the text is copy-pasted. The text is then automatically segmented. When a person opens the system, it asks whether an existing project should be loaded or a new one started. If a new one is started, the project is given a name in the add text dialogue.

How is the text segmented?

The text will be segmented based on the shad (།) and similar punctuation marks.
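A minimal segmentation sketch follows. It assumes that only the shad (U+0F0D) and the nyis shad (U+0F0E) end a segment, and that consecutive marks separated by whitespace belong to the same segment; the exact set of marks is not fixed by this RFC, and the function name is hypothetical.

```python
import re

# One or more shad / nyis shad marks, optionally separated by whitespace.
SEGMENT_END = re.compile(r"[\u0F0D\u0F0E](?:\s*[\u0F0D\u0F0E])*")


def segment_text(source_text: str) -> list[str]:
    """Split a Source Text into Source Segments, keeping the closing marks."""
    segments = []
    start = 0
    for match in SEGMENT_END.finditer(source_text):
        segment = source_text[start:match.end()].strip()
        if segment:
            segments.append(segment)
        start = match.end()
    # Text after the last mark becomes a final segment of its own.
    tail = source_text[start:].strip()
    if tail:
        segments.append(tail)
    return segments
```

Keeping the closing marks inside each segment means the original Source Text can be rebuilt by joining the segments.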

Where do terms come from?

The terms come from the custom dictionary that will live in Dictionary-Service (repo is not live yet).

Where do phrases come from?

Same as words. Note that phrases will be handled first, and then what remains will be handled by words.

How does this connect with other systems/repos?

For this RFC to make sense, the following RFCs must be completed:

Padma Translate will directly interact with:

Padma-Dictionary-Service (#4)

Lotus-King-Research / Requests

RFC0003: Padma-Translate | Human Assisted Translation for Tibetan #3
