Is it more important than other work we could be doing? Yes
Would this work contribute to the mission of the Kumarajiva Project? Yes
Does it offer more business value than alternative solutions? Yes
Does it take less effort than alternative solutions? Yes
If you answered Yes to all questions, continue to the Request for Job (RFJ).
Request for job
1. Summary
Align the Chinese texts with the Tibetan version of the Ratnakuta-sutra.
2. Keyword definitions
Ratnakuta-sutra: Ratnakuta-sutra (short for Mahāratnakūṭa Sūtra) is the original Sanskrit title of the scripture. In this project, it also refers to the corpus of its Tibetan and Chinese canonical translations, specifically the Tibetan version and Taisho 310 大寶積經.
Text alignment: In this project, “text alignment” refers to matching Chinese text to the corresponding Tibetan text, primarily at the sentence level but, where necessary, at the level of any two corresponding semantic blocks in the two languages.
Translation glossary: An index of specific terms with approved target-language translations, agreed upon and used by translators; here, translators working on Tibetan-to-Chinese Buddhist translation. Translation glossaries help translators ensure that every time a defined term appears, it is translated correctly and consistently.
Translation memory: In this project, translation memory refers to high-quality data consisting of well-aligned Tibetan-Chinese Buddhist text segments (at the sentence, paragraph, or sentence-like unit level), used to produce translation glossaries that support new (human) Tibetan-to-Chinese translations.
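To illustrate how these three terms relate, the following is a minimal sketch (not part of the project's tooling) of a glossary consistency check run over aligned segments of a new draft translation. The `AlignedSegment` class, the `check_glossary` function, and the toy data are hypothetical. The sketch assumes a glossary is a simple Tibetan-to-Chinese mapping and uses plain substring matching.

```python
from dataclasses import dataclass


@dataclass
class AlignedSegment:
    """One hypothetical Tibetan-Chinese segment pair (sentence or sentence-like unit)."""
    tibetan: str
    chinese: str


def check_glossary(segments, glossary):
    """Flag segments whose Tibetan side contains a glossary term but whose
    Chinese side does not contain the approved rendering.

    glossary maps a Tibetan term to its approved Chinese translation.
    Matching is naive substring search (no Tibetan syllable segmentation).
    Returns a list of (segment index, Tibetan term, approved Chinese term).
    """
    issues = []
    for index, segment in enumerate(segments):
        for bo_term, zh_term in glossary.items():
            if bo_term in segment.tibetan and zh_term not in segment.chinese:
                issues.append((index, bo_term, zh_term))
    return issues


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the Ratnakuta-sutra.
    glossary = {"བྱང་ཆུབ་སེམས་དཔའ": "菩薩"}  # bodhisattva
    draft = [
        AlignedSegment("བྱང་ཆུབ་སེམས་དཔའ་", "菩薩"),
        AlignedSegment("བྱང་ཆུབ་སེམས་དཔའ་", "大士"),  # inconsistent rendering
    ]
    print(check_glossary(draft, glossary))
```

On the toy data, only the second segment is flagged. Naive substring matching can produce false hits when a short term occurs inside a longer word, so a real check would segment the Tibetan text into syllables or words first.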
3. Problem and context
There is a lack of translation memory from which to produce translation glossaries that support new (human) Tibetan-to-Chinese Buddhist translation work. More than 400 parallel Tibetan and Chinese canonical texts could potentially be aligned for this purpose; based on initial investigations, we have selected the Ratnakuta-sutra as the first project.
This project will begin to provide high-quality translation memory (i.e., Tibetan-Chinese canonical text alignment data) to address this problem.
4. Job description and scope
The scope of this project is to complete the text alignment of the Tibetan and Chinese versions of the Ratnakuta-sutra at the sentence, paragraph, or sentence-like unit level.
Eight budding translators from Kumarajiva, divided into groups, conduct the text alignment.
The alignment work goes through two levels of regular review: first by the group leaders, then by the consultants.
The Project Manager coordinates the project, including training, tracking of progress and preparing necessary documents.
5. Constraints
Most team members will only have 1-2 hours per week to work on this project.
There is no budget to hire external staff.
The team members are all Kumarajiva faculty and budding translators.
6. Approach
The Tibetan-Chinese canonical text alignment is conducted in InterText, following a set of guidelines designed specifically for this alignment task.
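InterText stores alignments as XML. The following minimal sketch assumes an XCES-style file in which each `link` element carries an `xtargets` attribute of the form "source IDs;target IDs", with space-separated sentence IDs on each side and an empty side meaning there is no counterpart. These element and attribute names, the file names, and the `summarize_links` helper are assumptions to verify against actual InterText exports (namespaced tags, for example, would not match). The sketch counts alignment types such as 1:1 or 2:1 and lists links with an empty side, which correspond to the "no matching semantic blocks" risk in the risks section.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical XCES-style alignment; the first side is assumed to be Tibetan.
SAMPLE = """<linkGrp fromDoc="bo.xml" toDoc="zh.xml">
  <link xtargets="1:1;1:1"/>
  <link xtargets="1:2 1:3;1:2"/>
  <link xtargets="1:4;"/>
</linkGrp>"""


def summarize_links(xml_text):
    """Count alignment types and collect links that have an empty side.

    Assumes un-namespaced <link> elements with xtargets="src ids;tgt ids".
    """
    root = ET.fromstring(xml_text)
    types = Counter()
    unmatched = []
    for link in root.iter("link"):
        xtargets = link.get("xtargets", "")
        source, _, target = xtargets.partition(";")
        source_ids, target_ids = source.split(), target.split()
        # Alignment type such as "1:1" or "2:1", from the number of IDs per side.
        types[f"{len(source_ids)}:{len(target_ids)}"] += 1
        if not source_ids or not target_ids:
            unmatched.append(xtargets)
    return types, unmatched


if __name__ == "__main__":
    types, unmatched = summarize_links(SAMPLE)
    print(dict(types))
    print("Links with no counterpart:", unmatched)
```

Links with an empty side are natural candidates for the group leaders' review, since they mark Tibetan passages with no Chinese counterpart (or the reverse).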
7. Other options
Fully automated alignment: not feasible, because there is not enough training data for a machine model.
Training university students: their level of Tibetan and knowledge of Buddhism would not be sufficient for this work.
Engaging bilingual Tibetans: the budget is insufficient and there are not enough candidates.
8. Risks and unknowns
No matching semantic blocks in the Chinese text
Incomplete matching
Misplaced matches
Automated glossary extraction may not be achievable [Suggested solution: Test-run a chunk of aligned texts on Sketch Engine to see whether it produces meaningful results.]
Fatigue among budding translators [Suggested solution: Regularly show improvements in machine performance to keep motivation high.]
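Alongside the suggested Sketch Engine test run, a very simple co-occurrence baseline can indicate whether aligned segments carry a usable terminology signal. The sketch below is a swapped-in technique, not Sketch Engine's method: for a given Tibetan term, it ranks Chinese character bigrams from the segments containing that term by their Dice coefficient, 2 * both / (bo + zh), where bo and zh count the segments containing each term and both counts segments containing the pair. The function names and toy data are hypothetical, and real use would need proper Tibetan syllable and Chinese word segmentation.

```python
def dice(pairs, bo_term, zh_term):
    """Dice coefficient of a Tibetan term and a Chinese string over aligned pairs."""
    bo = zh = both = 0
    for tibetan, chinese in pairs:
        has_bo = bo_term in tibetan
        has_zh = zh_term in chinese
        bo += has_bo
        zh += has_zh
        both += has_bo and has_zh
    return 2 * both / (bo + zh) if bo + zh else 0.0


def rank_candidates(pairs, bo_term, n=2, top=3):
    """Rank Chinese character n-grams from segments containing bo_term by Dice score."""
    grams = {
        chinese[i:i + n]
        for tibetan, chinese in pairs
        if bo_term in tibetan
        for i in range(len(chinese) - n + 1)
    }
    scored = [(dice(pairs, bo_term, gram), gram) for gram in grams]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top]


if __name__ == "__main__":
    # Toy aligned pairs for illustration only.
    pairs = [
        ("བྱང་ཆུབ་སེམས་དཔའ་", "諸菩薩"),
        ("བྱང་ཆུབ་སེམས་དཔའ་", "菩薩言"),
        ("སངས་རྒྱས་", "諸佛"),
    ]
    print(rank_candidates(pairs, "བྱང་ཆུབ་སེམས་དཔའ"))
```

On the toy pairs, 菩薩 (bodhisattva) ranks first with a score of 1.0, while bigrams that only partly overlap it score lower.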
9. Goals and Deliverables
High-quality aligned Tibetan-Chinese canonical texts in InterText format.
Detailed documentation of first-hand experience with Tibetan-Chinese canonical text alignment work.
Data produced for training an AI model to automate Tibetan-Chinese text alignment.
10. Timeline
By Nov 15, 2022, the team will complete training on all the necessary tools.
By Jan 15, 2023, the team will complete the alignment of Chapter 2 of the Ratnakuta-sutra.
RFJ title: Ratnakuta-sutra Tibetan-Chinese text alignment
RFC link: https://github.com/The-Kumarajiva-Project/Ratnakuta-chp2-01/issues/1
Client: Kumarajiva Project
Job manager: Stephanie