OpenPecha / Requests

RFWs and RFCs for all OpenPecha repositories

RFW0147: BO-EN Aligner refactor #420

Open kaldan007 opened 5 months ago

kaldan007 commented 5 months ago

RFW0147: BO-EN Aligner refactor

Summary

We have an aligner pipeline that has a few issues. We want to resolve those issues by refactoring the pipeline.

Key Concepts

aligner: the aligner referred to here is a pipeline that aligns Tibetan sentences with their equivalent English sentences

Context

We receive articles and books translated from Tibetan to English and from English to Tibetan, but these materials can't be used directly to train our machine translation model. To make them ready for training, we need to extract sentence or segment pairs from the translated books or articles. We have therefore developed an aligner pipeline that takes a pair of book or article repos, generates aligned pairs of segments, and publishes those aligned pairs in another repo whose name starts with TM.
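To illustrate only the shape of the output described above, here is a minimal sketch that turns two segment lists into Tibetan-English pairs and serializes them as tab-separated lines. It assumes the segments are already in one-to-one order, which a real aligner cannot assume (it has to handle merged and split sentences). The function names `pair_segments` and `to_tsv` and the TSV layout are hypothetical and are not taken from the existing pipeline.

```python
from typing import List, Tuple


def pair_segments(bo_segments: List[str], en_segments: List[str]) -> List[Tuple[str, str]]:
    """Pair Tibetan and English segments by position.

    Assumption: both lists are already aligned one-to-one. This is a
    placeholder for the real alignment step, not an alignment algorithm.
    """
    if len(bo_segments) != len(en_segments):
        raise ValueError("segment lists must have the same length")
    return [(bo.strip(), en.strip()) for bo, en in zip(bo_segments, en_segments)]


def to_tsv(pairs: List[Tuple[str, str]]) -> str:
    """Serialize pairs as one 'Tibetan<TAB>English' line per pair (hypothetical format)."""
    return "\n".join(f"{bo}\t{en}" for bo, en in pairs)


pairs = pair_segments(
    ["བཀྲ་ཤིས་བདེ་ལེགས།", "ཐུགས་རྗེ་ཆེ།"],
    ["Greetings. ", "Thank you."],
)
tsv_text = to_tsv(pairs)
```

Raising an error on a length mismatch, rather than silently truncating with `zip`, keeps misaligned inputs from producing wrong training pairs.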

The current pipeline has the following issues:

Outputs

Inputs

Timeline

Specify the expected delivery date for the project.

References

Include any relevant links or resources for additional context or information.