UDAAN-LEAP / leap-pe-tool

A framework for assisting humans in correcting translation/OCR errors in documents, mostly dedicated to Indian languages.
https://udaanproject.org
BSD 3-Clause "New" or "Revised" License
33 stars, 10 forks

ML-based: Reflect user-edits(as feedback) into the post-editing workflow #16

Open venkatapathy opened 3 years ago

venkatapathy commented 3 years ago

Three ways we can take a hybrid approach to reflecting user edits (as feedback) in the post-editing workflow. These can be common to ASR, machine translation, and OCR:

Approach 1: Naive and quick edit-distance-based C-pair population. This can be done in two ways:

Pros: Easier to undo

The rest of the approaches are forced by default.
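
To make Approach 1 concrete, here is a minimal sketch of edit-distance-based pair population. It is not the tool's actual implementation: the function names (`extract_correction_pairs`, `apply_pairs`, `undo`), the `min_ratio` threshold, and the reading of a "C-pair" as a (wrong word, corrected word) pair are all assumptions made for illustration. It aligns the original and post-edited text word by word, keeps similar-looking replacements as pairs, and records an undo log when applying them, which is what makes this approach easy to undo.

```python
from difflib import SequenceMatcher


def extract_correction_pairs(original, edited, min_ratio=0.5):
    """Return (wrong, right) word pairs where the user replaced a word with a similar one.

    min_ratio is a hypothetical similarity threshold, not a value from the tool.
    """
    src, tgt = original.split(), edited.split()
    pairs = []
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only keep equal-length replacements so words line up one to one.
        if tag != "replace" or (i2 - i1) != (j2 - j1):
            continue
        for wrong, right in zip(src[i1:i2], tgt[j1:j2]):
            # Character-level similarity filters out unrelated rewrites.
            if SequenceMatcher(a=wrong, b=right, autojunk=False).ratio() >= min_ratio:
                pairs.append((wrong, right))
    return pairs


def apply_pairs(text, pairs):
    """Apply known pairs to new text and return (new_text, undo_log)."""
    mapping = dict(pairs)
    words = text.split()
    undo_log = []
    for idx, word in enumerate(words):
        if word in mapping:
            undo_log.append((idx, word))  # remember the word we overwrote
            words[idx] = mapping[word]
    return " ".join(words), undo_log


def undo(text, undo_log):
    """Restore the words recorded in undo_log."""
    words = text.split()
    for idx, old in undo_log:
        words[idx] = old
    return " ".join(words)


pairs = extract_correction_pairs(
    "the qnick brown fox jumps ovcr the dog",
    "the quick brown fox jumps over the dog",
)
fixed, log = apply_pairs("a qnick test ovcr here", pairs)
restored = undo(fixed, log)
```

The sketch splits on whitespace, so original spacing is not preserved; a real integration would need to work on the editor's own token positions.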

Approach 2: BART

Pros: a. Probabilistic. This is good because even while forcing suggestions we can show a probability or confidence value.

Cons:
a. We can't undo the suggestions.
b. The turn-around time for reflecting post-edits is long (will data augmentation help? Ref. Samrat and Shyean's work). Put another way, BART works in a warm-start setting.
c. The training time of BART itself is high in wall-clock terms.

Approach 3: Decile SPEAR, data programming (talk to Aayush)

Pros: a. Turn-around time for reflecting edits is faster, because the edits are treated as labelling functions. b. Training time can also be reduced.
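
As a rough illustration of treating edits as labelling functions, the sketch below turns each user edit into a small function that either votes for a correction or abstains. It does not use the SPEAR library or its API; the names `make_labelling_function` and `aggregate` are hypothetical, and a plain majority vote stands in for the learned aggregation that a data-programming framework would provide.

```python
from collections import Counter

ABSTAIN = None  # a labelling function returns this when it has no opinion


def make_labelling_function(wrong, right):
    """Build a labelling function from one user edit (hypothetical helper)."""
    def lf(token):
        # Vote for the user's correction only on the exact token they edited.
        return right if token == wrong else ABSTAIN
    return lf


def aggregate(token, labelling_functions):
    """Majority vote over non-abstaining functions.

    Returns (label, confidence). This is a stand-in for a learned label
    model; if every function abstains, the token is returned unchanged
    with confidence 0.0.
    """
    votes = Counter(
        vote
        for vote in (lf(token) for lf in labelling_functions)
        if vote is not ABSTAIN
    )
    if not votes:
        return token, 0.0
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())


# Three user edits, two of which agree.
edits = [("ovcr", "over"), ("ovcr", "over"), ("ovcr", "ova")]
lfs = [make_labelling_function(wrong, right) for wrong, right in edits]

label, confidence = aggregate("ovcr", lfs)
untouched, no_confidence = aggregate("fox", lfs)
```

Reflecting a new edit here only means appending one more function to the list, with no retraining step, which is the turn-around advantage the comment points to.
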
AnjaliVijayvargiya commented 3 years ago

Note: The first approach (section a) has already been implemented in version 2.5.1.