"Takla" Dataset Generation and Collection
For a collection of related resources on Takla research, click here
“Pure Banglish” = phonetically correct romanization of Bangla.
“Murad Takla” = phonetically incorrect romanization, i.e. “misspelled romanized Bangla”.
Operations detected for conversion (to date):
See takla_poc.ipynb for a proof-of-concept sample of the operations:
given: Sokhi bhalo kore binod beni badhiya de
Increased Operations Test
--------------------------------
generated at Number of Random Ops:1: Sokhi bhalo kore binod veni badhiya de
--------------------------------
--------------------------------
generated at Number of Random Ops:2: Sakhwa vhelau kerww bwnyd benuu budhoyoyo da
--------------------------------
Constant Single Operations Test
--------------------------------
generated: Skha bhol kora banod bini budheo de
--------------------------------
--------------------------------
generated: Sokhi bhalo kore vinod beni vadhiya de
--------------------------------
--------------------------------
generated: Soki balo kore binod beni badhiya de
--------------------------------
--------------------------------
generated: Skha bhlu kra bnd bun bodh d
--------------------------------
--------------------------------
generated: Soki balo kore binod beni badiya de
--------------------------------
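The sample above suggests character-level edits such as swapping b for v, dropping the h of aspirated consonants, and dropping vowels. A minimal Python sketch of how such operations could be composed follows. The function names (swap_b_with_v, drop_aspiration, drop_inner_vowels, make_takla) and the exact rules are assumptions inferred from the sample output, not the operations implemented in takla_poc.ipynb, so the sketch will not reproduce the generated lines above exactly.

```python
import random
import re


def swap_b_with_v(word: str) -> str:
    """Hypothetical op: replace every 'b' with 'v' (e.g. 'binod' -> 'vinod')."""
    return word.replace("b", "v").replace("B", "V")


def drop_aspiration(word: str) -> str:
    """Hypothetical op: drop an 'h' that directly follows a stop consonant
    (e.g. 'bhalo' -> 'balo', 'Sokhi' -> 'Soki')."""
    return re.sub(r"(?<=[bdgjkptBDGJKPT])h", "", word)


def drop_inner_vowels(word: str) -> str:
    """Hypothetical op: remove vowels strictly inside the word, keeping the
    first and last characters (e.g. 'binod' -> 'bnd')."""
    if len(word) <= 2:
        return word
    inner = re.sub(r"[aeiouAEIOU]", "", word[1:-1])
    return word[0] + inner + word[-1]


# Assumed operation pool; the real list of detected operations is not shown here.
OPS = [swap_b_with_v, drop_aspiration, drop_inner_vowels]


def make_takla(sentence: str, n_ops: int = 1, seed=None) -> str:
    """Apply up to n_ops distinct, randomly chosen operations to each word."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        for op in rng.sample(OPS, k=min(n_ops, len(OPS))):
            word = op(word)
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    given = "Sokhi bhalo kore binod beni badhiya de"
    for n in (1, 2):
        print(f"n_ops={n}:", make_takla(given, n_ops=n, seed=0))
```

With n_ops=1 each word receives one randomly chosen operation; raising n_ops stacks more operations per word, which loosely mirrors the "Number of Random Ops" setting in the sample. Passing a seed makes the output repeatable, which is convenient when generating a dataset.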
To run the unit tests:
cd utils
python3 -m unittest test_wordCleaner