ambuda-org / vidyut

Infrastructure for Sanskrit software. For Python bindings, see `vidyut-py`.

Add a new scheme to vidyut-lipi #107

Closed akprasad closed 4 months ago

akprasad commented 5 months ago

My soft goal is 50 supported schemes with good quality. We currently support 32, which means we have 18 more to go.

How to create a new scheme:

  1. Find a scheme in https://github.com/indic-transliteration/common_maps/ that is not supported in vidyut-lipi. You can find a list of our supported schemes in `enum Scheme` under `scheme.rs`.

  2. Generate a basic mapping for your scheme by updating the `ALLOWED` list in `create_schemes.py`, then running `./scripts/create_schemes.py`. If the command succeeds, you should see data for your scheme in `autogen_schemes.rs`.

  3. Add a new value to `enum Scheme` for your scheme, then run `cargo check` to see which parts of the code you need to update. Proceed to step (4) once `cargo test` passes.

  4. Add test cases for your scheme to `basic.rs`. At minimum, update the following:

    • `sanskrit_independent_vowels`
    • `sanskrit_dependent_vowels`
    • `sanskrit_ayogavahas_etc`
    • `sanskrit_consonants_non_vedic`
    • `sanskrit_symbols`
    • `sanskrit_basic_sentences`

    In general, assume that whatever Aksharamukha produces is correct.

    If your scheme is lossless (that is, if we can convert Devanagari to your scheme and back and get the same information), then update the `assert_two_way_pairwise` call. Otherwise, try using `assert_one_way_pairwise` instead.

  5. If your scheme has its own quirks or conventions, add tests for these as well. For examples, see `thai_preceding_vowels`, `tibetan_subjoined_consonants`, or `khmer_sign_robat`.

  6. Compare your output to an existing transliterator like Aksharamukha. You can run `make debugger` to build and run our debugger. To make your scheme appear in the debugger, you will need to add your scheme's name to `schemes` and `schemeNames` in `vidyut-lipi-app.js`.

  7. Once your scheme passes tests and looks good in the debugger, create a PR so we can merge it in.
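To make the lossless vs. lossy distinction in step (4) concrete, here is a minimal, self-contained sketch of a round-trip check. It does not use vidyut-lipi's actual API or test helpers: `convert`, `invert`, `round_trips`, and the two toy tables are hypothetical names invented for illustration, and the "lossy" table is a made-up scheme that does not mark vowel length.

```rust
// Toy tables of (Devanagari, target) tokens. Both are hypothetical and are
// NOT taken from vidyut-lipi or common_maps.
const LOSSLESS: [(&str, &str); 4] = [("अ", "a"), ("आ", "ā"), ("इ", "i"), ("ई", "ī")];
// A made-up scheme that drops vowel length, so two sources share one target.
const LOSSY: [(&str, &str); 4] = [("अ", "a"), ("आ", "a"), ("इ", "i"), ("ई", "i")];

/// Converts `input` by repeatedly replacing the longest matching source token.
/// Characters with no matching token are copied through unchanged.
fn convert(input: &str, table: &[(&str, &str)]) -> String {
    let mut out = String::new();
    let mut rest = input;
    while !rest.is_empty() {
        let hit = table
            .iter()
            .filter(|(src, _)| !src.is_empty() && rest.starts_with(*src))
            .max_by_key(|(src, _)| src.len());
        match hit {
            Some((src, dst)) => {
                out.push_str(dst);
                rest = &rest[src.len()..];
            }
            None => {
                let c = rest.chars().next().unwrap();
                out.push(c);
                rest = &rest[c.len_utf8()..];
            }
        }
    }
    out
}

/// Swaps source and target so the same table can drive the reverse direction.
fn invert<'a>(table: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    table.iter().map(|&(a, b)| (b, a)).collect()
}

/// Returns true if converting forward and then back reproduces `input`.
fn round_trips(input: &str, table: &[(&str, &str)]) -> bool {
    let forward = convert(input, table);
    convert(&forward, &invert(table)) == input
}
```

With these toy tables, `round_trips("अआइई", &LOSSLESS)` holds, so a two-way test would be appropriate. `round_trips("अआ", &LOSSY)` fails because both vowels collapse to `a`, which is the situation where a one-way test is the better fit.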

ankit-pn commented 5 months ago

hey i would love to work on it. it would be great if you assign this to me! thanks

akprasad commented 5 months ago

@ankit-pn Glad to hear it! This is a tracking issue for all of the schemes we might add, so it's quite a big project. I suggest starting smaller by picking just one scheme.

Likewise, I'll use this issue to track and share what I'm working on. For example, I'm going to start by incorporating Bhaiksuki, followed by Tai Tham.

akprasad commented 5 months ago

I've added Bhaiksuki locally, but Tai Tham needs some more work, so I'll focus on Gunjala Gondi and Kaithi.

@ankit-pn do you want to try adding Manipuri? I can guide you if you get stuck.

akprasad commented 5 months ago

I've added support for Gunjala Gondi, Takri, Kaithi, Dogra, and Khudawadi. I'll next focus on Tai Tham, Cham, and Limbu.

akprasad commented 4 months ago

I have added support for Masaram Gondi, Nandinagari, Ol Chiki, and Zanabazar Square.

I am continuing to focus on Tai Tham, Cham, Limbu, and Meetei Mayek. With these completed, the number of schemes we support will reach 46, with 4 more to go afterward.

akprasad commented 4 months ago

I have added basic support for Tai Tham, Cham, and Assamese, with work continuing on Limbu and Meetei Mayek.

akprasad commented 4 months ago

I have added basic support for Meetei Mayek and will continue on Limbu, which will bring the total to 48.

akprasad commented 4 months ago

We've reached our goal of 50 schemes. I'll close this issue, but more schemes are always welcome!