ambuda-org / vidyut

Infrastructure for Sanskrit software. For Python bindings, see `vidyut-py`.

Initial commit of vidyut-lipi transliteration updates. #39

Closed mudgebiscuit closed 9 months ago

mudgebiscuit commented 1 year ago

Feature - vidyut-lipi transliteration tool

Files/directories modified under vidyut-lipi/:

- .json: encoding schemes. I have created encodings for itrans, slp1, devanagari, and malayalam.
- src/lib.rs: transliteration code.
- src/bin/transliterate.rs: command-line tool to convert files. Testing is fairly limited at this point: I converted one itrans (.itx) file to devanagari and malayalam.
- src/consts_to_json.py: utility to create the language JSON files.

akprasad commented 1 year ago

@mudgebiscuit I stumbled across a Rust library for transliterating Cyrillic scripts and thought you might find it interesting.

How is this work going? Is there anything I can do to help in terms of implementation? I think we can also consider splitting this PR into smaller pieces if you think the work is too much.

akprasad commented 1 year ago

Thanks for the changes @santoshbn! Let me know when this is ready for another look

mudgebiscuit commented 1 year ago

I checked in a new version a few days back. Feel free to review.

akprasad commented 1 year ago

Thanks for the changes!

This PR is still creating translation maps from JSON files. As I mentioned earlier, please use native Rust structures instead.
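As a rough illustration of what "native Rust structures" could look like here, the sketch below stores a few SLP1/Devanagari vowel pairs in a `const` slice rather than loading them from JSON at runtime. The names `Scheme`, `VOWELS`, and `map_vowel` are hypothetical and only meant to show the shape; they are not the PR's actual code, and the table is deliberately incomplete.

```rust
/// Hypothetical scheme identifier (an assumption for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Scheme {
    Slp1,
    Devanagari,
}

/// (SLP1, Devanagari) pairs for a few independent vowels.
/// Illustrative only; a real table would cover the full alphabet.
const VOWELS: &[(&str, &str)] = &[
    ("a", "अ"),
    ("A", "आ"),
    ("i", "इ"),
    ("I", "ई"),
    ("u", "उ"),
    ("U", "ऊ"),
];

/// Looks up a single vowel token in the compile-time table.
/// Returns `None` for unknown tokens or unsupported scheme pairs
/// (including source == dest, which this sketch does not handle).
pub fn map_vowel(token: &str, source: Scheme, dest: Scheme) -> Option<&'static str> {
    VOWELS.iter().find_map(|&(slp1, deva)| {
        let (from, to) = match (source, dest) {
            (Scheme::Slp1, Scheme::Devanagari) => (slp1, deva),
            (Scheme::Devanagari, Scheme::Slp1) => (deva, slp1),
            _ => return None,
        };
        if from == token {
            Some(to)
        } else {
            None
        }
    })
}
```

Because the table is a `const`, there is no file I/O or JSON parsing at startup, and a malformed entry becomes a compile error rather than a runtime failure.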

Also, the high-level API is complex. Instead of:

pub fn transliterate_scheme(
    instr: String,
    inmap: TranslationScheme,
    outmap: TranslationScheme,
    outstr: &mut String,
)

I prefer something like:

pub fn transliterate(input: impl AsRef<str>, source: Scheme, dest: Scheme) -> String

Note that this version returns a string directly and simplifies the names involved (`transliterate_scheme` -> `transliterate`, `TranslationScheme` -> `Scheme`). `impl AsRef<str>` is less important, but it lets callers pass either `&str` or `String`. For examples of `AsRef`, you can grep for `impl AsRef` in the vidyut dir.
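To make the proposed signature concrete, here is a minimal sketch that accepts either `&str` or `String` and returns a new `String`. The `Scheme` variants, the `LONG_VOWELS` table, and the `map_char` helper are assumptions for illustration only; the mapping covers just three long vowels between SLP1 and IAST and passes every other character through unchanged.

```rust
/// Hypothetical scheme identifier (an assumption for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Scheme {
    Slp1,
    Iast,
}

/// Toy (SLP1, IAST) table covering long vowels only.
const LONG_VOWELS: &[(char, char)] = &[('A', 'ā'), ('I', 'ī'), ('U', 'ū')];

/// Maps one character, leaving anything outside the toy table unchanged.
fn map_char(c: char, source: Scheme, dest: Scheme) -> char {
    for &(slp1, iast) in LONG_VOWELS {
        match (source, dest) {
            (Scheme::Slp1, Scheme::Iast) if c == slp1 => return iast,
            (Scheme::Iast, Scheme::Slp1) if c == iast => return slp1,
            _ => {}
        }
    }
    c
}

/// Same shape as the signature proposed above: borrowed or owned input,
/// two schemes, and an owned `String` as the return value.
pub fn transliterate(input: impl AsRef<str>, source: Scheme, dest: Scheme) -> String {
    let input = input.as_ref();
    input.chars().map(|c| map_char(c, source, dest)).collect()
}
```

With this shape, `transliterate("rAma", Scheme::Slp1, Scheme::Iast)` and `transliterate(String::from("rAma"), Scheme::Slp1, Scheme::Iast)` both compile, and the caller does not need to allocate an output buffer up front.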

Once these are resolved, I'll follow up with more detailed comments.