ambuda-org / vidyut

Infrastructure for Sanskrit software. For Python bindings, see `vidyut-py`.

vidyut-lipi script detection failures #109

Closed: mediabuff closed this issue 7 months ago

mediabuff commented 7 months ago

vidyut-lipi script detection fails on the sample text from `assert_two_way_pairwise` at https://github.com/ambuda-org/vidyut/blob/896ad354894fceb52600e1e12e5f9009db482196/vidyut-lipi/tests/basic.rs

Each result is listed as the detected scheme for the actual scheme, followed by the input text:

Slp1 for BarahaSouth : nArAyaNaM namaskRutya naraM chaiva narOttamam | dEvIM sarasvatIM chaiva tatO jayamudIrayEt || 1 ||

Iast for Iso15919 : nārāyaṇaṁ namaskr̥tya naraṁ caiva narōttamam . dēvīṁ sarasvatīṁ caiva tatō jayamudīrayēt .. 1 ..

Slp1 for Wx : nArAyaNaM namaskqwya naraM cEva narowwamam . xevIM sarasvawIM cEva wawo jayamuxIrayew .. 1 ..

Devanagari for Nandinagari : 𑧁𑧑𑧈𑧑𑧇𑦼𑧞 𑧁𑧆𑧍𑧠𑦮𑧖𑦽𑧠𑧇 𑧁𑧈𑧞 𑦳𑧛𑧊 𑧁𑧈𑧜𑦽𑧠𑦽

akprasad commented 7 months ago

Thank you for filing this issue!

`detect` is missing support for Iso15919 and Nandinagari because I forgot to write code for them (😅), so these should be easy to add. I think we can support Wx and BarahaSouth if we can find characteristic bigrams for each and check for them.
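To make the bigram idea concrete, here is a minimal sketch, not the actual `detect` code. The `Guess` enum, the function names, and the bigram lists are all hypothetical. The bigrams were picked by eye from the sample text in this issue and have not been checked against a real corpus.

```rust
/// Hypothetical result type for this sketch; it is NOT vidyut-lipi's `Scheme` enum.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Guess {
    Wx,
    BarahaSouth,
    Unknown,
}

// Illustrative bigrams only (an assumption, taken from the sample text above):
// Wx writes vocalic r as `q` and dental t/d as `w`/`x`.
const WX_BIGRAMS: &[&str] = &["kq", "ww", "xe"];
// BarahaSouth writes vocalic r as `Ru`, c as `ch`, and long o as `O`.
const BARAHA_SOUTH_BIGRAMS: &[&str] = &["Ru", "ch", "rO"];

/// Counts how many times any of the given bigrams occurs in `text`.
fn count_hits(text: &str, bigrams: &[&str]) -> usize {
    bigrams.iter().map(|b| text.matches(*b).count()).sum()
}

/// Picks the scheme whose bigrams occur more often; ties and zero hits give `Unknown`.
fn guess_scheme(text: &str) -> Guess {
    let wx = count_hits(text, WX_BIGRAMS);
    let baraha = count_hits(text, BARAHA_SOUTH_BIGRAMS);
    if wx > baraha {
        Guess::Wx
    } else if baraha > wx {
        Guess::BarahaSouth
    } else {
        Guess::Unknown
    }
}
```

A real check would need bigrams that are rare in Slp1 and the other ASCII schemes, since the same letters carry different sounds there.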

Would you like to make a PR for this issue?

mediabuff commented 7 months ago

Thanks for your response. Hmm. I am not familiar with these schemes :-)

akprasad commented 7 months ago

It's not as bad as it looks!

The others are a little harder.

akprasad commented 7 months ago

I've added local support for Nandinagari and will push soon. The others are fair game.

akprasad commented 7 months ago

Pushed Nandinagari, and I have a local fix for ISO 15919.
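For readers following along, the two checks can be sketched roughly as below. This is an illustration under stated assumptions, not the code that was pushed, and both function names are hypothetical. It relies on two facts: the Unicode Nandinagari block spans U+119A0 to U+119FF, and ISO 15919 uses ē, ō, ṁ, and r̥ where IAST uses e, o, ṃ, and ṛ. It also assumes the ISO 15919 input uses precomposed (NFC) letters.

```rust
/// Hypothetical helper: true if any character is in the Unicode
/// Nandinagari block (U+119A0..=U+119FF).
fn looks_like_nandinagari(text: &str) -> bool {
    text.chars()
        .any(|c| ('\u{119A0}'..='\u{119FF}').contains(&c))
}

/// Hypothetical helper: true if the text contains a mark that ISO 15919
/// uses and IAST does not. Assumes precomposed (NFC) letters.
fn looks_like_iso15919(text: &str) -> bool {
    // Vocalic r in ISO 15919 is `r` followed by U+0325 COMBINING RING BELOW.
    let has_ring_r = text.contains("r\u{0325}");
    // U+0113 = e with macron, U+014D = o with macron, U+1E41 = m with dot above.
    let has_iso_letter = text
        .chars()
        .any(|c| matches!(c, '\u{0113}' | '\u{014D}' | '\u{1E41}'));
    has_ring_r || has_iso_letter
}
```

Text that happens to contain none of these marks is still ambiguous between the two schemes, so a check like this can only confirm ISO 15919, not rule it out.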

mediabuff commented 7 months ago

See my comments at https://github.com/ambuda-org/vidyut/issues/108#issuecomment-1938020437

akprasad commented 7 months ago

This is fixed in the latest local build. Pushing soon.

akprasad commented 7 months ago

Pushed! Thanks for filing this issue. Please open a new issue if you find other detection issues. I'm sure there are many, but at least we can triage them based on severity.