Closed: mediabuff closed this issue 7 months ago
Thank you for filing this issue!
`detect` is missing support for `Iso15919` and `Nandinagari` because I forgot to write code for them (😅), so these should be easy to add. I think we can support `Wx` and `BarahaSouth` if we can find characteristic bigrams for each and check for them.
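As a rough illustration of the bigram idea, here is a minimal sketch. It is not vidyut-lipi's actual `detect` code: the `Guess` enum, the function names, and the bigram lists are all hypothetical. The bigrams were picked by eye from the sample text in this issue and have not been checked against other schemes that share them.

```rust
/// Hypothetical result type for this sketch (not a vidyut-lipi type).
#[derive(Debug, PartialEq)]
enum Guess {
    Wx,
    BarahaSouth,
    Unknown,
}

// Assumed "characteristic" bigrams, taken from the sample text in this issue.
// Wx writes dental t as `w` and vocalic r as `q`; BarahaSouth writes `Ru` and `ch`.
const WX_BIGRAMS: [&str; 2] = ["ww", "qw"];
const BARAHA_SOUTH_BIGRAMS: [&str; 2] = ["Ru", "ch"];

/// Counts non-overlapping occurrences of every bigram in `text`.
fn count_hits(text: &str, bigrams: &[&str]) -> usize {
    bigrams.iter().map(|b| text.matches(*b).count()).sum()
}

/// Picks whichever scheme has more bigram hits; ties (including zero hits) are `Unknown`.
fn guess_scheme(text: &str) -> Guess {
    let wx = count_hits(text, &WX_BIGRAMS);
    let baraha = count_hits(text, &BARAHA_SOUTH_BIGRAMS);
    if wx > baraha {
        Guess::Wx
    } else if baraha > wx {
        Guess::BarahaSouth
    } else {
        Guess::Unknown
    }
}
```

A real implementation would need bigrams that do not also occur in other Roman schemes (for example, `ch` is common in several of them), so the lists above are only a starting point.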
Would you like to make a PR for this issue?
Thanks for your response. Hmm. I am not familiar with these schemes :-)
It's not as bad as it looks!
- For `Nandinagari`, we can check the Unicode range. See here for how we define and use those ranges.
- For `Iso15919`, we can check for characters used in `Iso15919` but not IAST -- see here. (edit: these characters include ē, ō, and a few others. For details, see `autogen_schemes.rs` and compare the characters that are ISO-only with the characters that are IAST-only.)

The others are a little harder.
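For anyone picking this up, the two easier checks could look roughly like the sketch below. This is not the code in vidyut-lipi: the function and constant names are hypothetical, and `ISO_ONLY` is an illustrative subset (ē and ō from the comment above, plus the combining ring below that appears in the ISO sample in this issue). The Nandinagari block range, U+119A0 to U+119FF, comes from the Unicode standard.

```rust
use std::ops::RangeInclusive;

/// The Nandinagari block in Unicode.
const NANDINAGARI: RangeInclusive<char> = '\u{119A0}'..='\u{119FF}';

/// Illustrative, incomplete set of characters assumed to be ISO-only:
/// U+0113 (e with macron), U+014D (o with macron), U+0325 (combining ring below).
const ISO_ONLY: [char; 3] = ['\u{0113}', '\u{014D}', '\u{0325}'];

/// Returns true if any character falls in the Nandinagari block.
fn looks_like_nandinagari(text: &str) -> bool {
    text.chars().any(|c| NANDINAGARI.contains(&c))
}

/// Returns true if any character is in the assumed ISO-only set.
/// Note: this matches precomposed characters only; decomposed input
/// (e.g. `o` followed by a combining macron) would need normalizing first.
fn looks_like_iso15919(text: &str) -> bool {
    text.chars().any(|c| ISO_ONLY.contains(&c))
}
```

The full set of distinguishing characters would come from comparing the ISO and IAST tables in `autogen_schemes.rs`, as described above.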
I've added local support for Nandinagari and will push soon. The others are fair game.
Pushed Nandinagari, and I have a local fix for ISO 15919.
This is fixed in the latest local build. Pushing soon.
Pushed! Thanks for filing this issue. Please open a new issue if you find other detection issues. I'm sure there are many, but at least we can triage them based on severity.
vidyut-lipi script detection failures.
Results for the sample text from `assert_two_way_pairwise` at https://github.com/ambuda-org/vidyut/blob/896ad354894fceb52600e1e12e5f9009db482196/vidyut-lipi/tests/basic.rs (detected scheme for expected scheme : text):

Slp1 for BarahaSouth : nArAyaNaM namaskRutya naraM chaiva narOttamam | dEvIM sarasvatIM chaiva tatO jayamudIrayEt || 1 ||

Iast for Iso15919 : nārāyaṇaṁ namaskr̥tya naraṁ caiva narōttamam . dēvīṁ sarasvatīṁ caiva tatō jayamudīrayēt .. 1 ..

Slp1 for Wx : nArAyaNaM namaskqwya naraM cEva narowwamam . xevIM sarasvawIM cEva wawo jayamuxIrayew .. 1 ..

Devanagari for Nandinagari : 𑧁𑧑𑧈𑧑𑧇𑦼𑧞 𑧁𑧆𑧍𑧠𑦮𑧖𑦽𑧠𑧇 𑧁𑧈𑧞 𑦳𑧛𑧊 𑧁𑧈𑧜𑦽𑧠𑦽