TODO: Add complete support for Grantha

akprasad commented 6 months ago

This is a tracking issue for ongoing work related to Grantha, such as issue #85. +cc @deepestblue as a resident Grantha expert.

Documentation:

Relevant prior work:

SaulabhyaJS
Perhaps Aksharamukha as well

Scope of work:

vidyut-lipi should be able to transliterate all Grantha features (as defined in the Unicode Grantha code chart, U11300) to at least one other script. For Sanskrit sounds, this can be Devanagari. For Dravidian sounds, this can be Tamil, etc.
vidyut-lipi should support transliterating to and from Grantha numerals (see #85). I think this should be a transliteration option that is on by default. That is, vidyut-lipi should be updated to support transliteration options.
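
A rough sketch of what a numeral option that is on by default could look like. Everything here is hypothetical (`TransliterationOptions`, `transliterate_numerals`, `convert_digits`) and is not the vidyut-lipi API. It also assumes that Grantha numerals are written with the Tamil digits U+0BE6 to U+0BEF, and it only handles the Devanagari to Grantha direction.

```rust
/// Hypothetical options struct for illustration; not the actual vidyut-lipi API.
#[derive(Clone, Copy, Debug)]
pub struct TransliterationOptions {
    /// When true, digits are converted along with the rest of the text.
    pub transliterate_numerals: bool,
}

impl Default for TransliterationOptions {
    fn default() -> Self {
        // On by default, as proposed in the scope of work.
        Self { transliterate_numerals: true }
    }
}

const DEVANAGARI_DIGIT_ZERO: u32 = 0x0966;
// Assumption: Grantha text uses the Tamil digits.
const TAMIL_DIGIT_ZERO: u32 = 0x0BE6;

/// Maps a Devanagari digit to the corresponding Tamil digit, if `c` is a digit.
fn devanagari_digit_to_grantha(c: char) -> Option<char> {
    let cp = c as u32;
    if (DEVANAGARI_DIGIT_ZERO..=DEVANAGARI_DIGIT_ZERO + 9).contains(&cp) {
        char::from_u32(cp - DEVANAGARI_DIGIT_ZERO + TAMIL_DIGIT_ZERO)
    } else {
        None
    }
}

/// Converts only the digits of `input`; all other characters pass through.
pub fn convert_digits(input: &str, options: TransliterationOptions) -> String {
    input
        .chars()
        .map(|c| {
            if options.transliterate_numerals {
                devanagari_digit_to_grantha(c).unwrap_or(c)
            } else {
                c
            }
        })
        .collect()
}
```

With the default options, `convert_digits("\u{0967}\u{0968}\u{0969}", TransliterationOptions::default())` converts the three digits; with `transliterate_numerals: false` the input is returned unchanged.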

Known work remaining:

[x] Grantha svara markers for the Samaveda
[ ] Grantha pluta and double anusvara
[ ] Grantha sign anusvara (U+11302)

Validation:

All Grantha features must be captured in at least one automatic test. Currently, all of our transliteration tests are in basic.rs.
Sign-off by an expert in Grantha usage.

akprasad commented 5 months ago

@deepestblue I would be grateful for your continued support here. As far as I can tell, vidyut-lipi currently passes the Grantha test cases from SaulabhyaJS. Demo: https://ambuda-org.github.io/vidyut-lipi/

akprasad commented 4 months ago

@deepestblue or @jamadagni -- would either of you be able to provide examples of the following with their Devanagari equivalents? Or, do you know where I might find them?

Use of the COMBINING ANUSVARA ABOVE
Use of the COMBINING BINDU BELOW
Use of the SIGN NUKTA
Use of the DOUBLE ANUSVARA
Use of COMBINING DIGIT ZERO and the other Samaveda marks.

deepestblue commented 4 months ago

𑍞 maps to ꣳ
𑍟 maps to ꣴ
The Nuqta was introduced into Devanagari to write Perso-Arabic characters, and there's no documented usage of it in Grantha. That said, people wanted to encode it as an extension point, and so I guess U+1133C maps to U+093C
𑍝 is a challenge; it's a pluti-marker, which in Devanagari is simply indicated by the digit 3 (U+0969), and roundtripping here is going to be hard
U+1133B I believe is used for Nilagiri tribal languages and not for Sanskrit. Arguably it doesn't belong in Grantha, a point which Vid. Shriramana made unsuccessfully with the Unicode folks. I'd leave it out of vidyut.
U+11300 is used for Prakrita languages and again not in Sanskrit. We should ignore it in vidyut unless you want to support Prakrit.
I'll defer to Vid. Shriramana on Samaveda marks, on which my knowledge is very limited :-(
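
For reference, the mappings listed above can be written down as a small lookup. This is only a sketch: the function name `grantha_sign_to_devanagari` is made up for illustration and does not reflect how vidyut-lipi stores its tables. U+1133B and U+11300 are deliberately left unmapped, following the suggestions above.

```rust
/// Hypothetical helper: Devanagari counterpart of a Grantha sign, per the
/// mappings proposed in this thread. Returns None for anything else.
pub fn grantha_sign_to_devanagari(c: char) -> Option<char> {
    match c {
        // 𑍞 (U+1135E) maps to ꣳ (U+A8F3).
        '\u{1135E}' => Some('\u{A8F3}'),
        // 𑍟 (U+1135F) maps to ꣴ (U+A8F4).
        '\u{1135F}' => Some('\u{A8F4}'),
        // Grantha nukta maps to Devanagari nukta; no documented Grantha usage.
        '\u{1133C}' => Some('\u{093C}'),
        // 𑍝 (U+1135D), the pluti marker, maps lossily to the Devanagari digit 3.
        '\u{1135D}' => Some('\u{0969}'),
        // U+1133B and U+11300 are intentionally not handled here.
        _ => None,
    }
}
```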

akprasad commented 4 months ago

𑍞 maps to ꣳ 𑍟 maps to ꣴ

Added local support for both with tests.

so I guess U+1133C maps to U+093C

Added local support with tests and a comment that this doesn't have documentary evidence.

𑍝 is a challenge

If this lossily maps to Devanagari 3, policies in the reverse direction might be:

leave unchanged
map back to pluti marker after a vowel
let the user choose which of the above they want to apply.
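
To make the three policies concrete, here is a minimal sketch. `PlutaPolicy` and `restore_pluta` are hypothetical names, not existing vidyut-lipi code. The function only swaps the Devanagari digit 3 for the Grantha pluti marker and leaves every other character alone, so it models one step of a Devanagari to Grantha pass rather than a full transliteration. The "after a vowel" check is an assumption: it treats independent vowels, vowel signs, and bare consonants (which carry an inherent a) as ending in a vowel.

```rust
/// Hypothetical policy for the Devanagari digit 3 when converting to Grantha.
/// Passing this as a parameter covers the "let the user choose" option.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PlutaPolicy {
    /// Keep the digit as it is.
    LeaveUnchanged,
    /// Emit the Grantha pluti marker when the digit follows a vowel.
    MapAfterVowel,
}

const DEVANAGARI_DIGIT_THREE: char = '\u{0969}';
const GRANTHA_SIGN_PLUTA: char = '\u{1135D}';

/// Rough check (an assumption of this sketch) that `c` ends in a vowel sound.
fn ends_in_vowel(c: char) -> bool {
    matches!(
        c as u32,
        0x0904..=0x0914       // independent vowels
            | 0x0960..=0x0961 // independent vocalic RR and LL
            | 0x093E..=0x094C // dependent vowel signs
            | 0x0962..=0x0963 // dependent vocalic L and LL
            | 0x0915..=0x0939 // consonants, which carry an inherent a
    )
}

/// Replaces the digit 3 with the pluti marker according to `policy`.
/// All other characters are copied through unchanged.
pub fn restore_pluta(devanagari: &str, policy: PlutaPolicy) -> String {
    let mut out = String::with_capacity(devanagari.len());
    let mut prev: Option<char> = None;
    for c in devanagari.chars() {
        let after_vowel = prev.map_or(false, ends_in_vowel);
        if c == DEVANAGARI_DIGIT_THREE
            && policy == PlutaPolicy::MapAfterVowel
            && after_vowel
        {
            out.push(GRANTHA_SIGN_PLUTA);
        } else {
            out.push(c);
        }
        prev = Some(c);
    }
    out
}
```

A digit 3 that stands alone or follows another digit is kept as a digit under both policies, which is the main reason for keying the rule on the preceding character.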

U+1133B [...]

Ignored.

U+11300 is used for Prakrita languages [...] We should ignore it in vidyut unless you want to support Prakrit.

I would like to support it if possible. Does this map to U+0902 Devanagari sign anusvara?

I'll defer to Vid. Shriramana on Samaveda marks

🙏

deepestblue commented 4 months ago

If this lossily maps to Devanagari 3, policies in the reverse direction might be:

I think map back to the pluti-marker after a vowel makes the most sense.

U+11300 is used for Prakrita languages [...] We should ignore it in vidyut unless you want to support Prakrit.

I would like to support it if possible. Does this map to U+0902 Devanagari sign anusvara?

Yes, but in the Prakrita cases where it maps to U+0902, U+11302 should not map to U+0902. Instead, it indicates doubling (gemination) of the subsequent consonant, for which Devanagari doesn't have a symbol. So the net effect is pretty lossy. You may already know this, as the PDF I linked to has this detail.

akprasad commented 4 months ago

So the net effect is pretty lossy

Thanks, will avoid for now.

As for Samaveda accents, I have implemented the obvious mapping and will adjust it based on feedback.

So, everything else in this thread has been implemented apart from the pluta marker (need to implement) and the DOUBLE ANUSVARA (don't know how to handle).

Demo here: https://ambuda-org.github.io/vidyut-lipi/

deepestblue commented 4 months ago

the DOUBLE ANUSVARA (don't know how to handle)

Do you mean U+1135F 𑍟? If so, I mentioned above that it maps to U+A8F4 ꣴ.
