ambuda-org / vidyut

Infrastructure for Sanskrit software. For Python bindings, see `vidyut-py`.

TODO: Add complete support for Grantha #90

Open akprasad opened 6 months ago

akprasad commented 6 months ago

This is a tracking issue for ongoing work related to Grantha, such as issue #85. +cc @deepestblue as a resident Grantha expert.

Documentation:

Relevant prior work:

Scope of work:

Known work remaining:

Validation:

akprasad commented 5 months ago

@deepestblue I would be grateful for your continued support here. As far as I can tell, vidyut-lipi currently passes the Grantha test cases from SaulabhyaJS. Demo: https://ambuda-org.github.io/vidyut-lipi/

akprasad commented 4 months ago

@deepestblue or @jamadagni -- would either of you be able to provide examples of the following with their Devanagari equivalents? Or, do you know where I might find them?

deepestblue commented 4 months ago
akprasad commented 4 months ago

𑍞 maps to ꣳ; 𑍟 maps to ꣴ

Added local support for both with tests.

so I guess U+1133C maps to U+093C

Added local support with tests and a comment that this doesn't have documentary evidence.
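For reference, the mappings discussed so far can be sketched as a plain code point table. This is only an illustration in Python, not the vidyut-lipi implementation: the names `GRANTHA_TO_DEVANAGARI`, `grantha_marks_to_devanagari`, and `devanagari_marks_to_grantha` are hypothetical, and the nukta entry rests on the guess above rather than on documentary evidence.

```python
# Hypothetical table; the code points are the ones named in this thread.
GRANTHA_TO_DEVANAGARI = {
    0x1135E: 0xA8F3,  # 𑍞 -> ꣳ
    0x1135F: 0xA8F4,  # 𑍟 -> ꣴ
    0x1133C: 0x093C,  # Grantha nukta -> Devanagari nukta (a guess, undocumented)
}

# The three entries are one-to-one, so the table can simply be inverted.
DEVANAGARI_TO_GRANTHA = {deva: grantha for grantha, deva in GRANTHA_TO_DEVANAGARI.items()}


def grantha_marks_to_devanagari(text: str) -> str:
    """Replace only the marks in the table; every other character is left alone."""
    return text.translate(GRANTHA_TO_DEVANAGARI)


def devanagari_marks_to_grantha(text: str) -> str:
    """Reverse direction for the same three marks."""
    return text.translate(DEVANAGARI_TO_GRANTHA)
```

Because each of these three pairs is one-to-one, a round trip through both functions returns the original string.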

𑍝 is a challenge

If this lossily maps to Devanagari 3, policies in the reverse direction might be:

U+1133B [...]

Ignored.

U+11300 is used for Prakrita languages [...] We should ignore it in vidyut unless you want to support Prakrit.

I would like to support it if possible. Does this map to U+0902 Devanagari sign anusvara?

I'll defer to Vid. Shriramana on Samaveda marks

🙏

deepestblue commented 4 months ago

If this lossily maps to Devanagari 3, policies in the reverse direction might be:

I think mapping back to the pluti marker after a vowel makes the most sense.
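A minimal sketch of that policy in Python, assuming the input is Devanagari text and that "after a vowel" means the previous character is an independent vowel, a dependent vowel sign, or a bare consonant carrying the inherent "a". The function names and the exact ranges are assumptions made for illustration, not vidyut-lipi behavior.

```python
PLUTA = "\U0001135D"   # 𑍝, the Grantha pluta sign discussed above
DEVA_THREE = "\u0969"  # Devanagari digit three


def _ends_in_vowel(ch: str) -> bool:
    """Rough check (an assumption): does this Devanagari character end in a vowel sound?"""
    cp = ord(ch)
    return (
        0x0904 <= cp <= 0x0914     # independent vowels
        or 0x093E <= cp <= 0x094C  # dependent vowel signs
        or 0x0915 <= cp <= 0x0939  # consonants, which carry an inherent "a"
    )


def restore_pluta(text: str) -> str:
    """Rewrite a Devanagari 3 as the pluta sign only when it directly follows a vowel."""
    out = []
    prev = ""
    for ch in text:
        if ch == DEVA_THREE and prev and _ends_in_vowel(prev):
            out.append(PLUTA)
        else:
            out.append(ch)
        prev = ch
    return "".join(out)
```

A 3 that follows a virama, another digit, or nothing at all is left unchanged, so ordinary numerals are not rewritten. This would run as a pre-pass before the rest of the Devanagari to Grantha conversion.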

U+11300 is used for Prakrita languages [...] We should ignore it in vidyut unless you want to support Prakrit.

I would like to support it if possible. Does this map to U+0902 Devanagari sign anusvara?

Yes, but in the Prakrita cases where it maps to U+0902, U+11302 should not map to U+0902. There it instead indicates doubling (gemination) of the subsequent consonant, for which Devanagari doesn't have a symbol. So the net effect is pretty lossy. You may already know this, as the PDF I linked to has this detail.

akprasad commented 4 months ago

So the net effect is pretty lossy

Thanks, will avoid for now.

As for Samaveda accents, I have implemented the obvious mapping and will adjust it based on feedback.

So, everything in this thread has now been implemented except the pluta marker (need to implement) and the DOUBLE ANUSVARA (don't know how to handle).

Demo here: https://ambuda-org.github.io/vidyut-lipi/

deepestblue commented 4 months ago

the DOUBLE ANUSVARA (don't know how to handle)

Do you mean U+1135F 𑍟? If so, I mentioned above that it maps to U+A8F4 ꣴ.