cceh / sanskrit-web

An HTML frontend to Sanskrit dictionaries modelled in TEI.
ISC License

Diacritics in description #40

Open fxru opened 9 years ago

fxru commented 9 years ago

Diacritics such as n2 for or a1 for ā are not handled in pwg and monier. They appear especially in the references, but also elsewhere. Some occurrences are handled in the legacy interface, others are not.

gioele commented 9 years ago

This is caused by the TEI files declaring the wrong encoding for those words.

Once fixed in the TEI this problem should disappear.

gioele commented 9 years ago

Closely related to issue #20.

fxru commented 9 years ago

The problem is that these are not necessarily Sanskrit words. In pwg we have pra1kritisch, which should be rendered prākritisch, but it is a German adjectival form based on the romanisation prākrit (from प्राकृत prākṛta).

I think the remaining a1, n2, s3, etc. should be treated as atoms, not as context-dependent. These words are often English or German words, or abbreviations of texts such as the one discussed in issue #20.

For English or German words (and for these abbreviations), a conversion to Unicode would be best, i.e. a1 to ā, etc.
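A minimal sketch of the "atoms" idea, assuming each code is a single ASCII letter followed by one digit and that every code maps to one fixed character regardless of the surrounding word. Only a1 → ā is attested in this thread; the table name `ATOM_TABLE` and the function `convert_atoms` are hypothetical, and codes like n2 or s3 would have to be added from the dictionaries' own conventions.

```python
import re

# Hypothetical lookup table. Only "a1" -> "ā" is attested in this issue;
# other codes (n2, s3, ...) must be filled in from the dictionaries' own
# documentation before this is used on real data.
ATOM_TABLE: dict[str, str] = {
    "a1": "\u0101",  # ā, Latin small letter a with macron
}

# An "atom": one ASCII letter followed by exactly one digit
# (the lookahead avoids splitting something like "a10").
ATOM_PATTERN = re.compile(r"[A-Za-z][0-9](?![0-9])")


def convert_atoms(text: str, table: dict[str, str] = ATOM_TABLE) -> str:
    """Replace each known letter+digit atom with its Unicode character.

    The replacement is context-free: the atom is converted the same way
    whether it sits in a Sanskrit, German or English word. Atoms missing
    from the table are left unchanged.
    """
    return ATOM_PATTERN.sub(lambda m: table.get(m.group(0), m.group(0)), text)


print(convert_atoms("pra1kritisch"))  # prākritisch
```

Leaving unknown atoms untouched makes gaps in the table visible in the output instead of silently producing a wrong character.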

I wonder how many genuine undetected Sanskrit words are left in monier and pwg.