Pomax / ucharclasses

A XeLaTeX package that lets you insert arbitrary code between characters from different Unicode blocks

Devanagari #22

Closed Shreeshrii closed 6 years ago

Shreeshrii commented 8 years ago

Split Devanagari range in three

Devanagari group includes non-mark code points

VedicMarks group includes DevanagariMarks, DevanagariExtended and VedicExtensions

lemzwerg commented 7 years ago

LGTM. @Shreeshrii, please update this pull request by removing the unnecessary %%% lines (including the comment) from your patch so that it fits the upstream code uniformly.

And please update the documentation file ucharclasses.tex also!

lemzwerg commented 7 years ago

Hmm, looking again, I wonder whether this split is sufficient. Other non-Devanagari Indic scripts might use U+093C, nukta, U+0964, danda, and U+0965, double danda from the Devanagari block – it seems that two more classes are necessary...
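To make the proposed extra classes concrete, here is a minimal Python sketch of the classification being discussed. It is not code from ucharclasses: the class names `DevanagariNukta` and `DevanagariDanda` and the function `classify` are hypothetical, chosen only for illustration. The code points (U+093C, U+0964, U+0965) and the Devanagari block range U+0900..U+097F are the only facts taken from the thread and the Unicode charts.

```python
# Sketch only: class names are hypothetical, not the names used by ucharclasses.
DEVANAGARI_BLOCK = range(0x0900, 0x0980)  # U+0900..U+097F inclusive

# Characters from the Devanagari block that other Indic scripts might share.
SHARED_CLASSES = {
    0x093C: "DevanagariNukta",  # nukta
    0x0964: "DevanagariDanda",  # danda
    0x0965: "DevanagariDanda",  # double danda
}


def classify(code_point: int) -> str:
    """Return a (hypothetical) class name for a code point."""
    # Check the shared characters first so they are not swallowed by the block.
    if code_point in SHARED_CLASSES:
        return SHARED_CLASSES[code_point]
    if code_point in DEVANAGARI_BLOCK:
        return "Devanagari"
    return "Other"


if __name__ == "__main__":
    # DEVANAGARI LETTER KA, nukta, danda, and a Latin letter.
    for ch in "\u0915\u093c\u0964a":
        print(f"U+{ord(ch):04X} -> {classify(ord(ch))}")
```

The shared characters are checked before the block range, so a transition into or out of the main Devanagari class would not fire when a danda appears inside text in another Indic script.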

Shreeshrii commented 7 years ago

@lemzwerg Please feel free to modify as needed and resubmit. Thanks!

Pomax commented 7 years ago

Is there a link (or multiple links) that can be pointed to to justify this split? (e.g. does Unicode 10 have any text that explains that this split in Unicode blocks exists?)

lemzwerg commented 7 years ago

See the comments in http://www.unicode.org/charts/PDF/U0900.pdf for characters U+0951, U+0952, U+0964, and U+0965. There is no such comment for U+093C, so maybe it is a false alarm (and I have bad code in ttfautohint :-). @Shreeshrii, can you tell us whether the Devanagari nukta is also used in other Indic scripts?

lemzwerg commented 7 years ago

As an example, look at https://chromium.googlesource.com/chromium/deps/icu/+/5feb9ad5/source/data/translit/Deva_Beng.txt – this is Bengali text that uses a Devanagari nukta (in spite of Bengali having a separate nukta at U+09BC, so it is probably just a typo).
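A quick way to check a file for this kind of mix-up is to look for a Devanagari nukta (U+093C) that directly follows a character from the Bengali block (U+0980..U+09FF). The sketch below is an illustration under that assumption; the function name `find_devanagari_nukta_after_bengali` is hypothetical, and the check only inspects the immediately preceding character.

```python
from typing import List

DEVANAGARI_NUKTA = "\u093c"
BENGALI_BLOCK = range(0x0980, 0x0A00)  # U+0980..U+09FF inclusive


def find_devanagari_nukta_after_bengali(text: str) -> List[int]:
    """Return indices where U+093C directly follows a Bengali-block character."""
    hits = []
    for i in range(1, len(text)):
        if text[i] == DEVANAGARI_NUKTA and ord(text[i - 1]) in BENGALI_BLOCK:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # BENGALI LETTER DDA (U+09A1) followed by the Devanagari nukta.
    sample = "\u09a1\u093c"
    print(find_devanagari_nukta_after_bengali(sample))
```

A Bengali letter followed by the Bengali nukta (U+09BC), or a Devanagari letter followed by U+093C, is not reported.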

Pomax commented 7 years ago

I've merged in the main PR for getting support in for Unicode 10. Should a new version be pushed while this PR is still pending, or is it worth waiting for this PR before pushing out a new version?

Shreeshrii commented 7 years ago

Hi, I made the split because I wanted to use a different color in PDFs for the Vedic marks. The split sufficed for my use, and I have a local copy of the style file I used. If you think it will be useful to other users, you can include it by making whatever modifications are necessary - I am not very conversant with PRs etc. Thanks.

lemzwerg commented 7 years ago

Well, @Shreeshrii's original intention for this PR (i.e., colorization of some characters) is probably not a sufficient reason to add it to upstream ucharclasses – I think it is too specific to a single script and doesn't fit well into the general usage pattern. However, probably by accident, he hit an issue that needs attention: a bunch of characters in the Devanagari block are also used in other Indic scripts and must thus be handled specially.

If I have some time I'll try to polish this PR; in particular, it's necessary to add documentation that shows how to use the feature correctly.

lemzwerg commented 7 years ago

@Pomax: yes, I suggest that you do a new release, since the Devanagari stuff is completely separate and there is no pressure, as far as I can see.

Pomax commented 6 years ago

hi all, this PR kind of fell off my radar due to lots of "real life" stuff, but if it's still worth doing this work I'll be happy to keep this PR open for rebasing and landing.

lemzwerg commented 6 years ago

Yes, there is still something to do, I believe, so please don't close. However, I currently don't have sufficient time to work on it; maybe other people (probably with experience in Indic scripts) can chime in...

Shreeshrii commented 6 years ago

I can make the changes specific to Devanagari as a first step.

Shreeshrii commented 6 years ago

Is it OK if I close this and create a new streamlined PR including the additional classes for nukta and danda?

lemzwerg commented 6 years ago

Sounds sensible.