Shreeshrii closed this 6 years ago
LGTM. @Shreeshrii, please update this pull request by removing the unnecessary %%% lines (including the comment) from your patch so that it fits the upstream code uniformly. And please update the documentation file ucharclasses.tex also!
Hmm, looking again, I wonder whether this split is sufficient. Other non-Devanagari Indic scripts might use U+093C (nukta), U+0964 (danda), and U+0965 (double danda) from the Devanagari block, so it seems that two more classes are necessary...
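To make the idea of two extra classes concrete, here is a minimal Python sketch (not ucharclasses code). The class names `DevanagariNukta` and `DevanagariDanda` are hypothetical, chosen only for illustration; the code points and the block range U+0900..U+097F are the ones from the Unicode charts.

```python
import unicodedata

# Devanagari block as defined by Unicode.
DEVANAGARI_BLOCK = (0x0900, 0x097F)

# Hypothetical extra classes for characters that other Indic scripts
# might also use; the class names are assumptions, not upstream names.
SHARED_CLASSES = {
    0x093C: "DevanagariNukta",  # DEVANAGARI SIGN NUKTA
    0x0964: "DevanagariDanda",  # DEVANAGARI DANDA
    0x0965: "DevanagariDanda",  # DEVANAGARI DOUBLE DANDA
}


def classify(cp):
    """Return the class for a code point, checking shared characters first."""
    if cp in SHARED_CLASSES:
        return SHARED_CLASSES[cp]
    if DEVANAGARI_BLOCK[0] <= cp <= DEVANAGARI_BLOCK[1]:
        return "Devanagari"
    return "Other"


for cp in (0x0915, 0x093C, 0x0964, 0x0965):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {classify(cp)}")
```

Checking the shared characters before the block range is the whole point: with a single block-wide class, a danda inside text in another Indic script would be treated as a switch to Devanagari.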
@lemzwerg Please feel free to modify as needed and resubmit. Thanks!
Is there a link (or multiple links) that can be pointed to to justify this split? (e.g. does Unicode 10 have any text that explains that this split in Unicode blocks exists?)
See comments in http://www.unicode.org/charts/PDF/U0900.pdf for characters U+0951, U+0952, U+0964, and U+0965. There is no such comment for U+093C so maybe it is a false alarm (and I have bad code in ttfautohint :-). @Shreeshrii, can you tell us whether the Devanagari nukta gets used in other Indic scripts also?
As an example, look at https://chromium.googlesource.com/chromium/deps/icu/+/5feb9ad5/source/data/translit/Deva_Beng.txt – this is Bengali text that uses a Devanagari nukta (in spite of Bengali having a separate nukta at U+09BC, so it is probably just a typo).
I've merged in the main PR for getting Unicode 10 support in. Should a new version be pushed while this PR is still pending, or is it worth waiting for this PR before pushing out a new version?
Hi, I made the split because I wanted to use a different color in PDFs for the Vedic marks. The split sufficed for my use, and I have a local copy of the style file I used. If you think it will be useful to other users, you can include it by making whatever modifications are necessary - I am not very conversant with PRs etc. Thanks.
Well, @Shreeshrii's original intention for this PR (i.e., colorization of some characters) is probably not sufficient reason to add it to upstream ucharclasses – I think it is too specific to a single script and doesn't fit well into the general usage pattern. However, probably by accident, he hit an issue which needs special handling: A bunch of characters in the Devanagari block get used in other Indic scripts also and thus must be handled specially.
If I have some time I'll try to polish this PR; in particular, it's necessary to add documentation that shows how to use the feature correctly.
@Pomax: yes, I suggest that you do a new release, since the Devanagari stuff is completely separate and there is no pressure, as far as I can see.
hi all, this PR kind of fell off my radar due to lots of "real life" stuff, but if it's still worth doing this work I'll be happy to keep this PR open for rebasing and landing.
Yes, there is still something to do, I believe, so please don't close. However, I currently don't have sufficient time to work on it; maybe other people (probably with experience in Indic scripts) can chime in...
I can make the changes specific to Devanagari as a first step.
Is it OK if I close this and create a new streamlined PR including the additional classes for nukta and danda?
Sounds sensible.
Split Devanagari range in three
Devanagari group includes non-mark code points
VedicMarks group includes DevanagariMarks, DevanagariExtended and VedicExtensions
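
For readers who want to see what such a grouping does to a piece of text, the following is a rough Python sketch, not the patch itself. The block ranges for Devanagari, Devanagari Extended, and Vedic Extensions are the Unicode ones; the range used for `DEVANAGARI_MARKS` (U+0951..U+0952) is an assumption based on the characters mentioned in the discussion, and the actual patch may carve the block differently.

```python
from itertools import groupby

# Unicode block ranges.
DEVANAGARI = (0x0900, 0x097F)
DEVANAGARI_EXTENDED = (0xA8E0, 0xA8FF)
VEDIC_EXTENSIONS = (0x1CD0, 0x1CFF)
# Assumption: the marks split out of the Devanagari block are
# U+0951..U+0952; the real patch may use a different range.
DEVANAGARI_MARKS = (0x0951, 0x0952)


def group_of(ch):
    """Map a character to 'VedicMarks', 'Devanagari', or 'Other'."""
    cp = ord(ch)
    for lo, hi in (DEVANAGARI_MARKS, DEVANAGARI_EXTENDED, VEDIC_EXTENSIONS):
        if lo <= cp <= hi:
            return "VedicMarks"
    if DEVANAGARI[0] <= cp <= DEVANAGARI[1]:
        return "Devanagari"
    return "Other"


def runs(text):
    """Split text into consecutive runs that share the same group."""
    return [(g, "".join(chars)) for g, chars in groupby(text, key=group_of)]


# KA, stress sign udatta, KHA, then some Latin text.
sample = "\u0915\u0951\u0916 a"
for group, chunk in runs(sample):
    print(group, [f"U+{ord(c):04X}" for c in chunk])
```

Carving the marks out of the middle of the block leaves the rest of Devanagari in two pieces on either side, which would be why the range ends up split in three. Each run boundary is where a class transition (for example, a color change for the Vedic marks) would be applied.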