doublex closed 6 years ago
@tasn Tom, comments?
@adah1972, sorry, but I have no time to take a look at the moment, and I don't think I'll have any in the immediate future. :( Ping me again in a few weeks if you haven't managed to fix it by then?
Are there any plans to upgrade this great library to Unicode 11? Best wishes!
@doublex I made some quick fixes. Please test and check.
@roever Do you have time to check whether any updates are necessary in grapheme breaking?
@adah1972 Great library! Thanks a lot!
I just had a look at the grapheme part.
I can easily update most of it. But the emoji stuff is a bit more complicated. To implement rule 11 properly, I would need to add an additional table from here (https://www.unicode.org/Public/emoji//11.0/emoji-data.txt): the Extended_Pictographic one at the bottom. Without it, some of the emoji break tests fail.
Do we want that? I am not keen on doing it, but I think it would be the right thing to do anyway.
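For readers following along, rule 11 (GB11 in UAX #29 as of Unicode 11) says not to break inside an emoji ZWJ sequence: `Extended_Pictographic Extend* ZWJ × Extended_Pictographic`. Below is a minimal sketch of that check, assuming each code point has already been mapped to a property class. The enum `gb_prop` and the function `gb11_forbids_break` are hypothetical names for illustration and are not libunibreak's actual API or table layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified property classes (not libunibreak's real enum).
   GBP_EXT_PICT stands for code points listed as Extended_Pictographic
   in emoji-data.txt. */
enum gb_prop { GBP_OTHER, GBP_EXTEND, GBP_ZWJ, GBP_EXT_PICT };

/* Returns true when GB11 forbids a break between props[pos - 1] and
   props[pos], i.e. the context matches
   Extended_Pictographic Extend* ZWJ x Extended_Pictographic. */
static bool gb11_forbids_break(const enum gb_prop *props, size_t len, size_t pos)
{
    if (pos < 2 || pos >= len)
        return false;
    if (props[pos] != GBP_EXT_PICT || props[pos - 1] != GBP_ZWJ)
        return false;

    size_t i = pos - 1;                 /* index of the ZWJ */
    while (i > 0 && props[i - 1] == GBP_EXTEND)
        i--;                            /* skip any run of Extend before it */

    /* The run must be preceded by an Extended_Pictographic code point. */
    return i > 0 && props[i - 1] == GBP_EXT_PICT;
}
```

The backward scan is why the extra table matters: without knowing which code points are Extended_Pictographic, the left-hand context of the ZWJ cannot be classified, so sequences such as emoji joined by ZWJ would be split.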
How about separating the work and doing the easy part first? We can go step by step, and every improvement is a good one. I do not see it as a problem if incorporating the emoji data table takes extra time and cannot be done right away.
I'll do it. I don't think it will take that long...
@doublex I think we have fully updated the library for Unicode 11. I have uploaded a test 4.1 release here (NEWS, configure.ac, and src/Makefile.am have uncommitted changes):
http://wyw.dcweb.cn/libunibreak-4.1.tar.gz
If you have time, please take a look. I will make a new release in about a week.
@adah1972 Best lightweight alternative to "ICU"
Regrettably, the last "RC" failed many test cases (kudos to Andreas for making all grapheme breaking tests pass). I have fixed all regression issues, and also updated the line breaking code to reduce the number of skipped/failed tests significantly.
The RC download link remains the same.
I have made release 4.1, and am closing this issue. If there are other problems, please open a new issue.
This file: https://github.com/adah1972/libunibreak/blob/master/src/wordbreak.c Unicode 11: http://www.unicode.org/reports/tr29/tr29-32.html
marked as obsolete:
Added: