kloczek closed this issue 7 months ago.
Sorry, I forgot to add the output.
Problems confirmed. There seem to be significant changes in Unicode 14/15 that cause problems in the current implementation.
The word break change is data-only and can be fixed by simply regenerating wordbreakdata.c from the current Unicode data (now done). The other failures are non-trivial, and I will need to look into them when I have more time. Sorry, I cannot give an estimate.
In the meantime, a patch would be most welcome.
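Regenerating a data file such as wordbreakdata.c essentially means turning the code point ranges in the Unicode data file (WordBreakProperty.txt) into a C lookup table. The Python sketch below illustrates that idea under stated assumptions: it parses the standard UCD line format (`0041..005A ; ALetter # ...`), merges adjacent ranges that share a value, and renders C initialiser lines. The table layout and the `WBP_` prefix are hypothetical; this is not libunibreak's actual generator.

```python
import re

# Matches UCD data lines such as "0041..005A    ; ALetter # ..."
# and single code points such as "005F          ; ExtendNumLet # ...".
_LINE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_property_lines(lines):
    """Return sorted (start, end, value) tuples from a UCD property file."""
    ranges = []
    for line in lines:
        m = _LINE.match(line.strip())
        if not m:
            continue  # skip comments and blank lines
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) if m.group(2) else start
        ranges.append((start, end, m.group(3)))
    ranges.sort()
    return ranges


def merge_adjacent(ranges):
    """Merge ranges that touch each other and share the same property value."""
    merged = []
    for start, end, value in ranges:
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1] = (merged[-1][0], end, value)
        else:
            merged.append((start, end, value))
    return merged


def to_c_table(ranges, prefix="WBP_"):
    """Render ranges as C initialiser lines (hypothetical table layout)."""
    return "\n".join(
        f"    {{0x{s:04X}, 0x{e:04X}, {prefix}{v}}}," for s, e, v in ranges
    )
```

Merging adjacent ranges keeps the generated table short, which matters if the lookup is a binary search over it.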
OK. Is it only a test suite issue, or something more? 🤔
A potential issue (depending on your viewpoint) is that the test data are downloaded from the web, which means they can change. The code used to pass with the test data, but it stopped passing after the data were updated.
This does not mean the code is broken; it means the code no longer complies with the current Unicode standard.
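For context, the UCD break test files (GraphemeBreakTest.txt, WordBreakTest.txt, LineBreakTest.txt) encode each test case as a line of hexadecimal code points separated by ÷ (break) and × (no break), followed by a `#` comment. Because the expected results live entirely in this data, a newer file can make unchanged code fail. A minimal Python sketch of reading one such line, assuming that format; `parse_test_line` is a hypothetical name, not part of libunibreak's test harness.

```python
BREAK = "\u00f7"     # ÷ : a break is expected at this position
NO_BREAK = "\u00d7"  # × : no break is expected at this position


def parse_test_line(line):
    """Split a *BreakTest.txt line into code points and expected breaks.

    Returns (codepoints, breaks), where breaks[i] says whether a break is
    expected before codepoints[i]; breaks has one extra entry for the end
    of the text. Returns None for blank or comment-only lines.
    """
    data = line.split("#", 1)[0].split()
    if not data:
        return None
    codepoints, breaks = [], []
    for token in data:
        if token == BREAK:
            breaks.append(True)
        elif token == NO_BREAK:
            breaks.append(False)
        else:
            codepoints.append(int(token, 16))
    # Markers and code points must alternate, starting and ending with a marker.
    if len(breaks) != len(codepoints) + 1:
        raise ValueError(f"malformed test line: {line!r}")
    return codepoints, breaks
```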
If that is true, it is really bad, for two reasons:
Nevertheless, I cannot find the lines responsible for downloading those resources. Could you give me a hint where they are? 🤔
@kloczek
I have added two commits, which provide `update-test` to retrieve the latest test data from the web. Hopefully this solves your immediate problems.
Any update? 🤔
I spent some hours and fixed the grapheme test breakage. I still need to find another time slot to check the line breaking problem.
Note that the current change only affects Indic languages. If you are not using any of them, you will not benefit from this update; on the contrary, `set_graphemebreaks` may even be slowed down a bit.
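The Indic-specific change is presumably rule GB9c, added to UAX #29 in Unicode 15.1, which keeps Indic conjuncts together: `Consonant [Extend Linker]* Linker [Extend Linker]* × Consonant`, where the classes are values of the Indic_Conjunct_Break (InCB) property. The Python sketch below is a hedged illustration, not libunibreak's code; it shows why such a rule can cost extra time, since deciding a single boundary means scanning backwards over a run of characters of unbounded length. The tiny InCB table is hand-picked for illustration; a real implementation would derive it from DerivedCoreProperties.txt.

```python
# Hypothetical, hand-picked InCB values for a few code points (illustration only).
INCB = {
    0x0915: "Consonant",  # DEVANAGARI LETTER KA
    0x0937: "Consonant",  # DEVANAGARI LETTER SSA
    0x094D: "Linker",     # DEVANAGARI SIGN VIRAMA
    0x200D: "Extend",     # ZERO WIDTH JOINER
}


def gb9c_forbids_break(cps, i):
    """Return True if GB9c forbids a break between cps[i-1] and cps[i]."""
    if i <= 0 or i >= len(cps) or INCB.get(cps[i]) != "Consonant":
        return False
    seen_linker = False
    j = i - 1
    # Walk backwards over the run of InCB=Extend / InCB=Linker characters.
    while j >= 0 and INCB.get(cps[j]) in ("Extend", "Linker"):
        if INCB[cps[j]] == "Linker":
            seen_linker = True
        j -= 1
    # The run must contain a linker and be preceded by a consonant.
    return seen_linker and j >= 0 and INCB.get(cps[j]) == "Consonant"
```

For example, KA + VIRAMA + SSA (U+0915 U+094D U+0937) stays one grapheme cluster, while KA + SSA without a linker does not.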
Thanks for the update. If you have anything to test, please let me know 😋
I made additional updates. However, full Unicode 15.1 support for line breaking (mostly adding Brahmic tailoring) does not seem achievable in the current framework, so I do not plan to implement it.
The README is updated to reflect this. If you do not find bugs in the current implementation in the next few days, I will bump the library version, make a release, and close this issue.
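For reference, the Brahmic tailoring corresponds, as far as I can tell, to rule LB28a of UAX #14 in Unicode 15.1, which uses the new line break classes AK, AP, AS, VF and VI plus U+25CC DOTTED CIRCLE (written ◌ in the standard). The Python sketch below encodes the four sub-rules as I read them; check the exact wording against UAX #14 before relying on it. Two of the sub-rules look beyond the pair of characters around the candidate break (one position back, one position ahead), which may be part of why a design that examines one pair of classes at a time struggles with it; that is an assumption, not the maintainer's stated reasoning. Class lookup is left to the caller, and the `"DOTTED"` marker is a hypothetical stand-in for U+25CC.

```python
# Classes that can start an orthographic syllable; "DOTTED" is a hypothetical
# marker for U+25CC DOTTED CIRCLE, which LB28a names explicitly.
BASE = {"AK", "DOTTED", "AS"}


def lb28a_forbids_break(classes, i):
    """Return True if LB28a forbids a break between classes[i-1] and classes[i]."""
    if i <= 0 or i >= len(classes):
        return False
    prev, cur = classes[i - 1], classes[i]
    # AP x (AK | dotted circle | AS)
    if prev == "AP" and cur in BASE:
        return True
    # (AK | dotted circle | AS) x (VF | VI)
    if prev in BASE and cur in ("VF", "VI"):
        return True
    # (AK | dotted circle | AS) VI x (AK | dotted circle): needs one extra character back
    if prev == "VI" and i >= 2 and classes[i - 2] in BASE and cur in ("AK", "DOTTED"):
        return True
    # (AK | dotted circle | AS) x (AK | dotted circle | AS) VF: needs one character ahead
    if prev in BASE and cur in BASE and i + 1 < len(classes) and classes[i + 1] == "VF":
        return True
    return False
```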
libunibreak 6.0 is released.
6.1 still fails
@kloczek
It looks like you are not using the LineBreakTest.txt file bundled with the release. No, the Unicode 15.1 line breaking test file does not work. That is what the README implies by specifically mentioning Unicode 15.0 support for line breaking.
I even tried to make `update-test` skip updating the line breaking test. How could you still get this error? Did you force-update the test files manually? (Do not do that.)
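One way to catch this kind of mismatch early is to check the version stamped in the first line of each UCD test file (for example `# LineBreakTest-15.0.0.txt`) before running the tests. A minimal Python sketch, assuming that header format; the function names are hypothetical, and nothing like this is claimed to exist in libunibreak. Downloading from a versioned directory such as https://www.unicode.org/Public/15.0.0/ucd/auxiliary/ instead of the latest data would serve a similar purpose.

```python
import re

# First line of a UCD test file, e.g. "# LineBreakTest-15.0.0.txt".
_HEADER = re.compile(r"^#\s*(\w+Test)-(\d+\.\d+\.\d+)\.txt")


def ucd_file_version(first_line):
    """Return (file_name, version) parsed from a UCD test file header, or None."""
    m = _HEADER.match(first_line.strip())
    return (m.group(1), m.group(2)) if m else None


def require_version(first_line, expected):
    """Raise if the test file was generated for a different Unicode version."""
    found = ucd_file_version(first_line)
    if found is None:
        raise ValueError(f"no version header in {first_line!r}")
    name, version = found
    if version != expected:
        raise RuntimeError(
            f"{name} is for Unicode {version}, but the tests expect {expected}"
        )
    return name
```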
Indeed, I had downloaded LineBreakTest.txt from head. Thanks.
It looks like the test suite is failing with the new version.