adithyaov opened 4 months ago
I am working on further improvements, but if you are in a hurry you can release a minor version.
I'll make a minor release for the time being. What do you suggest we do about the incorrect version bounds on Hackage for v0.4.0.1?
Should we re-revise the version bounds or deprecate the version?
About the tests: they probably fail because you are comparing against base, which has a different Unicode version. I fixed these tests so that they pass when characters are unassigned or have a changed General Category; they display a warning in such cases.
If you regenerate the data using ucd2haskell and bump Unicode to 15.1, the latest release, tests should pass with base-4.20. So the release itself is not broken, only the test suite.
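A minimal sketch of the warning-versus-failure idea described above, assuming each side of the comparison is paired with the Unicode version its data was generated from. Outcome, compareCategory and the version pairing are hypothetical names for illustration, not unicode-data's actual test suite; it also assumes that a difference only counts as a real failure when both sides claim the same Unicode version.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)
import Data.Version (Version, makeVersion, showVersion)

-- | Result of comparing one character between two Unicode databases.
data Outcome
  = Same            -- ^ both sides agree
  | Warning String  -- ^ difference explained by a Unicode version gap
  | Failure String  -- ^ difference although both sides use the same version
  deriving (Eq, Show)

-- | Compare the General Category of a character as reported by a reference
-- implementation (e.g. base's 'generalCategory') and by the library under
-- test. Each side is paired with the Unicode version it was generated from.
-- Hypothetical helper: not taken from unicode-data's test suite.
compareCategory
  :: (Version, Char -> GeneralCategory) -- ^ reference, e.g. base
  -> (Version, Char -> GeneralCategory) -- ^ library under test
  -> Char
  -> Outcome
compareCategory (refVer, refGC) (libVer, libGC) c
  | ref == lib = Same
  | refVer /= libVer = Warning (describe kind)
  | otherwise = Failure (describe kind)
  where
    ref = refGC c
    lib = libGC c
    -- A character unassigned on either side is reported separately from a
    -- character whose category changed between versions.
    kind
      | ref == NotAssigned || lib == NotAssigned = "unassigned in one version"
      | otherwise = "General Category changed"
    describe k =
      show c ++ ": " ++ k ++ " (" ++ show ref ++ " in Unicode "
        ++ showVersion refVer ++ ", " ++ show lib ++ " in Unicode "
        ++ showVersion libVer ++ ")"
```

For example, comparing 'a' against a side that reports NotAssigned under a different Unicode version yields a Warning rather than a Failure, which mirrors the "pass with a warning" behaviour described for the fixed tests.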
I am improving the lib before bumping to Unicode 15.1. Notably, I would like to reduce the Addr# blobs and check the inlining pragmas.
> If you regenerate the data using ucd2haskell and bump Unicode to 15.1, the latest release, tests should pass with base-4.20. So the release itself is not broken, only the test suite.
Gotcha, I'll make a minor release then.
Should I deprecate unicode-data-0.4.0.1? The version bounds are too lax and might result in undefined behaviour if anyone uses Unicode primitives from both base and unicode-data simultaneously.
So it makes sense to keep unicode-data completely in sync with base. We could make the version bounds on the base dependency restrictive.
I am leaning towards this too, because a mismatch may trigger much trickier bugs in workflows. I added tracking of the Unicode version to the README, because comments in the code are not very discoverable.
The thing is, text uses case mappings from Unicode 14.0, independently of the version of base. So there is precedent, although this is not a good situation.
Well, the solution would obviously be for everyone to use unicode-data 😅. Part of unicode-data has been merged into base (now in ghc-internal). I am now thinking we could move this out of ghc-internal to create unicode-data-core as a new boot/core GHC library, and make base depend on it, so that the Unicode version is no longer decided by base directly, but only by unicode-data-core. Thus every package using base and unicode-data would share the same Unicode version. If we include complex case mappings, text could then depend on unicode-data-core as well.
That's a huge change though, and it will have to go through the CLC. But since there are already bits of unicode-data in ghc-internal, and text is already out of sync for case mappings, I guess there will be no strong objection.
We had already planned to change the versioning scheme to closely follow Unicode's. So I can see the following happening:
- unicode-data-core-15.0.0
- unicode-data-15.0, depending on unicode-data-core >= 15.0.0 && < 15.1.0
- unicode-data-names-15.0, etc.

base, on the contrary, should have lax bounds on unicode-data-core. I do not expect the core API to change anytime soon, so something like unicode-data-core >= 15.0.0 may be enough.
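To make the proposed pinning concrete, here is a small sketch using Data.Version that derives the unicode-data-core range a package tracking Unicode X.Y would declare (>= X.Y.0 && < X.(Y+1).0) and checks a candidate version against it. coreRangeFor and satisfiesPinned are hypothetical helpers; in practice these bounds would live in the .cabal files and be enforced by the solver, so the code only illustrates the arithmetic of the scheme.

```haskell
import Data.Version (Version, makeVersion, versionBranch)

-- | Hypothetical helper: given the Unicode version a package tracks
-- (e.g. 15.0), compute the range of unicode-data-core versions sharing
-- that Unicode major.minor, i.e. >= X.Y.0 && < X.(Y+1).0.
coreRangeFor :: Version -> (Version, Version)
coreRangeFor v =
  (makeVersion [major, minor, 0], makeVersion [major, minor + 1, 0])
  where
    -- Missing components default to 0, so "15" behaves like "15.0".
    (major, minor) = case versionBranch v of
      (x : y : _) -> (x, y)
      [x] -> (x, 0)
      [] -> (0, 0)

-- | Hypothetical helper: does a concrete unicode-data-core version fall
-- inside the range pinned for the given Unicode version?
-- Version's Ord instance compares the numeric components lexicographically.
satisfiesPinned :: Version -> Version -> Bool
satisfiesPinned unicodeVer coreVer = lo <= coreVer && coreVer < hi
  where
    (lo, hi) = coreRangeFor unicodeVer
```

Under this sketch, unicode-data-15.0 would accept unicode-data-core-15.0.1 but reject 15.1.0, while base's lax bound (>= 15.0.0) would accept both.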
Finally, if we go down that road, unicode-data-core can no longer depend on base.
I will probably have to open a dedicated issue for this. Sorry for the wall of text 😅
> Should I deprecate unicode-data-0.4.0.1? The version bounds are too lax and might result in undefined behaviour if anyone uses Unicode primitives from both base and unicode-data simultaneously.
I would just fix the version bounds for base. I am only just resuming development of this lib after a long pause, so I am not sure it is in a releasable state. If you must, do it, but I am not satisfied with some changes I made a year ago.
@wismill Looks like I somehow managed to delete a comment I made.
Rewriting the essence of the comment for context:
The Unicode versions of base and unicode-data should be in sync, as using both unicode-data and base at once might lead to unexpected behaviour. The end user does not care about the Unicode version and will use primitives from both unicode-data and base.
It looks like a lot of thought has already gone into keeping the packages in sync. Once we decide how we want to do this, you can offload some tasks to me.
> I would just fix the version bounds for base. I am only just resuming development of this lib after a long pause, so I am not sure it is in a releasable state. If you must, do it, but I am not satisfied with some changes I made a year ago.
I will fix the version bounds for base in 0.4.0.1 and make a minor release, 0.4.0.2, branching off 0.4.0.1 and updating the Unicode version. The minor release is required for the time being, as we need to get streamly working with GHC > 9.4.
Again, thank you for the amazing work!
> updating the Unicode version
@adithyaov this is a breaking change. You should bump to 0.5 then.
To unblock downstream development, I made a revision: https://hackage.haskell.org/package/unicode-data-0.4.0.1/revisions/
unicode-data-0.4.0.1's test cases seem to break with newer GHCs, i.e. newer base versions (see https://github.com/composewell/unicode-data/issues/118). I can confirm that this is the case for 9.10 and 9.8. But CI for the latest master is passing, so the problem seems to have been fixed. The version bounds for base on Hackage are incorrect: with base-4.20, the above-mentioned test fails.
Can you release a newer version of unicode-data with the fix included? We can then update the dependent packages accordingly. Should we re-revise the version bounds on Hackage?