Closed raehik closed 1 year ago
I intend to have a look at this a bit later!
word-compat provides a compatibility layer for new GHC primitives, in terms of the old ones. I think I'd prefer a different approach, where we provide a compatibility package that gives you the new GHC primitives on the old compilers (implemented e.g. by converting all the WordX#
s to Word#
, and running the platform primitive), and export the regular ones on new compilers.
There is three options that I can see Given the example of data Word64 = W64# _
W64# :: Word64 -> Word64#
irrespective of GHC version (via view pattern or reexport)
-> Pro: We uniformly expose the sized word type
-> Pro: We can expose the future interface, rather than leaking old representation in functions like 'withAnyWord64`
-> Con: Problem: We must provide wrappers for every primop that uses this, because most primops in old GHC versions only work on Word#W64# :: Word64 -> Word#
irrespective of GHC version (via view pattern or reexport)
-> Pro: No shims needed for primops, since all ops we care about exist in Word# flavour in both old and new GHC versions.
-> Little effort.
-> Con: We force everything into a Word# which breaks 32 bit compatibilityFlatparse currently uses a slight mixture (given it has primop wrappers that either expose sized on all versions, and it has wrappers that switch type depending on version) where word-compat uses (2). Effort-wise, (2) is by far the easiest, (3) exposes these artifacts to the user, while (1) would keep them hidden inside.
Ideally I'd like (1) or (3). I used a mix of them in Flatparse in order to retain simplicity where possible (very hacky).
The problem with (1) is that I can't see how to get a Word64#
at all on GHC 9.0! GHC.Exts on base-4.15.1.0 (GHC 9.0) defines it, then never uses it.
I started on supporting GHC 9.4 at #29 that mostly goes with option (3).
First, the CPP switched type aliases like Word8'#
suffer from PVP instability, namely that updating base
can cause breakage To see why, imagine you are using anyWord16#
where Word16'# ~ Word#
, and after an GHC update from 9.0 to 9.4 it becomes Word16'# ~ Word16#
. I do not think this can be addressed, unless any{Word,Int}{8,16,32}#
are moved of public API (say an Internal or Unstable module). @AndrasKovacs How important is PVP stability to you?
Then, I've had some time to do research on GHC and talk with Andreas.
In GHC <= 9.0 the sized primtypes (Word8#, Int8#, etc...) are mostly unusable for users. There are mostly no primops available to consume them. I began noticing this while trying to implement shims for word{8,16,32,64}ToWord#
. The only way to deal with them, you have to unsafeCoerce# them to something else - in principle this is safe however, as in GHC <= 9.0 sized primtypes are implemented as native primtypes, so you can just unsafeCoerce# between then. Proper Cmm conversions exist, so it wont trip cmm lint. So this would be usable, but this would require documention that the few users who do want to use withAny{Word,Int}{8,16,32}
parsers and want to use GHC <= 9.0 must unsafeCoerce#
their values into Word# for usability.
However, on newer GHC versions I am not sure whether that would be safe. If it is unsafe, this could have the extremely undesirable effect of inducing crashes or garbage parsers after updating GHC from 9.0 to say 9.4. I am not sure however this is actually the case, this needs more research.
As I see the only place where CPP-switched primitive types appear in exports is the "machine integer continuation" functions. Could we just change the types of those to refer to ordinary WordN
and IntN
instead? I do not think that any unboxing is lost by doing that, because all CPS'd functions must be inlined anyway. If we do this, then there's no CPP-switching in exports anymore.
Well, they are not required to be inlined... one can also write withAnyWord8 fun
from a user side with {-# NOINLINE fun #-}
(or maybe fun
is just too large for GHC to inline).
But that shouldn't be a concern of ours - in all reasonable code the unboxing inside the continuation will occur automatically.
I second providing boxed types in the continuation parsers, for us GHC will get rid of the intermediate boxes, and it provides a stable interface.
The one thing we should look at then, is how to handle Word64# internally. Ideally, we should use Word64# in older versions. While not visbile on haddock, Word64# and related primops are available on 32 bit systems. With that, we could even get 32-bit support back.
OK, thanks very for discussing unboxing -- I don't understand well how GHC unboxes things and was aiming for easily-provable performance. Boxing the machine int parsers fixes most issues. In order to fully remove CPP, some other areas require boxing too:
scan64# :: Word64 -> Parser e ()
scan64# (W64# w) = Parser \fp eob s ->
case indexWord64OffAddr# s 0# of
w' -> case eqWord64# w w' of
1# -> OK# () (plusAddr# s 8#)
_ -> Fail#
eqWord64# :: Word64# -> Word64# -> Int#
doesn't exist prior to GHC 9.4 . Instead, we remain boxed:
scan64# w = Parser \fp eob s ->
if W64# (indexWord64OffAddr# s 0#) == w
then OK# () (plusAddr# s 8#)
else Fail#
Now it works on all GHCs, and I can see how it's trivial to unbox. Is this sort of change OK to make across the board? It seems we can get rid of my hacky UnboxedNumerics
module this way.
Regarding Word
size: @dminuoso do you mean write everything using 8-byte Word64
s, and simply accept that we lose performance on 32-bit? I would be very happy if it's that simple.
I'm fine with using CPP internally, I'd only like to remove them from exported types. I overall prefer CPP'd primops over class-based overloading across GHC versions. It leaves us with more control and we need CPP internally anyway.
Related Stackage issue: https://github.com/commercialhaskell/stackage/issues/6767
This should be a case of tweaking the machine integer compatibility layer at
FlatParse.Internal.UnboxedNumerics
to handle theWord#
,Word64#
change. We might also check out fumieval's word-compat package and see if can be applied here, I never gave it a proper look.