HigherOrderCO / monobook

AGDA
22 stars 5 forks source link

Fix String <-> Bits conversion #34

Closed edusporto closed 6 days ago

edusporto commented 1 week ago

Solves #31.

Chars in Haskell, and therefore Agda, are Unicode codepoints and can range from 0 to U+10FFFF, so they can span a maxiumum of 21 bits. This PR "solves" the original issue by having each Char always span 21 bits, but that uses up much more memory than necessary for most string we'll actually use.

I thought of two other ways to solve this:

  1. Use Bytestrings, available in the Agda stdlib A minimal implementation can be found in the bytestring branch. The problem with that is code that compiles directly to GHC or JS won't be reduced by the type system (at least in interactive mode), so code like this won't compile:

    equals1 : to-nat (from-nat 1) == 1
    equals1 = refl
    -- to-nat (from-nat 1) != 1 of type Nat
  2. Convert Strings to bytes using the UTF-8 standard This is possible, especially considering Strings in Agda are already UTF-8 (at least in the Haskell target) since they compile to Text, but I couldn't find a straight-forward way to do it other than either doing it like option 1 or re-implementing codepoints to bytes like this Stack Overflow question explains, but that's far from straight-forward.

If we eventually we get slow-downs due to this encoding we could revisit the problem.

edusporto commented 1 week ago

The latest commits does the conversion Bits <-> String in UTF-8, but I think the performance hit on doing these transformation on Bits will probably be too much for the original use case of this feature, which I think was to use Strings as keys of maps. I think we should have both transformations available - a conversion from and to and from UTF-8 and the original simpler conversion. What do you think, @developedby? It's still a bit unclear to me the original intent of this transformation, so unless you know more about it I could also talk to Taelin.