Using ByteString internally?

IntersectMBO / bech32

Haskell implementation of the Bech32 address format (BIP 0173).

Apache License 2.0


Using ByteString internally? #19

Open k0001 opened 4 years ago

k0001 commented 4 years ago

Have you considered using ByteString internally, rather than Text? Considering bech32 values are likely to be stored/serialized as plain ByteStrings, seeing as they don't need any special encoding other than what plain old ASCII supports, having to convert back and forth between ByteString and Text for the purposes of parsing and rendering is wasteful.

KtorZ commented 4 years ago

> Considering bech32 values are likely to be stored/serialized as plain ByteStrings

I disagree here. Storing bech32-encoded data as a bytestring would be silly, in the same way that storing base16-encoded data as a bytestring is silly. When data is encoded in a human-readable format like these, the main purpose is display in user interfaces (be it a command line in a console, a web interface, or a desktop client).

Hence why Text is the chosen data-type from and to which data are decoded/encoded.

rvl commented 4 years ago

Both arguments have merit. Haskell Text is internally UTF-16, so it uses double the space. I have found that I needed to sprinkle Data.Text.Encoding.encodeUtf8 around when using this library (but wait, it's ASCII, not UTF-8). Other bytes-to-text encoding functions use ByteString too. On the other hand, Text clearly denotes that the bech32 value is not unreadable binary data. So on this basis I prefer Text.

k0001 commented 4 years ago

Perhaps I wasn't clear. I am not suggesting getting rid of the Text support in the API. I am suggesting using ByteString internally and exposing an API for encoding and decoding ByteStrings directly, alongside the already existing Text one. The Text-based API would encode/decode the Text as ASCII/UTF-8 and defer its work to the ByteString-based implementation.

In my case, I'm dealing with many bech32 values which are stored as ASCII/UTF-8 bytes, as part of other data structures. Unfortunately, Text can't use these bytes as they are, so they must be converted to Text's internal UTF-16 representation (via Data.Text.Encoding.decodeUtf8). This adds significantly to the processing time, unnecessarily.
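
To illustrate the layering proposed above, here is a minimal sketch. It is not the bech32 library's API: the names renderDataBS, parseDataBS, renderDataText and parseDataText are hypothetical, and the sketch covers only the mapping between 5-bit values and the BIP 0173 character set (lowercase only, no human-readable part, no checksum). The point is just that the Text functions can be thin wrappers over a ByteString core.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import           Data.ByteString    (ByteString)
import qualified Data.ByteString    as BS
import           Data.Text          (Text)
import qualified Data.Text.Encoding as TE
import           Data.Word          (Word8)

-- The 32-character data alphabet from BIP 0173 (all ASCII).
charset :: ByteString
charset = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

-- Hypothetical ByteString core: render 5-bit values (0..31) as charset bytes.
-- Returns Nothing if any value is out of range.
renderDataBS :: [Word8] -> Maybe ByteString
renderDataBS ws
  | all (< 32) ws = Just (BS.pack [BS.index charset (fromIntegral w) | w <- ws])
  | otherwise     = Nothing

-- Hypothetical ByteString core: parse charset bytes back into 5-bit values.
-- Returns Nothing on any byte outside the (lowercase) alphabet.
parseDataBS :: ByteString -> Maybe [Word8]
parseDataBS = traverse lookupByte . BS.unpack
  where
    lookupByte b = fromIntegral <$> BS.elemIndex b charset

-- Text wrappers that defer to the ByteString core.
-- decodeLatin1 is total, and the core only ever emits ASCII bytes.
renderDataText :: [Word8] -> Maybe Text
renderDataText = fmap TE.decodeLatin1 . renderDataBS

-- Non-ASCII characters encode to bytes >= 0x80, which the core rejects.
parseDataText :: Text -> Maybe [Word8]
parseDataText = parseDataBS . TE.encodeUtf8

main :: IO ()
main = do
  print (parseDataBS "qpzry")           -- Just [0,1,2,3,4]
  print (renderDataText [0, 1, 2, 3, 4]) -- Just "qpzry"
  print (parseDataText "QPZRY")         -- Nothing (this sketch is lowercase only)
```

With this shape, callers that already hold ASCII bytes use the ByteString functions directly and skip the Text conversion, while the Text functions stay available for display-oriented code.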

rvl commented 4 years ago

OK, I looked at how the aeson library does this, as an example.

We may wish to switch it around and have the "default" API be the Text-based wrapper.