AndrasKovacs / flatparse

Fast parsing from bytestrings
MIT License
146 stars 12 forks source link

Add byteString #27

Closed dminuoso closed 2 years ago

dminuoso commented 2 years ago

Adds 'byteString' to parse a given ByteString sequence for when the buffer is only known dynamically, or when one wants to use a non-TH variant instead.

raehik commented 2 years ago

LGTM - thanks, this is useful. deferring merging to @AndrasKovacs

AndrasKovacs commented 2 years ago

Some comments.

The definition in this PR was pretty bad in terms of generated Core and STG. The pre-computed lengths were captured by the local helper functions as closures, so we got one closure at runtime for each helper function. Also, each indexing into the bytestring was done lazily, with one thunk allocation and one re-boxing. The lazy indexing is easily fixable with some bangs, but the closures are a bit more annoying. So instead I rewrote the definition without the pre-computed lengths in https://github.com/AndrasKovacs/flatparse/commit/d639335c5babfd285e593f7bc85e45355999d59b. I also skipped the probing for 32 and 16 bit reads; I don't think we gain much on that. However, it might be worth to consider doing a foreign call to some optimized memcmp-like function on long strings.

BTW, I'm not an expert git or github user, and I'm not sure what's the best way for me to make alterations to PR-s. I have found this: https://github.blog/2016-09-07-improving-collaboration-with-forks/, what do you think?

raehik commented 2 years ago

I can't believe I didn't know about that checkbox. That would work well. Then to edit, a maintainer must either clone the fork or add another remote to their local repo. Another solution is adding (more) people as contributors so they can make branches directly on this repository. Then you can simply git pull and git checkout origin/branch-name. (There are ways to protect the main branch in the repo settings.)