Open raehik opened 6 months ago
Something I note is that since flatparse carries around a `ForeignPtrContents` in its internal state, we have to come up with one in such cases. `FinalPtr :: ForeignPtrContents` exists and seems to be intended for such cases (though I've not seen or used it before).
The problem I've come up against here is that `Result e a` has `OK a B.ByteString`. `runParserPtr` can't return a `ByteString` because it doesn't have "memory rights" to the bytes it's parsing. We can only return an `Int` indicating the number of bytes left unparsed.
On a related note, what is the reason for ferrying a `ForeignPtrContents` along in the parser type? ~Flicking through code, I don't see it used in parser combinators.~ Can we not use something like `withForeignPtr` in the parser runner to keep the inner `Ptr`/`Addr#` in scope? Then I could imagine adapting `Result` to enable returning an `Int` in such cases.
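To make the idea of an adapted result concrete, here is a minimal sketch using only `base`. `PtrResult` and `takeBytes` are hypothetical names for illustration and are not part of flatparse; the point is only that a pointer-based runner can report how many bytes are left instead of handing back a `ByteString`.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr)

-- Hypothetical result type: like a Result whose OK case carries the
-- number of unparsed bytes rather than the remaining ByteString.
data PtrResult e a = OK a !Int | Fail | Err e
  deriving (Eq, Show)

-- Toy "parser": read the first n bytes of a buffer of known length
-- and report how many bytes remain after them.
takeBytes :: Int -> Ptr Word8 -> Int -> IO (PtrResult e [Word8])
takeBytes n ptr len
  | n < 0 || n > len = pure Fail
  | otherwise = do
      xs <- peekArray n ptr
      pure (OK xs (len - n))

main :: IO ()
main =
  -- withArrayLen keeps the buffer alive only inside the callback.
  withArrayLen [1, 2, 3, 4, 5 :: Word8] $ \len ptr -> do
    r <- takeBytes 2 ptr len :: IO (PtrResult () [Word8])
    print r  -- OK [1,2] 3
```

Nothing here needs ownership of the memory: the caller keeps the buffer alive, and the result only contains copied values plus a count.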
Edit: I see that the `ForeignPtrContents` is used in some combinators to enable non-copying bytestring parsing, e.g. `byteStringOf`. So any lying to GHC (like attaching a `FinalPtr` empty finalizer to an arbitrary `Ptr`) would be very unsafe. I'll just copy my `Ptr Word8` to a fresh `ByteString` before parsing.
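For reference, the copying route can be sketched with `Data.ByteString.packCStringLen`, which builds an immutable copy of the given bytes. `copyToByteString` is a hypothetical helper name; the buffer from `allocaBytes` just stands in for whatever `Ptr Word8` is being parsed.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (pokeByteOff)

-- Copy len bytes starting at ptr into a freshly allocated ByteString.
-- The result owns its memory, so it stays valid after the source
-- buffer has gone away.
copyToByteString :: Ptr Word8 -> Int -> IO B.ByteString
copyToByteString ptr len = B.packCStringLen (castPtr ptr, len)

main :: IO ()
main = do
  bs <- allocaBytes 4 $ \buf -> do
    -- Fill the temporary buffer with the bytes 1, 2, 3, 4.
    mapM_ (\i -> pokeByteOff buf i (fromIntegral (i + 1) :: Word8)) [0 .. 3]
    copyToByteString buf 4
  -- The temporary buffer is gone here, but the copy is still valid.
  print (B.unpack bs)  -- [1,2,3,4]
```

The cost is one O(n) copy up front; in exchange, any zero-copy slices the parser produces point into memory that the `ByteString` itself keeps alive.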
I might try writing a buffer parser parallel to flatparse but without carrying around the `ForeignPtrContents`.
IIRC I added `ForeignPtrContents` specifically to enable `ByteString` creation. I found it convenient and observed that it's reliably eliminated by GHC when it's not used.
As I understand it, it's convenient because all the `ByteString`s we generate where we use the `ForeignPtrContents` are just references to parts of our input, rather than being copied. I think that's usually OK (though could it be a source of memory leaks?). But in the case where we're parsing a `Ptr Word8`, where all we know is 1) it's in scope and 2) it's readable, we can't do that as we don't have any control over the input's lifecycle. (I'm new to low-level Haskell and I might be wrong about this.)
It's not a complaint about the library since we advertise it as a `ByteString` parser, not a buffer one. But it's an interesting little hidden detail in the library design.
How can it happen that we have a `Ptr` and don't know anything about its provenance? I would say that any such situation is a bug. And if we know where the `Ptr` comes from we can create a finalizer for it.
What about parsing from a buffer provided by `allocaBytes :: Int -> (Ptr a -> IO b) -> IO b`? I would be happy to use `FinalPtr` for the finalizer. The problem is that some of our combinators use the finalizer to create "free" bytestrings (without copying). If I understand correctly, that wouldn't work in this case.
It would work with as much of a safety guarantee as `allocaBytes` in general, where we shouldn't let the `Ptr` escape, and there's no runtime or compile-time check to enforce this. I can imagine situations where it's convenient to pack `FinalPtr` in bytestrings and use them while we're still under the `alloca`. Users who are parsing from an `alloca`-d `Ptr` already do manual memory management, so why not let them manually scope bytestrings.
That said, if we add `runParserPtr`, the docs for the bytestring operations should mention the lifetime issues, and there should also be copying variants of bytestring creation for convenience.
> I can imagine situations where it's convenient to pack `FinalPtr` in bytestrings and use them while we're still under the `alloca`.
You're right, now I understand that I'm very happy with the internal parser design. Thanks for your patience and explanations :)
Could I start a PR adding `runParserPtr`? Should it take a `ForeignPtrContents`, or should we always fill it in with `FinalPtr`?
Btw, is there a standard function that turns a `Ptr` into an "unsafe" bytestring with `FinalPtr`? If there's such a thing, I don't necessarily see a need for `runParserPtr`.
I can't find any `Ptr a -> Int -> ByteString` on Hoogle. I realize now I could pack a fake bytestring without much work. But it feels like a convenient place to document lifetime issues for anyone trying to do similar things. And I like to avoid importing `Data.ByteString.Internal` in my code where possible.
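One existing option that avoids `Data.ByteString.Internal` is `Data.ByteString.Unsafe.unsafePackCStringLen`, which wraps a pointer and length in a `ByteString` in O(1) with no finalizer attached. It lives in `IO` and takes a `CStringLen`, so it does not have the exact shape searched for above. The sketch below assumes the caller keeps the buffer alive; `unsafeViewPtr` is a hypothetical helper name. Anything that must outlive the buffer is copied and forced before leaving the scope.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (pokeByteOff)

-- Zero-copy ByteString view of a raw buffer. No finalizer is attached,
-- so the view is only valid while the underlying memory is alive and
-- unmodified; keeping that true is the caller's responsibility.
unsafeViewPtr :: Ptr Word8 -> Int -> IO B.ByteString
unsafeViewPtr ptr len = BU.unsafePackCStringLen (castPtr ptr, len)

main :: IO ()
main = do
  kept <- allocaBytes 8 $ \buf -> do
    -- Fill the temporary buffer with the bytes 10 .. 17.
    mapM_ (\i -> pokeByteOff buf i (fromIntegral (10 + i) :: Word8)) [0 .. 7]
    view <- unsafeViewPtr buf 8
    -- Slices of the view still point into buf. To let data escape,
    -- copy it and force the copy while the buffer is still alive.
    evaluate (B.copy (B.take 3 view))
  print (B.unpack kept)  -- [10,11,12]
```

The `evaluate` matters: `B.copy` is pure, so without forcing it the copy could be deferred until after `allocaBytes` has released the buffer.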
I guess we can just go with `runParserPtr` then.
Brief preamble. I'm using flatparse for some binary file parsing. I'm parsing a filetable, where I know the precise length. I'm operating on `Ptr`s because I want the core to be source-agnostic (whether working on bytestrings directly, or file handles etc.). I could copy the bytes into a bytestring and parse, but I figure, why don't I parse an address directly? To my (limited) knowledge, it would seem sensible, assuming the lifetime is handled externally (e.g. using `withForeignPtr`).
Assuming good intentions (no lying about pointers or their length), would a `runParserPtr :: ParserIO e a -> Ptr Word8 -> Int -> _` be useful and safe? With lots of warnings on pitfalls of course.
I can provide some example code if useful.