Closed raehik closed 1 year ago
This takeRestPrim
is not memory safe because it saves an Addr#
in a closure whose underlying data can get garbage collected subsequently. If we save a ForeignPtr
instead, that means that the builder closure saves exactly the same data that is returned by takeRest
, namely a ByteString
. I don't see why returning a closure would be preferred.
Thank you. I had a feeling I was going wrong somewhere. But as you say, the take
family don't do any copying anyway, so this is fairly unhelpful as well as unsafe. I had assumed we copy the bytes out into a new buffer. Instead we directly construct a ByteString
using the parser's ForeignPtrContents
. Does Haskell somehow know that it may garbage collect parts of the input string, but not the parts that were take
n? (Also, any chance you know of resources to learn about garbage collection and memory manipulation in GHC?)
Closing since not useful.
A ByteString
has a foreign malloc
-d array which lives while its ForeignPtrContents
lives. Only the whole array can be freed. This implies that any slice of a ByteString
keeps the whole original ByteString
alive. The touch#
and keepAlive#
primops from GHC.Prim
can be used to ensure that an object lives at a certain point in sequential State#
-passing execution.
We use touch#
in bytestring
to make sure that the backing data lives until its processing is finished. The unsafeUseAsCString
in runParser
also uses touch#
or keepAlive#
in the source.
Thanks a bunch for the explanation. Does that mean takeUnsafe#
takeUnsafe# :: Int# -> ParserT st e B.ByteString
takeUnsafe# n# = ParserT \fp eob s st ->
case n# <=# minusAddr# eob s of
1# -> OK# st (B.PS (ForeignPtr s fp) 0 (I# n#)) (plusAddr# s n#)
_ -> Fail# st
keeps the input bytestring alive while the parsed "slice" is alive? Is this a good idea? Should we also provide copying parsers? (I should note that I wrote these functions, and didn't understand the implication of using the ForeignPtrContents
in this way at the time.)
Data.ByteString.copy
already exists for this purpose.
I see. Thanks for all this. Might I document that these functions will keep the input bytestring alive? I find it unintuitive (and I worry it could lead to unexpected memory usage for some cases?)
Sure.
I have found a use case (i.e. it seems to be a fairly sensible type class instance somewhere) for parsing to a primitive bytestring builder. It would allow parse->serialize steps to be performed with zero intermediate copying. Perhaps you parse and edit some record, only editing select fields, and you figure the extra intermediate bytestrings (which would be copied from the parse input into their own bytestrings, then copied from again when serializing) are a waste of cycles.
I'm currently thinking of something like this (messy):
Might something like this be sensible to add?