Closed mlwilkerson closed 3 years ago
Thank you for this pull request and #39! It has already been quite some time since the problem was discovered in #6, and no solution that would preserve the current interface has been found so far.
With this alternative, I am a bit concerned that the end user would have to manage the memory on their own. It might not be a problem if the typical scenario is that one does the whole parsing in a single function without trying to pass things around.
I have tagged both with “feedback needed” for gathering some feedback from others.
I have stumbled upon this leaky alternative by @Skynoodle:
https://github.com/Skynoodle/svg/commit/e65faa79a34a4bccc886f03c236da0a469cf4925
But it seems, for complete correctness, the API has to change. I will then merge this pull request if there are no objections.
My branch was mostly just me trying to figure out the lifetimes involved and coming to the same conclusion that a correct, non-leaking fix would require either this sort of API change, or changes to various reference types returned elsewhere. Fixing it properly like this is definitely preferable, and the changes here all look good to me 👍
(To expand slightly on the alternative api changes based on my understanding: where currently &'l _
references are passed out referring to the buffer, guard objects implementing deref would likely be needed instead, and something like reference counting in the case of an owned buffer would be required)
@Skynoodle I don't follow your comment here (I don't disagree, I just don't understand yet.) It sounds like you're referring to Rust concepts that are just at the edge of my Rust knowledge, so I'm not sure where they apply here.
where currently &'l _ references are passed out referring to the buffer, guard objects implementing deref would likely be needed instead, and something like reference counting in the case of an owned buffer would be required
Could you expand a little more? I'm curious.
Could you expand a little more? I'm curious.
Sure! To be clear, that aside in my comment doesn't directly apply to this MR. I was considering an alternative set of API changes which still allowed passing in an owned buffer and getting an arbitrarily long lifetime 'l
back out. The difficulty then is that 'l
might outlive the Parser
itself, but various methods return things like slices into the buffer with lifetime 'l
. One would either need to shorten the lifetime of these returned references to at most the lifetime of &self
for those methods, or return a guard structure instead of an &'l _
. The possibly-owned buffer would then need to be kept alive at least as long as the lifetime of the Parser
and any outstanding guards handed out. These guards would themselves have lifetime at most 'l
, so in the borrowed-buffer case could be trivially sound (like the existing references), but in the owned-buffer case would need to use something like reference counting to avoid the owned buffer being destroyed early. By implementing Deref
for these guards, an API with the same spirit as the existing one would be provided (and should "just work" in most of the existing API's correct uses).
This sort of approach would probably add substantial complication to a lot of the Parser
implementation, though, so without strong motivation for owned-buffer usage the option in this MR seems better.
Thanks for the expansion.
I think I understand better, but not sure I get it all. Check my thinking?
passing in an owned buffer and getting an arbitrarily long lifetime
'l
back out
Passing in an "owned buffer", say to the Parser::new()
function, would mean: passing ownership from the client code to that function. Right?
Or does "owned buffer" refer more to returning from open()
or Parser::new()
a buffer which the calling code would subsequently own?
or return a guard structure instead of an
&'l _
I don't think I've seen "guard structure" in the Rust literature I've read. Are you thinking of something like a Trait Bound? Or structures like Rc
or Arc
? (That's what I thought you may have meant after mentioning reference-counting.) Or something else?
Passing in an "owned buffer", say to the
Parser::new()
function, would mean: passing ownership from the client code to that function. Right?Or does "owned buffer" refer more to returning from
open()
orParser::new()
a buffer which the calling code would subsequently own?
I meant "owned buffer" in the sense of the Parser
itself owning rather than borrowing the buffer - which both implies ownership of the buffer (rather than a borrow) is passed into Parser::new
and that ownership of the buffer follows the returned Parser
around afterwards (at least conceptually - the guard structures I mentioned make this a little less clear-cut).
Edit: Parser owning the buffer would itself be the result of the client code passing ownership of the buffer to the new
function
I don't think I've seen "guard structure" in the Rust literature I've read. Are you thinking of something like a Trait Bound? Or structures like
Rc
orArc
? (That's what I thought you may have meant after mentioning reference-counting.) Or something else?
It would be functionally very similar to an Rc
/Arc
here, I think (it would probably be implemented using an Rc
/Arc
for the "owned buffer" case). Rc
s could be described as a guard structure in a sense - they guard against the contained value being dropped as long as you hold an Rc
. Here, I'm using guard in the same sort of sense as MutexGuard
- ensuring that invariants are satisfied while the structure is held. For MutexGuard the invariant is that the lock is (exclusively) held. In our case, because "conceptually" the returned value isn't really meant to own the buffer (it's just a workaround to prevent the Parser from incorrectly dropping it early) I saw it a little more like a guard (upholding the invariant that the underlying buffer is still valid) than true shared ownership like is usually implied by Rc.
Edit: in the borrowed case,'l
would be a meaningful lifetime and the guard would then "just" be essentially a `&'l - so the guard would have different internal behaviour depending on whether the original
Parser` owned or borrowed its buffer_
I meant "owned buffer" in the sense of the Parser itself owning rather than borrowing the buffer
got it.
Here, I'm using guard in the same sort of sense as
MutexGuard
- ensuring that invariants are satisfied while the structure is held.
I see. I haven't used MutexGuard
before. Cool, I'll read up on that.
One part of what intrigued me about your comment, when you mentioned reference counting, is that one of my experiments had been to use Arc
. In my understanding that woulda/shoulda allowed for multiple ownership. But after chasing my tail trying to make the compiler happy about how I was trying to use that, I concluded that I must have been misunderstanding something fundamental. For you to then mention reference counting makes me wonder if perhaps some part of my intuition was right about that, and yet I just hadn't yet found the right implementation. (I'm still trying to learn how to use Arc
appropriately.)
I see. I haven't used
MutexGuard
before. Cool, I'll read up on that.
One interesting part of guard types is their existence is often almost invisible - in the case of a MutexGuard
, you obtain one by calling mutex.lock()
and, because the MutexGuard
implements Deref
and DerefMut
, proceed to treat the guard mostly as though it's a plain &mut _
, even though the MutexGuard
is needed to unlock the mutex at the right time and prevent use of the inner value outside of a current lock.
One part of what intrigued me about your comment, when you mentioned reference counting, is that one of my experiments had been to use
Arc
. In my understanding that woulda/shoulda allowed for multiple ownership. But after chasing my tail trying to make the compiler happy about how I was trying to use that, I concluded that I must have been misunderstanding something fundamental. For you to then mention reference counting makes me wonder if perhaps some part of my intuition was right about that, and yet I just hadn't yet found the right implementation. (I'm still trying to learn how to useArc
appropriately.)
It would still be rather fiddly, since holding an Arc
and a reference into that Arc
at the same time in a structure would also be self-referential so still awkward with Rust's rules. One could work around that in one of the usual ways - either by holding an Arc
and indices in the guard structure, or guaranteeing that the contents of the Arc are pinned in memory and use raw pointers with unsafe, only producing the required references in Deref
itself.
Thank you all!
closes: #6
The alternative API in this PR requires that the
content
passed into the top-levelread()
function is already a&str
. Therefore the client owns the memory allocation, and this&str
can be passed around the rest of the framework safely.It seems to have the advantage over #39 that there's no extra copy being made in memory, and that there's again only one parameter to
read()
.I prefer this one over #39.