chanced / jsonptr

JSON Pointer (RFC 6901) implementation for Rust
Apache License 2.0
44 stars 4 forks source link

Slice-like Pointers #30

Closed asmello closed 2 months ago

asmello commented 4 months ago

Taking a page out of the std::path book, I figured it'd be nice to have most operations using Pointers operate on string slices instead of owned strings. This significantly helps reduce allocations/copies when manipulating this type, and allows for neat things like creating static/const Pointers that can be parsed at compilation time.

The handful of operations requiring &mut have been move to a companion buffer type called PointerBuf, named analogously to std::path::PathBuf. Also to be consistent with PathBuf, it implements Deref<Target=Pointer> so all pointer operations also work on this owned type transparently, and for the most part it can be used where the original Pointer type would.

I did make a few API breaks though. I'm open to iterating on the breaks, and maybe make some compatibility compromises, as not all changes are necessary to support this PR, but I figured this was a good opportunity to more closely align the API to std conventions and make things less surprising / more intuitive.

Since a lot of code was rewritten/simplified to take advantage of immutability and avoid re-allocations, I've also added a few quickcheck tests to generate some confidence that the new implementations work as expected. The original tests have also been preserved; they were just updated to use the new APIs.

Features

Breaks

Misc

asmello commented 2 months ago

@chanced just checking in given it's been a few months already! Happy to break this down into smaller parts if you're not comfortable merging it all together.

chanced commented 2 months ago

Hey, I'm sorry. I don't know what happened with this. I didn't get a notification that this was created and so I'm just now seeing it.

chanced commented 2 months ago

I'm going to grab some lunch right now but I'll going to look over everything when I get back.

asmello commented 2 months ago

Heh, no worries, and no hurry! It's obviously not a blocker for me or anything. I should've pinged way earlier, anyway. Really appreciate your attention now.

chanced commented 2 months ago

Actually, before I head out to grab something to eat, I've been meaning to actually reach out to you @asmello.

I'm working on a JSON Schema crate that relies on this. There are some design flaws in this crate that I want to change, like the iterator not emitting a token for the base. However, they would be breaking changes. I can work around the problems while fixing them may cause y'all churn.

There's at least one or two other things which I wanted to ask you about as well.

Haven't had a chance to look at the details of this at all yet so you may have actually already submitted some of them.

chanced commented 2 months ago

In addition to addressing the contents of this pr, I may just jot them all down as a reply here. If its easier to chat, I'm on the official rust discord with the username "chanced."

asmello commented 2 months ago

Sure, happy to talk about those planned changes over at Discord. But in general we're happy to accept breaking changes, especially if they're fixes in the original design.

BTW, not sure you're aware, but I've also contributed to json_patch with a dependency on this crate, which was released in version 2.0. It's a relatively popular crate. Making things better there was a motivating factor for this PR.

chanced commented 2 months ago

Dude. This pull request is awesome.

chanced commented 2 months ago

New slice type Pointer that enables zero-copy usage patterns.

This in particular is incredibly appreciated by me. All of it is but that is something I really, really wanted and just haven't had the time to write yet.

chanced commented 2 months ago

I'm good with all of this.

With all of this, the only two things I know that are left are:

The one question I had that may still be relevant is with regards to Tokens and how I had implemented it. There's needless allocation occurring because it stores both encoded and decoded. I'm not super happy about it but wasn't sure what the best path forward to fixing it would be. One thought I considered was a OnceCell that gets populated upon use.

chanced commented 2 months ago

If y'all are using the uri extensions, i'm okay with keeping them.

asmello commented 2 months ago

The one question I had that may still be relevant is with regards to Tokens and how I had implemented it. There's needless allocation occurring because it stores both encoded and decoded. I'm not super happy about it but wasn't sure what the best path forward to fixing it would be. One thought I considered was a OnceCell that gets populated upon use.

Hmm yeah, I think the Token implementation can use some simplifications too. I haven't looked too closely, but I think the redundant stuff is all private to the module, so we can probably change it without changing the API much. My intuition is that we can just have Token store a reference to a sub-slice of Pointer - making it a borrowed type. Given the common use case of iteration I think that works fine. If someone wants a decoded value, it can just allocate a string on-demand then, and if someone needs an owned type that outlives the original Pointer, impl ToOwned<Owned=String> might make sense (or maybe just a Token::to_string, not really sure).

EDIT: might also make sense to provide an OwnedToken type just so we can preserve the ability to decode after turning it into an owned value. Though a static jsonptr::decode(s: &str) -> Cow<'_, str> could probably serve the same purpose.

EDIT2: we could also just make Token behave like Cow and be usable both as an owned and as a borrowed type. Not sure it's worth the added complexity but it'd be kinda neat.

We're fine with removing URI support 👍 . Shouldn't affect json_patch either.

I think we could trivially fix the root token thing here, but I feel like it's disconnected enough to warrant a separate PR. I think the same applies for the other changes you've got planned. This PR is fairly substantial as is. Thankfully it should make the other changes easier to make.

asmello commented 2 months ago

Yay, thanks for merging! I think I'll play around with Token and see if I come up with a PR for it. Feel free to work on the other changes and I'd be happy to review too.

chanced commented 2 months ago

we could also just make Token behave like Cow and be usable both as an owned and as a borrowed type. Not sure it's worth the added complexity but it'd be kinda neat.

I'm using this approach heavily in my json schema crate. I really like it.

chanced commented 2 months ago

I think the Token implementation can use some simplifications too.

Oh, I have no doubt. I would embarrass myself every time I peaked the source on this crate. It was largely my first project in rust. Code quality took a back seat to appeasing borrowck or just getting it to compile at all.

chanced commented 2 months ago

I may rewrite the parser as part of the release this gets packaged as.

I'm really digging simple little enum-based finite automaton for parsers like this. (e.g.https://github.com/chanced/grill/blob/b54dd41981f06c5e0896ca07b73d9c4daefec03e/grill-core/src/big.rs#L331-L379) (although that may be a bad example as that parser will eventually need to get rewritten - but not because of the state machine).

asmello commented 2 months ago

Feel free to, it's your crate :smile:

When it comes to parsers, it's always a trade-off between code legibility and performance. The more structured your parser, the less opportunities for optimizations, but the easier it is to validate it. The JSON Pointer grammar is not particularly expensive to parse, regardless of implementation, but they're also a type that tends to get created and destroyed often, so any slowness in parsing could build up at higher scales.

I'd generally suggest starting with a simpler and more obviously correct implementation, and then move to a more specialized one as necessary, but arguably we already have a specialized one, so that guidance doesn't exactly apply. So instead what I'll say is, if you want to be really confident about your choice, add benchmarks. But honestly I don't think it matters much either way.

chanced commented 2 months ago

Feel free to, it's your crate 😄

Hah, that was mostly a note to myself but at the same time, I don't really own it per se. Releasing work under such a permissive model is akin to giving it away, of sorts. It becomes a communal project.

In fact, if you PR yourself in as an author/contributor in the Cargo.toml, I'd appreciate it. Or pass me the details and I'll add it. Same goes for you @wngr (sorry for the ping if you aren't interested).

chanced commented 2 months ago

re: rewriting the parser, you're right. Rewriting it now, without a more robust test suite and benchmarks to not only ensure it functions as desired but also is justifiable, is not a good idea.

Honestly the desire to do so was a knee jerk reaction to the angst of learning more people than I anticipated are reading my shoddy source :).

I'll rewrite it in due time, but not before it is provably better.

asmello commented 2 months ago

Releasing work under such a permissive model is akin to giving it away, of sorts. It becomes a communal project.

That's only partially true. You hold the keys to this repository and the crates.io entry, so that makes you the de facto lead of this project (not to mention you're still the main contributor/maintainer, so morally your opinion matters the most). The license means anyone can fork the code, but that'd result in a different crate.

In fact, if you PR yourself in as an author/contributor in the Cargo.toml, I'd appreciate it. Or pass me the details and I'll add it. Same goes for you @wngr (sorry for the ping if you aren't interested).

I'll submit a PR to add myself as an author then. Oliver is probably not checking emails/notifications currently, but I think we can add him too.

chanced commented 2 months ago

That's only partially true. You hold the keys to this repository and the crates.io entry, so that makes you the de facto lead of this project (not to mention you're still the main contributor/maintainer, so morally your opinion matters the most).

Understood. I didn't mean to make it seem like I was trying to skirt responsibility or anything. I just meant that I don't want to make sweeping changes which could have adverse consequences to others without at least trying to consult with those I am aware of.

There are plenty of projects where the primary takes ownership to heart. There's nothing wrong with that approach at all. I personally don't, is all.

asmello commented 2 months ago

Oh no, please don't take it that way. I didn't think you were trying to skirt responsibility or anything of the sort, just wanted to highlight how much agency you are entitled to here. You've built something awesome and although it makes sense to listen to others and accept contributions, the community will defer to you as the project's primary maintainer to make important decisions. It's not so much a matter of ownership but of respect and trust, that's what I meant.

chanced commented 2 months ago

Thanks.

chanced commented 2 months ago

@asmello is there any chance you would mind updating the CHANGELOG.md please. I'm sorry, I forgot to check for that.