FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
310 stars 133 forks source link

Add support for CBOR stringref extension (`CBORGenerator.Feature.STRINGREF`) #347

Closed here-abarany closed 1 year ago

here-abarany commented 1 year ago

See http://cbor.schmorp.de/stringref for stringref definition. Repeated binary and text strings are assigned an index on write and a tagged integer is written when repeated. Parser respects the stringref IDs when reading data back in.

Nested tags are now supported. The previous getCurrentTag() function is still supported for backwards compatibility, and returns the first tag. Otherwise a new TagList type is used to efficiently handle multiple tags.

cowtowncoder commented 1 year ago

Ok, first of all: thank you for contributing this feature! It sounds a useful thing, CBOR implementing support something Smile has had for a while.

One thing I was wondering about: since this is a sizable thing, I was wondering if it was possible to split this in two PRs: one dealing with parsing, and second one (building on first) for generation? I realize it may be tricky wrt tests, but it would be easier to code review that way if at all possible.

Second thing: regardless of whether it's 1 or 2 PRs, one (and only) process thing we need is CLA. It can be found from here:

https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf

and is only needed once before the first contribution is merged (one is good for all future PRs for all Jackson projects). The usual way is to print it, fill & sign, scan/photo, email to info at fasterxml dot com.

Thank you once again, looking forward to reviewing this PR and getting it merged for 2.15

here-abarany commented 1 year ago

One thing I was wondering about: since this is a sizable thing, I was wondering if it was possible to split this in two PRs: one dealing with parsing, and second one (building on first) for generation?

From what I understand this would require separate branches, which would be very unwieldy to manage and would require the first PR to be submitted before the second could actually be looked at. (since the second one would contain the code from the first) What I can do is split it into two commits, which would allow you to review them separately while being part of the same branch/PR.

Second thing: regardless of whether it's 1 or 2 PRs, one (and only) process thing we need is CLA.

Since I'm contributing this on company time I've forwarded this to the OSS department to make sure I get the OK to sign it. Hopefully it won't take too long to get approval.

cowtowncoder commented 1 year ago

One thing I was wondering about: since this is a sizable thing, I was wondering if it was possible to split this in two PRs: one dealing with parsing, and second one (building on first) for generation?

From what I understand this would require separate branches, which would be very unwieldy to manage and would require the first PR to be submitted before the second could actually be looked at. (since the second one would contain the code from the first) What I can do is split it into two commits, which would allow you to review them separately while being part of the same branch/PR.

Second thing: regardless of whether it's 1 or 2 PRs, one (and only) process thing we need is CLA.

Since I'm contributing this on company time I've forwarded this to the OSS department to make sure I get the OK to sign it. Hopefully it won't take too long to get approval.

Yes, would require 2 branches and I guess that won't play nicely with PRs. Too bad but I guess it makes sense to keep things bundled.

And yes, CLA should be fine; there's CCLA available if your company prefers that.

here-abarany commented 1 year ago

Just to give a heads up, we have a tentative agreement internally to sign the CCLA, but are trying to find the right people to sign it. I'm hoping we'll get it submitted by mid January. In the meantime I'll be out the next two weeks myself for the holidays.

here-abarany commented 1 year ago

@cowtowncoder I finally managed to get a signature and forwarded the CCLA last Friday. I just wanted to make sure that it was received and we can continue with the PR.

cowtowncoder commented 1 year ago

@here-abarany I can't seem to find CCLA. Was it sent to info at fasterxml dot com?

here-abarany commented 1 year ago

@cowtowncoder it was sent to clas rather than info, which was the address in the document. I can forward it to info as well.

cowtowncoder commented 1 year ago

@here-abarany Ok odd. I think clas should get forward to info... but yes, please try re-sending it as I don't see it.

here-abarany commented 1 year ago

@cowtowncoder I just forwarded it now.

cowtowncoder commented 1 year ago

Ok @here-abarany -- impressive changes! I had some questions etc, although nothing major.

My main concern is actually getting this to merge against 3.0 (master). I know it's not practical to split this into smaller chunks, but one thing that might help is if white-space changes (removal of white-space from end-of-lines) could be done as a separate PR first? I could merge that in, then into master; that'd probably remove noise slightly.

I suspect I will need help with master merge anyway but maybe little bit less.

here-abarany commented 1 year ago

@cowtowncoder thanks for the review, I hope to push up an update later today based on your comments. I'll have two separate pushes: one to rebase onto the latest 2.15 changes and one to push up the updates.

With regards to the whitespace changes, my editor is configured to clear trailing whitespace on save. I might be able to check out to the 2.15 head, save those files, and cherry-pick my changes on top of that to reduce some of the noise. If I'm successful I'll include that with my pushes later today.

I can try merging into the master branch locally and see what happens. Hopefully everything merges cleanly, if not I'll note what issues I encountered.

cowtowncoder commented 1 year ago

@cowtowncoder thanks for the review, I hope to push up an update later today based on your comments. I'll have two separate pushes: one to rebase onto the latest 2.15 changes and one to push up the updates.

With regards to the whitespace changes, my editor is configured to clear trailing whitespace on save. I might be able to check out to the 2.15 head, save those files, and cherry-pick my changes on top of that to reduce some of the noise. If I'm successful I'll include that with my pushes later today.

I can try merging into the master branch locally and see what happens. Hopefully everything merges cleanly, if not I'll note what issues I encountered.

It's unlikely to merge cleanly, partly due to package renaming, partly due to big API changes. But exactly how tricky it is can vary. :)

As to whitespace, I'll try to figure out how to do that on my end for 2.15; that way you can just rebase on 2.15 head. No need for you to have to do that now I think.

cowtowncoder commented 1 year ago

@here-abarany Ok I did white-space trimming, merged it to master (re-linking some sources that hopefully reduces conflicts -- changing Java package causes lots of pain). So you may want to merge/rebase from 2.15 now.

here-abarany commented 1 year ago

@cowtowncoder I have pushed up my changes, I believe I addressed all the comment threads from the review. The first push is just a rebase, so you should be able to view just my latest changes by comparing the latest push just before this comment.

cowtowncoder commented 1 year ago

@here-abarany Ok, sorry, missed couple of smaller things; now done with reviewing I think. Minor test changes requested to make it more likely things merge less uncleanly.

here-abarany commented 1 year ago

@cowtowncoder I have pushed up a new revision to fix your latest comments.

cowtowncoder commented 1 year ago

Thank you @here-abarany! I will pick this up again today or tomorrow, get it merged.

cowtowncoder commented 1 year ago

Ok, phew. Merge to master went ok, almost: there is just one new failure and that's for String-Refs; BiggerDataTest.testRoundTripStringref().

I suspect it is since CBORParser.nextFieldName() methods (renamed as just nextName() in 3.0) do not all handle String-refs. Or rather, new nextNameMatch() method doesn't -- it probably needs to, and I think jackson-databind uses that for Bean properties in 3.0.

Could you have a look if you can see how to fix this, @here-abarany ?

here-abarany commented 1 year ago

@cowtowncoder thanks for getting it merged in, I'll see if I can take a look on Monday for the master issue.

cowtowncoder commented 1 year ago

@cowtowncoder thanks for getting it merged in, I'll see if I can take a look on Monday for the master issue.

Thank you in advance! Hopefully it is relatively straight-forward.