Closed hpx7 closed 6 years ago
After serialization + deserialization, it looks like the resize factor has dropped from 8 to 1.
Not clear if HashOperations#fastHashSearchOrInsert
was really hanging forever or just taking a really long time. But this does seem problematic. Going to try and see if I can PR a fix.
Well, I found where resize factor isn't being set...
Fixing that works, and I have a hypothesis why. Not 100% sure, but it does fix that test. Will work on a PR this evening; in meetings today.
On further examination, I found a band-aid that stops the infinite loop but doesn't address the underlying issue. Maybe that's sufficient for now, but it's not a real fix.
The key issue is that we go into this else condition, despite what the comment says: https://github.com/DataSketches/sketches-core/blob/master/src/main/java/com/yahoo/sketches/theta/DirectQuickSelectSketch.java#L258 Resize factor is x1 (meaning don't resize) and the current array can't grow anyway because there's no additional capacity in the memory. It'd heapify itself, which is fine, but the problem is that it's not supposed to get there in the first place.
So the infinite loop happens because it keeps searching the hash table forever. It's a unique value that doesn't collide, but there's no open slot, either. The question is why this isn't caught elsewhere.
Interestingly, this isn't an issue with heapifyUpdateSketch() because it ignores ResizeFactor 1X's specification not to resize and doubles the allocated size anyway.
Gotta figure out what the correct thing to do here is. Not going to commit a patch until we've decided on the correct behavior.
Thanks for jumping on this so fast @jmalkin!
@AlexanderSaydakov reminded me that creating a sketch with resize factor set to not resize means we allocate the entire size up front. That logic appears valid for both on- and off-heap sketches, so the only issue seems to be that resize factor was not properly written.
Actually, there are two problems. The first is that the resize factor was not properly set.
The second is that the HashOperations find() functions (written some time ago) use a while{} loop instead of a do{}. The do{} form I developed later (see the HLL sketches) to catch this very kind of state failure. The do{} loop saves the starting index and if the loop returns to that same starting index a state exception is thrown. There can be multiple causes for this state failure to happen. The incorrect setting of RF is one, corruption is the other.
@leerho: That's a good point, although properly setting resize factor is sufficient to (validly) avoid the infinite loop in this case.
heapifyUpdateSketch() wasn't impacted in this case because it would call resize() and ignore ResizeFactor.X1, forcing a size doubling. wrapUpdateSketch() simply assumed everything was properly configured. Neither of those is ideal; both places should probably be validating that sizes are correct if we have RF.X1 and an under-full sketch. That would prevent the (re)construction of a corrupt sketch in the first place.
Anyway, I have a PR for resize factor now. I'll look at the do() loop next.
@hpx7 The PR has been merged into master. We'd ideally like a few days to improve robustness checks elsewhere in the code before pushing a new release. Are you able to build locally from master for your use case?
Before we release I strongly suggest we look at all similar find() routines and convert the ones we can to the do{} form because it is so much more robust. Jon, not sure who you were addressing with your last sentence.
Lee, that was for the original reporter of the issue.
Given your preference for changing the various flavors of find() methods, I think we should also consider adding better self-consistency checks in heapify() and wrap().
Agreed.
Thanks for the patch @jmalkin. I'll wait for the next release.
I've now changed the hash table operations to check for infinite loops (will throw an exception), and heapify/wrap should also check that sufficient memory is allocated if you're not allowing the sketch to resize itself. So we should have fixed the root cause, have better protection against the underlying error slipping through, and will throw an exception if there's a detected problem (well, if you're just searching the hash table it'll return not found). All merged into master.
Closing this, although we'll discuss any other changes we want before pushing a new version.
Thanks @jmalkin. If you had to guess, what would you say is the eta for a new release?
@leerho is working on Improving test coverage in a few places that aren't as well tested yet are pretty critical. Depending on how fast that goes, Friday (US Pacific time) would be great. Otherwise we'll hope for early next week.
I should add that I tried speed tests with the infinite loop detection code, and any performance difference was not obvious. My fastest single trial run was with the new code so it's not obviously worse, although that run seemed anomalous. So I'm just going to claim no apparent degradation.
sketches-core-0.10.2 has been released
Hmm deserialization of old ThetaSketches is now breaking for me:
Caused by: com.yahoo.sketches.SketchesArgumentException: Possible corruption: ResizeFactor X1, but provided array too small for sketch size
at com.yahoo.sketches.theta.DirectQuickSelectSketch.writableWrap(DirectQuickSelectSketch.java:164)
at com.yahoo.sketches.theta.UnionImpl.wrapInstance(UnionImpl.java:163)
at com.yahoo.sketches.theta.SetOperation.wrap(SetOperation.java:155)
at com.yahoo.sketches.theta.SetOperation.wrap(SetOperation.java:134)
at com.yahoo.sketches.theta.Sketches.wrapUnion(Sketches.java:180)
...
Can we add a codepath that will correctly deserialize old versions and set the appropriate resize factor etc?
To repro, just serialize an empty ThetaSketch from 0.10.1 :
new SetOperationBuilder()
.setNominalEntries(4096)
.buildUnion()
.toByteArray()
and then try to deserialize it:
byte[] data = ...
Sketches.wrapUnion(WritableMemory.wrap(data))
Right.. I know what the issue is. There isn't going to be a good conversion path without bumping the serialization version, which will require a bunch of other updates. So the question becomes what's the least bad trade off...
We debated options and went with the simplest approach. PR is awaiting review.
We always discouraged storing sketches in updatable form (union has an updatable sketch inside), and we don't do it in our systems. As a result this bug apparently was not noticed since version 0.8.4.
@hpx7, could you describe your use case please? Why do you think you need to store serialized unions as opposed to compact sketches?
@AlexanderSaydakov I was following the recommendation in https://github.com/DataSketches/sketches-core/issues/148
EDIT: Ah I suppose we could/should have used the Union for the computation and then extracted and stored the CompactSketch once it was complete. I was not aware that storing sketches in updatable form was discouraged.
Also, appreciate the quick turnaround with the fix! Could we get a new release cut?
sketches-core-0.10.3 has been released
Here's a minimal repro:
It seems to hang in
HashOperations#fastHashSearchOrInsert
I'm guessing there is some state that isn't getting serialized?