RuedigerMoeller / fast-serialization

FST: fast java serialization drop in-replacement
Apache License 2.0
1.58k stars 248 forks source link

Elusive deserialization issue caused by DefaultFSTInt2ObjectMap's clear optimization #323

Open d-w-johnson opened 2 years ago

d-w-johnson commented 2 years ago

We use FST 2.57 and we started realizing that every once in awhile after converting bytes to an object, FST would give us an object that was corrupt. Most of the object's data would look fine but upon further examination you could see that there was data of the wrong type shoved into one of the string fields. FST doesn't throw any exceptions when this happens, it just happily returns an object that is waiting to throw a runtime exception until the messed up field is accessed. Sometimes we would end up converting it to bytes again to be persisted and then it can never be deserialized again.

We ended up creating a work around that uses reflection to validate each object after being converted from bytes to ensure it isn't corrupt. When the issue is identified we just perform the operation again and it works the second time. This resolved our issues, but validating every object converted from bytes was a super heavy CPU cost.

I haven't reported anything until now because duplicating the issue is pretty difficult. However, I now have a test that deterministically reproduces the error at the same place every time and I was able to finally take some time and debug it. I found that when FSTInt2ObjectMap was turned into an interface and DefaultFSTInt2ObjectMap was created, the clear method was also optimized in order to avoid cleaning huge memory areas after writing large objects. I believe that the issue is caused by forgetting to set the next field to null in this optimized clear method. With the clear not taking full affect it leaves it open to accidentally using data from a previous object when reading in the current object.

I'll be submitting a PR containing the test that can reproduce the issue and the fix to the JDK-8 branch shortly and will link it to this issue. The fix should also be put into master.