RuedigerMoeller / fast-serialization

FST: fast java serialization drop in-replacement
Apache License 2.0
1.58k stars 248 forks source link

Fix for issues 274 and 235 #311

Open theRealAph opened 3 years ago

theRealAph commented 3 years ago

This fixes two bugs caused by race conditions.

The first is in FSTClazzInfo.getSer(), which uses what seems to be a kind of "broken double-checked locking". When multiple threads are racing in getSer(), sometimes FSTSerializerRegistry.NULL is returned, with chaos ensuing. An obvious fix would be to make FSTClazzInfo.getSer() synchronized, but simply making ser volatile and using a local variable is enough to fix it.

The second is in FSTObjectOutput.close(). Here, the codec is modified after it has been returned to the public configury. This kind of error is trivially detectable if you null out any references you hold to an object after returning it, so I've done that everywhere. Also, this means that there is less work for the garbage collector to do, and we all should be nice to our garbage collectors.

chrisco484 commented 3 years ago

Has anyone tried backporting this to 2.56 / Java 8 and tested?

theRealAph commented 3 years ago

Has anyone tried backporting this to 2.56 / Java 8 and tested?

As far as I know this patch has been ignored. I'm amazed. This is a longstanding bug, with (as it turns out) a simple and almost-obviously-correct fix.

chrisco484 commented 3 years ago

Has anyone tried backporting this to 2.56 / Java 8 and tested?

As far as I know this patch has been ignored. I'm amazed. This is a longstanding bug, with (as it turns out) a simple and almost-obviously-correct fix.

I've cloned your repo and cherry picked your changes back to the Java-8 branch as we're still using Java 8.

To date we've had to turn off -G1GC garbage collection (which means no String deduplication 👎) and put concurrency guards around all serializing/deserializing operations to avoid any multithreaded access to the FST classes.

It's not so bad at the moment because FST is extremely fast at what it does with the size of data we're serializing but when we scale up this workaround won't cut the mustard. We'll have to remove those guards and then we'll be able to test if your changes have fixed the concurrency issue we were having.

theRealAph commented 3 years ago

On 6/21/21 12:53 AM, Christopher Colemani wrote:

I've cloned your repo and cherry picked your changes back to the Java-8 branch as we're still using Java 8.

To date we've had to turn off -G1GC garbage collection (which means no String deduplication 👎) and put concurrency guards around an serializing/deserializing operations to avoid any multithreaded access to the FST classes.

It's not so bad at the moment because FST is extremely fast at what it does and with the size of data we're serializing but when we scale up this workaround won't cut the mustard. We'll have to remove those guards and then we'll be able to test if your changes have fixed the concurrency issue we were having.

OK, thanks. From what I see of the comments people have more or less given up on the idea of fixing the concurrency bugs, But I don't think it's so bad. I dug into this because a customer was having crashes which looked like a JVM bug, but it turned our that FST was the problem. It'd be nice to get this fixed.

-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://www.redhat.com https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

istinnstudio commented 1 year ago

needs investigation, sorry but I cannot test it further as I use a brutally modded 2.57 version to support java 17 APIs. But it is not that obvious that it would work on every serialization case. This is just a note, not a proper test, maybe this patch is valuable for others.

Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException: Cannot invoke "org.nustaq.serialization.FSTObjectRegistry.clearForWrite(org.nustaq.serialization.FSTConfiguration)" because "this.objects" is null

Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException: Cannot invoke "org.nustaq.serialization.FSTClazzNameRegistry.clear()" because "this.clnames" is null
theRealAph commented 1 year ago

needs investigation, sorry but I cannot test it further as I use a brutally modded 2.57 version to support java 17 APIs. But it is not that obvious that it would work on every serialization case. This is just a note, not a proper test, maybe this patch is valuable for others.

Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException: Cannot invoke "org.nustaq.serialization.FSTObjectRegistry.clearForWrite(org.nustaq.serialization.FSTConfiguration)" because "this.objects" is null

Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException: Cannot invoke "org.nustaq.serialization.FSTClazzNameRegistry.clear()" because "this.clnames" is null

Understood, but this patch fixes some concurrency bugs, even if it doesn't fix them all. So we might as well push it.

If anyone can find a reproducer for more concurrency bugs I can fix them.