Open theRealAph opened 3 years ago
Has anyone tried backporting this to 2.56 / Java 8 and tested?
Has anyone tried backporting this to 2.56 / Java 8 and tested?
As far as I know this patch has been ignored. I'm amazed. This is a longstanding bug, with (as it turns out) a simple and almost-obviously-correct fix.
Has anyone tried backporting this to 2.56 / Java 8 and tested?
As far as I know this patch has been ignored. I'm amazed. This is a longstanding bug, with (as it turns out) a simple and almost-obviously-correct fix.
I've cloned your repo and cherry picked your changes back to the Java-8 branch as we're still using Java 8.
To date we've had to turn off -G1GC garbage collection (which means no String deduplication 👎) and put concurrency guards around all serializing/deserializing operations to avoid any multithreaded access to the FST classes.
It's not so bad at the moment because FST is extremely fast at what it does with the size of data we're serializing but when we scale up this workaround won't cut the mustard. We'll have to remove those guards and then we'll be able to test if your changes have fixed the concurrency issue we were having.
On 6/21/21 12:53 AM, Christopher Colemani wrote:
I've cloned your repo and cherry picked your changes back to the Java-8 branch as we're still using Java 8.
To date we've had to turn off -G1GC garbage collection (which means no String deduplication 👎) and put concurrency guards around an serializing/deserializing operations to avoid any multithreaded access to the FST classes.
It's not so bad at the moment because FST is extremely fast at what it does and with the size of data we're serializing but when we scale up this workaround won't cut the mustard. We'll have to remove those guards and then we'll be able to test if your changes have fixed the concurrency issue we were having.
OK, thanks. From what I see of the comments people have more or less given up on the idea of fixing the concurrency bugs, But I don't think it's so bad. I dug into this because a customer was having crashes which looked like a JVM bug, but it turned our that FST was the problem. It'd be nice to get this fixed.
-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://www.redhat.com https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
needs investigation, sorry but I cannot test it further as I use a brutally modded 2.57 version to support java 17 APIs. But it is not that obvious that it would work on every serialization case. This is just a note, not a proper test, maybe this patch is valuable for others.
Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException: Cannot invoke "org.nustaq.serialization.FSTObjectRegistry.clearForWrite(org.nustaq.serialization.FSTConfiguration)" because "this.objects" is null
Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException: Cannot invoke "org.nustaq.serialization.FSTClazzNameRegistry.clear()" because "this.clnames" is null
needs investigation, sorry but I cannot test it further as I use a brutally modded 2.57 version to support java 17 APIs. But it is not that obvious that it would work on every serialization case. This is just a note, not a proper test, maybe this patch is valuable for others.
Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException: Cannot invoke "org.nustaq.serialization.FSTObjectRegistry.clearForWrite(org.nustaq.serialization.FSTConfiguration)" because "this.objects" is null Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException: Cannot invoke "org.nustaq.serialization.FSTClazzNameRegistry.clear()" because "this.clnames" is null
Understood, but this patch fixes some concurrency bugs, even if it doesn't fix them all. So we might as well push it.
If anyone can find a reproducer for more concurrency bugs I can fix them.
This fixes two bugs caused by race conditions.
The first is in
FSTClazzInfo.getSer()
, which uses what seems to be a kind of "broken double-checked locking". When multiple threads are racing ingetSer()
, sometimesFSTSerializerRegistry.NULL
is returned, with chaos ensuing. An obvious fix would be to makeFSTClazzInfo.getSer()
synchronized, but simply makingser
volatile and using a local variable is enough to fix it.The second is in
FSTObjectOutput.close()
. Here, the codec is modified after it has been returned to the public configury. This kind of error is trivially detectable if you null out any references you hold to an object after returning it, so I've done that everywhere. Also, this means that there is less work for the garbage collector to do, and we all should be nice to our garbage collectors.