OSGeo / PROJ-JNI

Java Native Interface for PROJ
https://osgeo.github.io/PROJ-JNI/
MIT License
23 stars 15 forks source link

Fix NullpointerException when multiple threads are calling destroyExpired #56

Closed kad-simonm closed 2 years ago

kad-simonm commented 2 years ago

During heavy load testing (10M geometrys), we very incidentally encountered the following NullPointerException.

ava.lang.NullPointerException at org.osgeo.proj.Context.destroyExpired(Context.java:196) at org.osgeo.proj.Context.close(Context.java:178) at org.osgeo.proj.Operation.transform(Operation.java:787)

As far as I can judge, this problem only arises when multiple threads are trying to destroy the same expired PROJ context. The second thread then encounters a NullPointerException on the lastUsed long primitive.

To avoid this problem I have introduced a ReentrantLock which can guarantee that only 1 thread is cleaning expired PROJ contexts at a time without blocking threads

kad-simonm commented 2 years ago

Please take a look @desruisseaux @kbevers

desruisseaux commented 2 years ago

Thanks for reporting this bug! Actually the synchronization should be already ensured by CONTEXTS, which is an instance of ConcurrentLinkedDeque. We can either use the ReentrantLock or ConcurrentLinkedDeque, but using both of them would be redundant.

I believe that the bug is a missing null check after line 208. I will commit this fix in a few seconds. If you can confirm whether it works or not, that would be greatly appreciated.

kad-simonm commented 2 years ago

Thanks for the quick response.

I agree that it is no longer necessary to continue using a ConcurrentLinkedDeque. However, the ReentrantLock will then have to be used in several places. (acquire,close and destroyAll methods)

I'm not sure if your change @desruisseaux will be conclusive. There is already a null check on line 194. The exception occurred on line 196.

I'm going to try to test both solutions today. I will share the results with you as soon as possible.

desruisseaux commented 2 years ago

I'm not sure if your change @desruisseaux will be conclusive. There is already a null check on line 194. The exception occurred on line 196.

Yes, but this check is in a while loop. Because c is a local variable, it can not have been nullified by another thread. The null value can only occurs by assignment in this method body. The assignment at line 208 can be null and is executed just before the check in the while loop is executed again. In any cases a null check was missing there.

kad-simonm commented 2 years ago

Thanks for the explanation @desruisseaux. With a new test we no longer encountered this problem. This PR can therefore be closed. We are still running into another problem but I will create a new issue for this.