Findwise / Hydra

Distributed processing framework for search solutions
http://findwise.github.io/Hydra

Getting a ConcurrentModificationException sometimes #299

Open ssimon opened 10 years ago

ssimon commented 10 years ago

Don't really know when this is happening, but I see this error in the logs from time to time:

2014-01-15 17:21:25,040 [I/O dispatcher 1] WARN com.findwise.hydra.SerializationUtils - A ConcurrentModificationException was caught during serialization. Trying again!

It seems to resolve itself, but warnings tend to be bad anyway.

jwestberg commented 10 years ago

This is because the cache is essentially an in-memory database that contains a copy of each document. A stage may write back changes to that document, modifying it in memory, while that same document is being serialized out to a second stage. This causes a ConcurrentModificationException to be thrown by the serialization code. Simply trying again solves the issue in most cases.
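The catch-and-retry behavior described above can be sketched roughly as follows. This is only an illustration, not Hydra's actual SerializationUtils code: the class name `RetryingSerializer`, the method `serializeWithRetry`, and the `maxAttempts` parameter are all hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.function.Supplier;

// Hypothetical helper, not part of Hydra: retries a serialization call
// when the document was modified while it was being serialized.
final class RetryingSerializer {
    private RetryingSerializer() {}

    static <T> T serializeWithRetry(Supplier<T> serializer, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConcurrentModificationException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // A successful pass saw a consistent view of the document.
                return serializer.get();
            } catch (ConcurrentModificationException e) {
                // The document changed mid-serialization; discard the partial
                // result and start over.
                last = e;
            }
        }
        // Every attempt collided with a concurrent update; give up.
        throw last;
    }
}
```

Bounding the number of attempts is a choice made for this sketch, so that a document under constant modification cannot keep the serializing thread spinning forever.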

Essentially, the exception provides some transactional safety for document retrieval: it guarantees that a stage always gets a consistent view of the document, never one that includes only some of the changes made by a concurrent update. In a traditional pipeline where a document is only eligible to be processed by a single stage at a time, this condition can never happen.

Might make sense to lower the log level of the condition, or implement a shared read-write lock on the document being serialized or updated.
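A minimal sketch of the read-write lock idea, assuming a document is little more than a map of fields. `LockedDocument`, `putField` and `serialize` are hypothetical names for illustration and do not correspond to Hydra's document classes; the output format is made up as well.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical document wrapper, not Hydra's real document type.
final class LockedDocument {
    // TreeMap only so that the serialized field order is deterministic.
    private final Map<String, Object> fields = new TreeMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Updates take the exclusive write lock, so they wait until any
    // in-flight serialization has finished.
    void putField(String name, Object value) {
        lock.writeLock().lock();
        try {
            fields.put(name, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Serialization takes the shared read lock: several stages may
    // serialize the same document at once, but never during an update.
    String serialize() {
        lock.readLock().lock();
        try {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, Object> e : fields.entrySet()) {
                sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
            }
            return sb.toString();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With this approach the ConcurrentModificationException should not occur at all, at the cost of writers blocking while a serialization is in progress.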