OpenHFT / Chronicle-Map

Replicate your Key Value Store across your network, with consistency, persistence and performance.
http://chronicle.software/products/chronicle-map/
Apache License 2.0

InterProcessDeadLockException #149

Closed Hanalababy closed 2 years ago

Hanalababy commented 6 years ago

I am seeing the following exception from time to time. Not sure what causes this. It seems like the .dat file is corrupted, as the problem could be fixed after I regenerated the file.

```
Caused by: net.openhft.chronicle.hash.locks.InterProcessDeadLockException: ChronicleMap{name=null, file=E:\pva_binary_data_TODAY\secIdSymbol.dat, identityHashCode=1995022532}: Contexts locked on this segment:
net.openhft.chronicle.map.impl.CompiledMapIterationContext@38391dde: used, segment 27, local state: UNLOCKED, read lock count: 0, update lock count: 0, write lock count: 0
Current thread contexts:
net.openhft.chronicle.map.impl.CompiledMapQueryContext@3924d577: unused
net.openhft.chronicle.map.impl.CompiledMapIterationContext@38391dde: used, segment 27, local state: UNLOCKED, read lock count: 0, update lock count: 0, write lock count: 0

	at net.openhft.chronicle.map.impl.CompiledMapIterationContext.debugContextsAndLocks(CompiledMapIterationContext.java:1798)
	at net.openhft.chronicle.map.impl.CompiledMapIterationContext.debugContextsAndLocksGuarded(CompiledMapIterationContext.java:116)
	at net.openhft.chronicle.map.impl.CompiledMapIterationContext$UpdateLock.lock(CompiledMapIterationContext.java:809)
	at net.openhft.chronicle.map.impl.CompiledMapIterationContext.forEachSegmentEntryWhile(CompiledMapIterationContext.java:3942)
	at net.openhft.chronicle.map.impl.CompiledMapIterationContext.forEachSegmentEntry(CompiledMapIterationContext.java:3948)
	at net.openhft.chronicle.map.ChronicleMapIterator.fillEntryBuffer(ChronicleMapIterator.java:61)
	at net.openhft.chronicle.map.ChronicleMapIterator.hasNext(ChronicleMapIterator.java:77)
	at java.util.Iterator.forEachRemaining(Iterator.java:115)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at com.pva.common.util.UUIDUtil.reverseGenericeMap(UUIDUtil.java:92)
	at com.pva.common.util.UUIDUtil.reverseMap(UUIDUtil.java:100)
	at com.pva.algotrading.analysis.service.api.impl.SymbolLookupImpl.loadRevMap(SymbolLookupImpl.java:57)
	... 29 more

Caused by: net.openhft.chronicle.hash.locks.InterProcessDeadLockException: Failed to acquire the lock in 60 seconds. Possible reasons:
```

Hanalababy commented 6 years ago

Is anyone seeing the same issue?

leventov commented 6 years ago

@Hanalababy you answered your question already:

> It seems like the .dat file is corrupted, as the problem could be fixed after I regenerated the file.

> This Chronicle Map (or Set) instance is persisted to disk, and the previous process (or one of parallel accessing processes) has crashed while holding this lock. In this case you should use the ChronicleMapBuilder.recoverPersistedTo() procedure to access the Chronicle Map instance.

Those are parts of your original message.

Hanalababy commented 6 years ago

@leventov Should I always use recoverPersistedTo()?

leventov commented 6 years ago

After your application crashes.
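A minimal sketch of what such a guarded startup can look like, assuming a String-keyed/String-valued map and the file name from the stack trace above (the key/value types, sizing parameters, and class name are placeholders, not from the original report):

```java
import java.io.File;
import java.io.IOException;

import net.openhft.chronicle.map.ChronicleMap;

public class MapStartup {

    public static ChronicleMap<String, String> open(File file) throws IOException {
        // createOrRecoverPersistedTo() creates the map if the file does not
        // exist, opens it normally if it does, and runs the recovery scan
        // (clearing stale segment locks, purging corrupted entries) when the
        // previous process may have died while holding a lock.
        return ChronicleMap
                .of(String.class, String.class)
                .name("secIdSymbol")
                .averageKeySize(16)
                .averageValueSize(32)
                .entries(100_000)
                .createOrRecoverPersistedTo(file);
    }
}
```

Note that recovery scans the whole map, so it is considerably more expensive than a plain `createPersistedTo()` open.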

Hanalababy commented 6 years ago

Thanks. BTW, we had never seen this error before we updated to Chronicle 3. Any idea why that is?

leventov commented 6 years ago

When Chronicle Map 2 sees a lock it cannot acquire for a long time, it forcibly overrides the lock's value and assumes that the process which was holding it is dead. That was inherently unsafe.

Also, Chronicle Map 2 doesn't do any integrity checks on the Map contents, making silent data corruption possible. The new Chronicle Map's recoverPersistedTo() and createOrRecoverPersistedTo() methods scan the whole map contents and check for entry corruption. If corrupted data is found, the event is logged and the data is purged.

teddie-lee commented 5 years ago

Hello, @leventov. You mentioned that corrupted data would be purged after the createOrRecoverPersistedTo() method is called. But what about the lock state of a segment? Would it also be cleared? Looking forward to your answer, thanks.

leventov commented 5 years ago

Yes, locks are cleared.

teddie-lee commented 5 years ago

Thank you for your response, @leventov. However, I have a few more questions.

  1. Technically speaking, since we have the createOrRecoverPersistedTo() method, as long as we call it at each startup, the ChronicleMap will perform as well as before. Is that right?

  2. The only scenario where we need to acquire the inner locks (read lock, update lock, write lock) manually is the so-called 'multi-key query', where we need to calculate a value from other keys and need to stabilize those entries' states. Am I right?

Thank you for being patient.

leventov commented 5 years ago
  1. Should be so, yes. However, bear in mind that recovery is an expensive procedure. ChronicleMap could have behaved better in this area. See #79.

  2. Not only that. Manual context manipulation allows various interesting things, e.g. https://stackoverflow.com/a/41604599/648955 and https://stackoverflow.com/a/48653792/648955
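As a sketch of the manual-locking style referred to above (assuming a String-to-String map; the helper name is invented for illustration), a read under an explicitly held segment lock looks roughly like this:

```java
import net.openhft.chronicle.map.ChronicleMap;
import net.openhft.chronicle.map.ExternalMapQueryContext;
import net.openhft.chronicle.map.MapEntry;

public class LockedRead {

    // Reads a value while holding the segment's read lock explicitly.
    // The lock is released when the try-with-resources closes the context.
    static String readLocked(ChronicleMap<String, String> map, String key) {
        try (ExternalMapQueryContext<String, String, ?> c = map.queryContext(key)) {
            c.readLock().lock();
            MapEntry<String, String> e = c.entry();
            return e != null ? e.value().get() : null;
        }
    }
}
```

For a true multi-key operation you would open a context per key and take the locks in a consistent order (Chronicle Map's docs suggest ordering by segment index) to avoid deadlocking against another thread doing the same.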

ryankenney-dev commented 5 years ago

Is it safe to call createOrRecoverPersistedTo() from a second process (after the first has run createOrRecoverPersistedTo() once, or while it is in the middle of running it)? Currently we're presuming this is not safe, and requiring external locking mechanisms to ensure a single createOrRecoverPersistedTo() on startup.

leventov commented 5 years ago

No, it's not safe.
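The external guard described in the question above can be sketched with a plain JDK inter-process file lock on a sidecar file next to the map (the sidecar naming and class name are assumptions for this sketch, not part of Chronicle Map's API):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RecoveryGuard {

    // Tries to take an exclusive OS-level lock on a sidecar lock file.
    // Returns the lock if acquired, or null if another process holds it;
    // the caller should only run recovery while holding the lock.
    static FileLock tryAcquire(Path lockFile) throws IOException {
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = ch.tryLock();
        if (lock == null) {
            ch.close(); // another process is recovering; don't leak the channel
        }
        return lock;
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Path.of(System.getProperty("java.io.tmpdir"),
                "secIdSymbol.dat.recovery-lock");
        FileLock lock = tryAcquire(lockFile);
        if (lock != null) {
            try {
                System.out.println("acquired: safe to run createOrRecoverPersistedTo()");
                // ... open the map here ...
            } finally {
                lock.release();
            }
        } else {
            System.out.println("another process is recovering; wait and retry");
        }
    }
}
```

`FileLock` is advisory on most platforms, so every cooperating process must go through the same guard.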

minborg commented 2 years ago

We have added file locking to protect from invoking recovery operations simultaneously. I am closing this issue now. Feel free to reopen if deemed to still be relevant.