OpenHFT / Chronicle-Map

Replicate your Key Value Store across your network, with consistency, persistence and performance.
http://chronicle.software/products/chronicle-map/
Apache License 2.0

InterProcessDeadLockException #149

Closed Hanalababy closed 2 years ago

Hanalababy commented 6 years ago

I am seeing the following exception from time to time. Not sure what causes this. It seems like the .dat file is corrupted, as the problem could be fixed after I regenerated the file.

```
Caused by: net.openhft.chronicle.hash.locks.InterProcessDeadLockException: ChronicleMap{name=null, file=E:\pva_binary_data_TODAY\secIdSymbol.dat, identityHashCode=1995022532}: Contexts locked on this segment:
net.openhft.chronicle.map.impl.CompiledMapIterationContext@38391dde: used, segment 27, local state: UNLOCKED, read lock count: 0, update lock count: 0, write lock count: 0
Current thread contexts:
net.openhft.chronicle.map.impl.CompiledMapQueryContext@3924d577: unused
net.openhft.chronicle.map.impl.CompiledMapIterationContext@38391dde: used, segment 27, local state: UNLOCKED, read lock count: 0, update lock count: 0, write lock count: 0

	at net.openhft.chronicle.map.impl.CompiledMapIterationContext.debugContextsAndLocks(CompiledMapIterationContext.java:1798)
	at net.openhft.chronicle.map.impl.CompiledMapIterationContext.debugContextsAndLocksGuarded(CompiledMapIterationContext.java:116)
	at net.openhft.chronicle.map.impl.CompiledMapIterationContext$UpdateLock.lock(CompiledMapIterationContext.java:809)
	at net.openhft.chronicle.map.impl.CompiledMapIterationContext.forEachSegmentEntryWhile(CompiledMapIterationContext.java:3942)
	at net.openhft.chronicle.map.impl.CompiledMapIterationContext.forEachSegmentEntry(CompiledMapIterationContext.java:3948)
	at net.openhft.chronicle.map.ChronicleMapIterator.fillEntryBuffer(ChronicleMapIterator.java:61)
	at net.openhft.chronicle.map.ChronicleMapIterator.hasNext(ChronicleMapIterator.java:77)
	at java.util.Iterator.forEachRemaining(Iterator.java:115)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at com.pva.common.util.UUIDUtil.reverseGenericeMap(UUIDUtil.java:92)
	at com.pva.common.util.UUIDUtil.reverseMap(UUIDUtil.java:100)
	at com.pva.algotrading.analysis.service.api.impl.SymbolLookupImpl.loadRevMap(SymbolLookupImpl.java:57)
	... 29 more

Caused by: net.openhft.chronicle.hash.locks.InterProcessDeadLockException: Failed to acquire the lock in 60 seconds. Possible reasons:
```

Hanalababy commented 6 years ago

Is anyone seeing the same issue?

leventov commented 6 years ago

@Hanalababy you answered your question already:

> It seems like the .dat file is corrupted, as the problem could be fixed after I regenerated the file.

> This Chronicle Map (or Set) instance is persisted to disk, and the previous process (or one of parallel accessing processes) has crashed while holding this lock. In this case you should use the ChronicleMapBuilder.recoverPersistedTo() procedure to access the Chronicle Map instance.

Those are parts of your original message.

Hanalababy commented 6 years ago

@leventov Should I always use recoverPersistedTo()?

leventov commented 6 years ago

After your application crashes.
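A minimal sketch of what such a guarded startup can look like, assuming a String-keyed/String-valued map and the file name from the stack trace above (the key/value types, sizing parameters, and class name are placeholders, not from the original report):

```java
import java.io.File;
import java.io.IOException;

import net.openhft.chronicle.map.ChronicleMap;

public class MapStartup {

    public static ChronicleMap<String, String> open(File file) throws IOException {
        // createOrRecoverPersistedTo() creates the map if the file does not
        // exist, opens it normally if it does, and runs the recovery scan
        // (clearing stale segment locks, purging corrupted entries) when the
        // previous process may have died while holding a lock.
        return ChronicleMap
                .of(String.class, String.class)
                .name("secIdSymbol")
                .averageKeySize(16)
                .averageValueSize(32)
                .entries(100_000)
                .createOrRecoverPersistedTo(file);
    }
}
```

Note that recovery scans the whole map, so it is considerably more expensive than a plain `createPersistedTo()` open.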

Hanalababy commented 6 years ago

Thanks. BTW, we had never seen this error before we updated to Chronicle 3. Any idea why that is?

leventov commented 6 years ago

When Chronicle Map 2 sees a lock it cannot acquire for a long time, it forcibly overrides the lock's value and assumes that the process which was holding it is dead. That was inherently unsafe.

Also, Chronicle Map 2 doesn't do any integrity checks on the Map contents, making silent data corruption possible. The new Chronicle Map's recoverPersistedTo() and createOrRecoverPersistedTo() methods scan the whole map contents and check for entry corruption. If corrupted data is found, the event is logged and the data is purged.

teddie-lee commented 5 years ago

Hello, @leventov. You mentioned that corrupted data would be purged after the createOrRecoverPersistedTo() method is called. But what about the lock state of a segment? Would it also be cleared? Looking forward to your answer, thanks.

leventov commented 5 years ago

Yes, locks are cleared.

teddie-lee commented 5 years ago

Thank you for your response, @leventov. However, I have a few more questions.

  1. Technically speaking, since we have the createOrRecoverPersistedTo() method, as long as we call it at each startup, the ChronicleMap will perform as well as before. Is that right?

  2. The only scenario where we need to acquire the inner locks (read lock, update lock, write lock) manually is the so-called 'multi-key query', where we need to calculate a value from other keys and need to stabilize those entries' states. Am I right?

Thank you for being patient.

leventov commented 5 years ago
  1. Should be so, yes. However, bear in mind that recovery is an expensive procedure. ChronicleMap could have behaved better in this area. See #79.

  2. Not only that. Manual context manipulation allows various interesting things, e.g. https://stackoverflow.com/a/41604599/648955 and https://stackoverflow.com/a/48653792/648955
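As a sketch of the manual-locking style referred to above (assuming a String-to-String map; the helper name is invented for illustration), a read under an explicitly held segment lock looks roughly like this:

```java
import net.openhft.chronicle.map.ChronicleMap;
import net.openhft.chronicle.map.ExternalMapQueryContext;
import net.openhft.chronicle.map.MapEntry;

public class LockedRead {

    // Reads a value while holding the segment's read lock explicitly.
    // The lock is released when the try-with-resources closes the context.
    static String readLocked(ChronicleMap<String, String> map, String key) {
        try (ExternalMapQueryContext<String, String, ?> c = map.queryContext(key)) {
            c.readLock().lock();
            MapEntry<String, String> e = c.entry();
            return e != null ? e.value().get() : null;
        }
    }
}
```

For a true multi-key operation you would open a context per key and take the locks in a consistent order (Chronicle Map's docs suggest ordering by segment index) to avoid deadlocking against another thread doing the same.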

ryankenney-dev commented 5 years ago

Is it safe to call createOrRecoverPersistedTo() from a second process (after the first has run createOrRecoverPersistedTo() once, or while it is in the middle of running it)? Currently we're presuming this is not safe, and requiring external locking mechanisms to ensure a single createOrRecoverPersistedTo() on startup.

leventov commented 5 years ago

No, it's not safe.
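The external guard described in the question above can be sketched with a plain JDK inter-process file lock on a sidecar file next to the map (the sidecar naming and class name are assumptions for this sketch, not part of Chronicle Map's API):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RecoveryGuard {

    // Tries to take an exclusive OS-level lock on a sidecar lock file.
    // Returns the lock if acquired, or null if another process holds it;
    // the caller should only run recovery while holding the lock.
    static FileLock tryAcquire(Path lockFile) throws IOException {
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = ch.tryLock();
        if (lock == null) {
            ch.close(); // another process is recovering; don't leak the channel
        }
        return lock;
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Path.of(System.getProperty("java.io.tmpdir"),
                "secIdSymbol.dat.recovery-lock");
        FileLock lock = tryAcquire(lockFile);
        if (lock != null) {
            try {
                System.out.println("acquired: safe to run createOrRecoverPersistedTo()");
                // ... open the map here ...
            } finally {
                lock.release();
            }
        } else {
            System.out.println("another process is recovering; wait and retry");
        }
    }
}
```

`FileLock` is advisory on most platforms, so every cooperating process must go through the same guard.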

minborg commented 2 years ago

We have added file locking to protect from invoking recovery operations simultaneously. I am closing this issue now. Feel free to reopen if deemed to still be relevant.