cojen / TuplDB

TuplDB is a high-performance, concurrent, transactional, scalable, low-level embedded database.
GNU Affero General Public License v3.0

ClosedByInterruptException #90

Closed MartinHaeusler closed 6 years ago

MartinHaeusler commented 6 years ago

Hi!

I recently ran into the following error with Tupl 1.3.11. It occurred on an instance that had been running non-stop for an extended period of time.

I don't really have an explanation for it. Apparently it's connected to thread interrupts in Java, which I never use in the application. Have you encountered this particular error before?

Caused by: java.nio.channels.ClosedChannelException: null
    at sun.nio.ch.FileChannelImpl.ensureOpen(Unknown Source)
    at sun.nio.ch.FileChannelImpl.read(Unknown Source)
    at org.cojen.tupl.io.JavaFileIO.doRead(JavaFileIO.java:145)
    at org.cojen.tupl.io.AbstractFileIO.access(AbstractFileIO.java:311)
    at org.cojen.tupl.io.AbstractFileIO.access(AbstractFileIO.java:327)
    at org.cojen.tupl.io.AbstractFileIO.read(AbstractFileIO.java:164)
    at org.cojen.tupl.io.FilePageArray.readPage(FilePageArray.java:93)
    at org.cojen.tupl._SnapshotPageArray.readPage(_SnapshotPageArray.java:100)
    at org.cojen.tupl._DurablePageDb.readPage(_DurablePageDb.java:374)
    at org.cojen.tupl._LocalDatabase.readNode(_LocalDatabase.java:4166)
    at org.cojen.tupl._LocalDatabase.nodeMapLoadFragment(_LocalDatabase.java:2961)
    at org.cojen.tupl._LocalDatabase.reconstruct(_LocalDatabase.java:3877)
    at org.cojen.tupl._LocalDatabase.reconstruct(_LocalDatabase.java:3785)
    at org.cojen.tupl._Node.retrieveLeafValueAtLoc(_Node.java:1800)
    at org.cojen.tupl._Node.retrieveLeafValue(_Node.java:1775)
    at org.cojen.tupl._TreeCursor.tryCopyCurrent(_TreeCursor.java:1556)
    at org.cojen.tupl._TreeCursor.previous(_TreeCursor.java:1122)
    at org.cojen.tupl._TreeCursor.findLe(_TreeCursor.java:1806)
    ... 37 common frames omitted
Caused by: java.nio.channels.ClosedByInterruptException: null
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown Source)
    at sun.nio.ch.FileChannelImpl.readInternal(Unknown Source)
    at sun.nio.ch.FileChannelImpl.read(Unknown Source)
    at org.cojen.tupl.io.JavaFileIO.doRead(JavaFileIO.java:145)
    at org.cojen.tupl.io.AbstractFileIO.access(AbstractFileIO.java:311)
    at org.cojen.tupl.io.AbstractFileIO.access(AbstractFileIO.java:327)
    at org.cojen.tupl.io.AbstractFileIO.read(AbstractFileIO.java:164)
    at org.cojen.tupl.io.FilePageArray.readPage(FilePageArray.java:93)
    at org.cojen.tupl._SnapshotPageArray.readPage(_SnapshotPageArray.java:100)
    at org.cojen.tupl._DurablePageDb.readPage(_DurablePageDb.java:374)
    at org.cojen.tupl._LocalDatabase.readNode(_LocalDatabase.java:4166)
    at org.cojen.tupl._LocalDatabase.nodeMapLoadFragment(_LocalDatabase.java:2961)
    at org.cojen.tupl._LocalDatabase.reconstruct(_LocalDatabase.java:3877)
    at org.cojen.tupl._LocalDatabase.reconstruct(_LocalDatabase.java:3785)
    at org.cojen.tupl._Node.retrieveLeafValueAtLoc(_Node.java:1800)
    at org.cojen.tupl._Node.retrieveLeafValue(_Node.java:1775)
    at org.cojen.tupl._TreeCursor.tryCopyCurrent(_TreeCursor.java:1556)
    at org.cojen.tupl._TreeCursor.next(_TreeCursor.java:405)
    at org.cojen.tupl._TreeCursor.next(_TreeCursor.java:355)

Any advice would be much appreciated.

broneill commented 6 years ago

The only time Tupl will generate an interrupt is when the database is closed. The checkpointer thread is interrupted, and if it happens to be accessing the file at that moment, the interrupt causes the file to be closed. In general, the interrupt isn't generated during a clean close, so the root cause might be a panic of some kind. I assume no other errors were logged? Did the service start up ok? Any panic should have been logged by an event listener, but if none is installed, check where stderr is logged.
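For readers unfamiliar with this JDK behavior, here is a minimal sketch of how an interrupt closes a `FileChannel`. It is not Tupl code: the class name `InterruptClosesChannel`, the `demo` method, and the temp file are made up for illustration, and the interrupt is set by the reading thread itself to keep the example deterministic. It only shows that a read by an interrupted thread fails with `ClosedByInterruptException`, and that later reads on the same channel fail with `ClosedChannelException`, the two exception types that appear in the trace above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; nothing here comes from the Tupl code base.
final class InterruptClosesChannel {

    /** Returns the simple names of the exceptions thrown by the first and second read. */
    static String[] demo() throws IOException {
        Path file = Files.createTempFile("interrupt-demo", ".bin");
        try {
            Files.write(file, new byte[4096]);
            String first = "none";
            String second = "none";

            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // Set the interrupt status before the read. The channel is closed
                // as soon as the interrupted thread starts the I/O operation.
                Thread.currentThread().interrupt();
                try {
                    ch.read(ByteBuffer.allocate(16), 0);
                } catch (ClosedByInterruptException e) {
                    first = e.getClass().getSimpleName();
                } finally {
                    // Clear the interrupt status so it does not leak to the caller.
                    Thread.interrupted();
                }

                // The channel stays closed, so any later read fails as well,
                // even though this thread is no longer interrupted.
                try {
                    ch.read(ByteBuffer.allocate(16), 0);
                } catch (ClosedChannelException e) {
                    second = e.getClass().getSimpleName();
                }
            }
            return new String[] {first, second};
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        String[] seen = demo();
        System.out.println("first read:  " + seen[0]);
        System.out.println("second read: " + seen[1]);
    }
}
```

The point of the sketch is that the channel does not recover once it has been closed this way: every thread sharing it sees `ClosedChannelException` until the file is reopened.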

broneill commented 6 years ago

You also might consider upgrading to the latest released version, 1.3.12.3. There have been quite a few fixes since 1.3.11 was released, almost a year ago.

MartinHaeusler commented 6 years ago

Thanks for the pointers, I appreciate your help. I'll try to get my hands on more log files. So far I have only checked the exceptions that were caught and logged by our top-level exception handler, but having a look at the plain System.err stream output is a good idea. Frankly, I'm puzzled where this interrupt came from in the first place, and I tend to blame Tomcat (because it manages our threads). Yes, the service started properly, and it has worked really well (read + write) for a long time.

The error log above was produced by 1.3.11 because that was the latest version of Tupl at the time this particular instance was started. Unfortunately, we cannot update because our project is incompatible with the AGPL license. Maven Central says 1.3.12.3 is Apache licensed, but your GitHub repo says it's AGPL. Which one is it?

broneill commented 6 years ago

The license changed with version 1.4, which hasn't been released yet.

broneill commented 2 years ago

https://github.com/cojen/Tupl/commit/62fc19c88e5fc11ad78f9aee428f408eda56081