dain / leveldb

Port of LevelDB to Java
Apache License 2.0
1.54k stars 429 forks

Database size limit? #99

Closed locosmac closed 5 years ago

locosmac commented 5 years ago

While experimenting with LevelDB 0.10 to process a large dataset, I ran into this exception:

Exception in thread "main" org.iq80.leveldb.impl.DbImpl$BackgroundProcessingException: java.lang.NullPointerException
    at org.iq80.leveldb.impl.DbImpl.checkBackgroundException(DbImpl.java:421)
    at org.iq80.leveldb.impl.DbImpl.writeInternal(DbImpl.java:683)
    at org.iq80.leveldb.impl.DbImpl.put(DbImpl.java:649)
    at org.iq80.leveldb.impl.DbImpl.put(DbImpl.java:642)
    at com.locoslab.library.osm.importer.LevelSink.process(LevelSink.java:151)
    at crosby.binary.osmosis.OsmosisBinaryParser.parseDense(OsmosisBinaryParser.java:138)
    at org.openstreetmap.osmosis.osmbinary.BinaryParser.parse(BinaryParser.java:124)
    at org.openstreetmap.osmosis.osmbinary.BinaryParser.handleBlock(BinaryParser.java:68)
    at org.openstreetmap.osmosis.osmbinary.file.FileBlock.process(FileBlock.java:135)
    at org.openstreetmap.osmosis.osmbinary.file.BlockInputStream.process(BlockInputStream.java:34)
    at crosby.binary.osmosis.OsmosisReader.run(OsmosisReader.java:45)
    at com.locoslab.library.osm.importer.App.main(App.java:27)
Caused by: java.lang.NullPointerException
    at org.iq80.leveldb.impl.Compaction.totalFileSize(Compaction.java:129)
    at org.iq80.leveldb.impl.Compaction.isTrivialMove(Compaction.java:122)
    at org.iq80.leveldb.impl.DbImpl.backgroundCompaction(DbImpl.java:480)
    at org.iq80.leveldb.impl.DbImpl.backgroundCall(DbImpl.java:436)
    at org.iq80.leveldb.impl.DbImpl.access$100(DbImpl.java:85)
    at org.iq80.leveldb.impl.DbImpl$2.call(DbImpl.java:404)
    at org.iq80.leveldb.impl.DbImpl$2.call(DbImpl.java:398)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

It seems to be caused by some limitation related to the size of the database. I am unsure whether it is the number of files or the size of the database itself. It also seems quite possible that it is a limitation of the JVM I am using.

Yet, I am sure that it has nothing to do with the number of entries. In my first test, I was able to store 2.8 billion entries. After switching to a more compact data representation, I was able to increase the number of entries to 3.8 billion. However, when the database reaches about 240 GB, this exception is raised deterministically on my Windows 10 machine running OpenJDK 11.

It is not a big problem for me, since I can easily partition my data across multiple LevelDB instances. I just wanted to bring it to your attention. Thank you for providing this great tool.
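The partitioning workaround mentioned above could look roughly like this: route each key to one of N independent LevelDB instances by hashing the key bytes. The helper name and shard count are illustrative, not part of the original report, and opening one database per shard with `Iq80DBFactory` is left out.

```java
import java.util.Arrays;

// Illustrative shard-selection helper for partitioning data across
// several independent LevelDB instances.
public class ShardRouter {
    // Map a key to a shard index in [0, shardCount).
    public static int shardFor(byte[] key, int shardCount) {
        int h = Arrays.hashCode(key);
        return Math.floorMod(h, shardCount); // floorMod keeps the index non-negative
    }
}
```

Each shard then stays well below whatever threshold triggers the exception, at the cost of fanning reads and writes out over several directories.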

hbs commented 5 years ago

We at SenX operate databases for Warp 10 with sizes of multiple TB and several hundred billion entries, so I would not think it is a size limit.


locosmac commented 5 years ago

Thank you for the feedback. However, I just hit the issue again, this time at 100 GB with 4 billion entries. I'll try it on another machine.

locosmac commented 5 years ago

Ok, I can reproduce the issue on a Linux machine with a different JDK.

locosmac commented 5 years ago

Here is a trivial example that demonstrates the problem:

    // Required imports (omitted above for brevity):
    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.File;
    import java.util.Random;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import org.iq80.leveldb.DB;
    import org.iq80.leveldb.Options;
    import org.iq80.leveldb.impl.Iq80DBFactory;

    public static void main(String[] args) throws Exception {
        Logger logger = Logger.getLogger(StressTest.class.getName());
        DB db = Iq80DBFactory.factory.open(new File("C:/leveldb/test"), new Options());
        Random random = new Random(0);
        long counter = 0;
        while (true) {
            if (counter % 10000000 == 0) logger.log(Level.INFO, "Processed " + counter);
            // 8-byte big-endian key derived from the counter
            ByteArrayOutputStream bos = new ByteArrayOutputStream(8);
            DataOutputStream dos = new DataOutputStream(bos);
            dos.writeLong(counter);
            // 16 bytes of pseudo-random value data
            byte[] value = new byte[16];
            random.nextBytes(value);
            db.put(bos.toByteArray(), value);
            counter += 1;
        }
    }

When I use this together with this Maven dependency of leveldb:

        <dependency>
            <groupId>org.iq80.leveldb</groupId>
            <artifactId>leveldb</artifactId>
            <version>0.10</version>
        </dependency>

Between 4 and 4.01 billion entries, the exception is raised deterministically:

Feb. 02, 2019 11:50:05 PM com.locoslab.library.osm.importer.StressTest main
INFO: Processed 0
Feb. 02, 2019 11:50:14 PM com.locoslab.library.osm.importer.StressTest main
INFO: Processed 10000000
...
Feb. 03, 2019 2:45:19 AM com.locoslab.library.osm.importer.StressTest main
INFO: Processed 3990000000
Feb. 03, 2019 2:46:11 AM com.locoslab.library.osm.importer.StressTest main
INFO: Processed 4000000000
Exception in thread "main" org.iq80.leveldb.impl.DbImpl$BackgroundProcessingException: java.lang.NullPointerException
    at org.iq80.leveldb.impl.DbImpl.checkBackgroundException(DbImpl.java:421)
    at org.iq80.leveldb.impl.DbImpl.writeInternal(DbImpl.java:683)
    at org.iq80.leveldb.impl.DbImpl.put(DbImpl.java:649)
    at org.iq80.leveldb.impl.DbImpl.put(DbImpl.java:642)
    at com.locoslab.library.osm.importer.StressTest.main(StressTest.java:29)
Caused by: java.lang.NullPointerException
    at org.iq80.leveldb.impl.Compaction.totalFileSize(Compaction.java:129)
    at org.iq80.leveldb.impl.Compaction.isTrivialMove(Compaction.java:122)
    at org.iq80.leveldb.impl.DbImpl.backgroundCompaction(DbImpl.java:480)
    at org.iq80.leveldb.impl.DbImpl.backgroundCall(DbImpl.java:436)
    at org.iq80.leveldb.impl.DbImpl.access$100(DbImpl.java:85)
    at org.iq80.leveldb.impl.DbImpl$2.call(DbImpl.java:404)
    at org.iq80.leveldb.impl.DbImpl$2.call(DbImpl.java:398)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

dain commented 5 years ago

I haven't worked on this project in about 6 years, but this bug seems trivial to fix. When compacting the root level, there are no grandparents, so the list is null (a nullable list is bad design). I put up a PR for this. If it looks good, let me know, and I'll merge it and do a release.

I would recommend looking for a fork that is actively maintained.
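The null-guard fix dain describes can be sketched as follows. The class name and the `List<Long>` of file sizes are simplified stand-ins for the real `Compaction`/`FileMetaData` types, not the actual patch:

```java
import java.util.List;

// Simplified stand-in for Compaction.totalFileSize: when the top level is
// being compacted there is no grandparent level, so the list can be null.
public class TotalFileSizeSketch {
    public static long totalFileSize(List<Long> fileSizes) {
        if (fileSizes == null) {
            return 0; // no grandparents: treat as zero bytes
        }
        long sum = 0;
        for (long size : fileSizes) {
            sum += size;
        }
        return sum;
    }
}
```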

pcmind commented 5 years ago

I was creating a fix at the same time :). Different approach: solving the root cause. The grandparent collection should never be null. If you wish, I can create a PR.
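The root-cause approach can be sketched like this: normalize "no grandparents" to an empty list at construction time, so downstream code needs no null checks. The class and field names below are illustrative, not the real `Compaction` code:

```java
import java.util.Collections;
import java.util.List;

// Sketch of the invariant-based fix: the grandparent collection is never
// null, so callers like totalFileSize need no null guard.
public class GrandparentSketch {
    private final List<Long> grandparentSizes;

    public GrandparentSketch(List<Long> grandparentSizes) {
        // Callers may pass null for the root level; normalize it here.
        this.grandparentSizes = grandparentSizes == null
                ? Collections.emptyList()
                : grandparentSizes;
    }

    public long totalGrandparentFileSize() {
        long sum = 0;
        for (long size : grandparentSizes) {
            sum += size;
        }
        return sum;
    }
}
```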

dain commented 5 years ago

I only looked at the code for a couple of minutes, but I saw that shouldStopBefore checks whether it is null. I'm fine with making a bigger change, but please be very confident :)

pcmind commented 5 years ago

I am confident that it is correct and that the null check is irrelevant after this fix. I'll remove it and submit a PR. Note: I have been using a patched version for almost a year now without any issues.

dain commented 5 years ago

I published a snapshot: leveldb-0.11-20190219.234422-1.jar. Can you try it out and let me know if it works?

locosmac commented 5 years ago

Sure, I will give it a try, but it will take 2 or 3 hours or so.

locosmac commented 5 years ago

My Artifactory does not find your snapshot, so I compiled the current master and ran my stress test. At the moment, my machine has written 6.85 billion entries (~185 GB) without any issues, so it is well past the 4 billion entries that used to cause the NullPointerException. I think the problem is fixed. However, I'll keep the test running through the night and report back tomorrow morning.

locosmac commented 5 years ago

As far as I can tell, it is working. I stopped the process when I ran out of disk space with more than 10 billion entries. Thank you once again for making this library freely available. It is simple to use and very fast.

dain commented 5 years ago

@locosmac thanks for the verification. I'll run the release now.

locosmac commented 5 years ago

Excellent, thank you!