DataFabricRus / textfile-utils

A simple JVM library with utilitarian methods for working with text files of any size, including merge sorting and binary search. The library is based on the Java NIO and Kotlin coroutines.
Apache License 2.0

OOM when using sort on 21 MB file #7

Closed so-dewy closed 6 months ago

so-dewy commented 6 months ago

A reproducer test is available in a commit at https://github.com/so-dewy/textfile-utils

How to reproduce

  1. In src/test/kotlin/MergeSortTest.kt add a new test:
    @Test
    fun `test sort oom`(@TempDir dir: Path) {
        testDefaultSortResourceFile(dir, "/oom.nt")
    }
  2. In /src/test/resources, add the file oom.nt.tar.gz
  3. Run the test from step 1

Stacktrace:

org.gradle.api.internal.tasks.testing.TestSuiteExecutionException: Could not complete execution for Gradle Test Executor 5.
    at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.stop(SuiteTestClassProcessor.java:64)
    at java.base@11.0.18/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base@11.0.18/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base@11.0.18/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base@11.0.18/java.lang.reflect.Method.invoke(Method.java:566)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
    at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
    at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
    at com.sun.proxy.$Proxy2.stop(Unknown Source)
    at org.gradle.api.internal.tasks.testing.worker.TestWorker$3.run(TestWorker.java:193)
    at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129)
    at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100)
    at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60)
    at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
    at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:113)
    at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65)
    at app//worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
    at app//worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.concurrent.ArrayBlockingQueue.<init>(ArrayBlockingQueue.java:270)
    at java.base/java.util.concurrent.ArrayBlockingQueue.<init>(ArrayBlockingQueue.java:254)
    at cc.datafabric.textfileutils.files.AsyncLineReader.<init>(LineReader.kt:325)
    at cc.datafabric.textfileutils.files.LineReaderKt.asyncReadByteLines(LineReader.kt:204)
    at cc.datafabric.textfileutils.files.LineReaderKt.readLines(LineReader.kt:148)
    at cc.datafabric.textfileutils.files.LineReaderKt.readLines$default(LineReader.kt:97)
    at cc.datafabric.textfileutils.files.FileMergeKt$mergeFilesInverse$10$1.invoke(FileMerge.kt:234)
    at cc.datafabric.textfileutils.files.FileMergeKt$mergeFilesInverse$10$1.invoke(FileMerge.kt:228)
    at cc.datafabric.textfileutils.files.FilesKt.use(Files.kt:245)
    at cc.datafabric.textfileutils.files.FilesKt.use$default(Files.kt:236)
    at cc.datafabric.textfileutils.files.FileMergeKt$mergeFilesInverse$10.invoke(FileMerge.kt:228)
    at cc.datafabric.textfileutils.files.FileMergeKt$mergeFilesInverse$10.invoke(FileMerge.kt:227)
    at cc.datafabric.textfileutils.files.FilesKt.use(Files.kt:229)
    at cc.datafabric.textfileutils.files.FilesKt.use$default(Files.kt:223)
    at cc.datafabric.textfileutils.files.FileMergeKt.mergeFilesInverse(FileMerge.kt:227)
    at cc.datafabric.textfileutils.files.FileMergeKt.mergeFilesInverse(FileMerge.kt:126)
    at cc.datafabric.textfileutils.files.FileMergeKt.mergeFilesInverse$default(FileMerge.kt:100)
    at cc.datafabric.textfileutils.files.MergeSortKt.suspendSort(MergeSort.kt:304)
    at cc.datafabric.textfileutils.files.MergeSortKt.suspendSort(MergeSort.kt:208)
    at cc.datafabric.textfileutils.files.MergeSortKt$blockingSort$1.invokeSuspend(MergeSort.kt:118)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
    at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
    at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)

sszuev commented 6 months ago

This problem is fixed. The root cause, however, is that too many file descriptors are opened at once, because merging is performed in parallel.

So the correct solution is to introduce a new control parameter (the maximum number of file channels that may be open at once) and make the merge partially sequential.
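
To illustrate the idea, here is a minimal sketch of a merge with a bounded number of open files. It is not the library's implementation: `mergeSortedFilesBounded`, `mergeGroup`, `maxOpenFiles` and `tempDir` are hypothetical names, and it uses plain buffered readers rather than the library's file channels and async line readers. When there are more sorted inputs than `maxOpenFiles`, it merges them group by group into intermediate files, one group after another, and repeats until a single final merge is possible.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.util.PriorityQueue

// Hypothetical helper, not part of textfile-utils: merges already-sorted text files
// while keeping at most `maxOpenFiles` input readers open at any time.
fun mergeSortedFilesBounded(sources: List<Path>, target: Path, maxOpenFiles: Int, tempDir: Path) {
    require(maxOpenFiles >= 2) { "need at least two readers to make progress" }
    var current = sources
    val intermediates = mutableListOf<Path>()
    // Each pass merges groups of at most maxOpenFiles files sequentially,
    // so the number of remaining files shrinks on every pass.
    while (current.size > maxOpenFiles) {
        current = current.chunked(maxOpenFiles).map { group ->
            val part = Files.createTempFile(tempDir, "merge-", ".part")
            intermediates += part
            mergeGroup(group, part)
            part
        }
    }
    mergeGroup(current, target)
    intermediates.forEach { Files.deleteIfExists(it) }
}

// Plain k-way merge of one group; opens group.size readers plus one writer.
private fun mergeGroup(group: List<Path>, target: Path) {
    val readers = group.map { Files.newBufferedReader(it) }
    try {
        // Heap entries pair a line with the index of the reader it came from.
        val heap = PriorityQueue<Pair<String, Int>>(compareBy<Pair<String, Int>> { it.first })
        readers.forEachIndexed { i, r -> r.readLine()?.let { heap.add(it to i) } }
        Files.newBufferedWriter(target).use { out ->
            while (heap.isNotEmpty()) {
                val (line, i) = heap.poll()
                out.write(line)
                out.newLine()
                // Refill the heap from the reader that just supplied a line.
                readers[i].readLine()?.let { heap.add(it to i) }
            }
        }
    } finally {
        readers.forEach { it.close() }
    }
}
```

The trade-off is extra I/O: a smaller `maxOpenFiles` means more passes over intermediate files, while a larger value means more descriptors and read buffers held at once. Note that each group merge also holds one output writer, so the peak is `maxOpenFiles + 1` open files.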