ICIJ / datashare

A self-hosted search engine for documents.
https://datashare.icij.org
GNU Affero General Public License v3.0

Issue uploading image documents in server mode (ResourceLeakDetector) #1261

Closed liabozarth closed 10 months ago

liabozarth commented 12 months ago

Describe the bug: I have 2K+ images that I'm trying to upload into datashare (server mode). I used:

    docker compose exec datashare_web /entrypoint.sh \
      --mode CLI \
      --stage SCAN,INDEX \
      --defaultProject secret-project \
      --elasticsearchAddress http://elasticsearch:9200 \
      --dataDir /home/datashare/Datashare/

The process errored out at around 600 images. I then tried to run the 2-step process:

    docker compose exec datashare_web /entrypoint.sh \
      --mode CLI \
      --stage SCANIDX \
      --queueType REDIS \
      --reportName "report:queue" \
      --redisAddress redis://redis:6379 \
      --defaultProject secret-project \
      --elasticsearchAddress http://elasticsearch:9200 \
      --dataDir /home/datashare/Datashare/

The error stack:

    2023-11-15 19:20:06,497 [main] INFO Indexer - indexer defined with cfg{indexJoinField='join', docTypeField='type', shards=1, replicas=1}
    2023-11-15 19:20:06,501 [main] INFO CliApp - found 0 CLI extension(s)
    2023-11-15 19:20:06,539 [redisson-netty-5-5] INFO MasterConnectionPool - 1 connections initialized for redis/172.26.0.3:6379
    2023-11-15 19:20:06,540 [redisson-netty-5-6] INFO MasterPubSubConnectionPool - 1 connections initialized for redis/172.26.0.3:6379
    2023-11-15 19:20:06,550 [pool-3-thread-1] INFO ScanIndexTask - scanning index secret-project with scroll size 1000 and 1 slices
    2023-11-15 19:20:07,540 [pool-3-thread-1] INFO ScanIndexTask - imported 690 paths into org.icij.datashare.extract.RedisUserReportMap@e4981494
    2023-11-15 19:20:07,710 [main] INFO CliApp - scanned 690
    2023-11-15 19:20:07,712 [main] INFO Indexer - Closing Elasticsearch connections
    2023-11-15 19:20:07,715 [main] INFO Indexer - Elasticsearch connections closed
    2023-11-15 19:20:07,715 [main] INFO Main - exiting main
    2023-11-16 02:37:23,947 [redisson-netty-2-4] ERROR ResourceLeakDetector - LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
    Recent access records:
    Created at:
        io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:403)
        io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
        io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:174)
        io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:108)
        org.icij.extract.redis.ResultEncoder.encode(ResultEncoder.java:22)
        org.redisson.command.CommandAsyncService.encodeMapValue(CommandAsyncService.java:654)
        org.redisson.RedissonObject.encodeMapValue(RedissonObject.java:344)
        org.redisson.RedissonMap.encodeMapKeys(RedissonMap.java:1048)
        org.redisson.RedissonMap.putAllOperationAsync(RedissonMap.java:772)
        org.redisson.RedissonMap.putAllAsync(RedissonMap.java:713)
        org.redisson.RedissonMap.putAll(RedissonMap.java:666)
        org.icij.datashare.tasks.ScanIndexTask.slicedScroll(ScanIndexTask.java:66)
        java.base/java.util.stream.IntPipeline$1$1.accept(Unknown Source)
        java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Unknown Source)
        java.base/java.util.Spliterator$OfInt.forEachRemaining(Unknown Source)
        java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
        java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
        java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source)
        java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source)
        java.base/java.util.stream.AbstractTask.compute(Unknown Source)
        java.base/java.util.concurrent.CountedCompleter.exec(Unknown Source)
        java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
        java.base/java.util.concurrent.ForkJoinTask.doInvoke(Unknown Source)
        java.base/java.util.concurrent.ForkJoinTask.invoke(Unknown Source)
        java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(Unknown Source)
        java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
        java.base/java.util.stream.ReferencePipeline.reduce(Unknown Source)
        org.icij.datashare.tasks.ScanIndexTask.call(ScanIndexTask.java:53)
        org.icij.datashare.tasks.ScanIndexTask.call(ScanIndexTask.java:30)
        java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        java.base/java.lang.Thread.run(Unknown Source)
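
For triage, it may help to isolate the frames that belong to ICIJ code from the Netty, Redisson and JDK frames around them. The following is a minimal sketch, not part of Datashare: the function name `icij_frames` is hypothetical, and it assumes the log is supplied on standard input and that the frames of interest start with `org.icij.`.

```shell
# Hypothetical helper (not a Datashare command): print every stack frame
# from an org.icij.* package found on stdin, one per line, in order.
# Works whether the trace is one frame per line or flattened onto one line.
icij_frames() {
  grep -o 'org\.icij\.[A-Za-z0-9_.$]*([^)]*)'
}
```

It could be fed from the container logs, for example `docker compose logs datashare_web 2>&1 | icij_frames`. Applied to the trace above, the first frame it prints is `org.icij.extract.redis.ResultEncoder.encode(ResultEncoder.java:22)`, the ICIJ frame nearest the point where the leaked buffer was allocated.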

To Reproduce: described above.

My docker compose file is the following (I'm using 13.5.0): `version: "3.7" services: datashare_web: image: icij/datashare:13.5.0 hostname: datashare ports:

volumes: datashare-batchdownload-dir: elasticsearch-data: postgresql-data: redis-data: `


github-actions[bot] commented 10 months ago

This issue is stale because it has been open for 40 days with no activity.

github-actions[bot] commented 10 months ago

This issue was closed because it has been inactive for 20 days since being marked as stale.

neoReuters commented 7 months ago

Facing this issue as well when trying to scan a large number of documents (100k+).