Closed by liabozarth 10 months ago
This issue is stale because it has been open for 40 days with no activity.
This issue was closed because it has been inactive for 20 days since being marked as stale.
Facing this issue as well when trying to scan a large number of documents (100k+).
**Describe the bug**
I have 2K+ images that I'm trying to upload into Datashare (server mode). I used:
```shell
docker compose exec datashare_web /entrypoint.sh \
  --mode CLI \
  --stage SCAN,INDEX \
  --defaultProject secret-project \
  --elasticsearchAddress http://elasticsearch:9200 \
  --dataDir /home/datashare/Datashare/
```
The process errored out at around 600 images. I then tried to run the two-step process:
```shell
docker compose exec datashare_web /entrypoint.sh \
  --mode CLI \
  --stage SCANIDX \
  --queueType REDIS \
  --reportName "report:queue" \
  --redisAddress redis://redis:6379 \
  --defaultProject secret-project \
  --elasticsearchAddress http://elasticsearch:9200 \
  --dataDir /home/datashare/Datashare/
```
The error stack:
```
2023-11-15 19:20:06,497 [main] INFO Indexer - indexer defined with cfg{indexJoinField='join', docTypeField='type', shards=1, replicas=1}
2023-11-15 19:20:06,501 [main] INFO CliApp - found 0 CLI extension(s)
2023-11-15 19:20:06,539 [redisson-netty-5-5] INFO MasterConnectionPool - 1 connections initialized for redis/172.26.0.3:6379
2023-11-15 19:20:06,540 [redisson-netty-5-6] INFO MasterPubSubConnectionPool - 1 connections initialized for redis/172.26.0.3:6379
2023-11-15 19:20:06,550 [pool-3-thread-1] INFO ScanIndexTask - scanning index secret-project with scroll size 1000 and 1 slices
2023-11-15 19:20:07,540 [pool-3-thread-1] INFO ScanIndexTask - imported 690 paths into org.icij.datashare.extract.RedisUserReportMap@e4981494
2023-11-15 19:20:07,710 [main] INFO CliApp - scanned 690
2023-11-15 19:20:07,712 [main] INFO Indexer - Closing Elasticsearch connections
2023-11-15 19:20:07,715 [main] INFO Indexer - Elasticsearch connections closed
2023-11-15 19:20:07,715 [main] INFO Main - exiting main
2023-11-16 02:37:23,947 [redisson-netty-2-4] ERROR ResourceLeakDetector - LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
Created at:
    io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:403)
    io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
    io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:174)
    io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:108)
    org.icij.extract.redis.ResultEncoder.encode(ResultEncoder.java:22)
    org.redisson.command.CommandAsyncService.encodeMapValue(CommandAsyncService.java:654)
    org.redisson.RedissonObject.encodeMapValue(RedissonObject.java:344)
    org.redisson.RedissonMap.encodeMapKeys(RedissonMap.java:1048)
    org.redisson.RedissonMap.putAllOperationAsync(RedissonMap.java:772)
    org.redisson.RedissonMap.putAllAsync(RedissonMap.java:713)
    org.redisson.RedissonMap.putAll(RedissonMap.java:666)
    org.icij.datashare.tasks.ScanIndexTask.slicedScroll(ScanIndexTask.java:66)
    java.base/java.util.stream.IntPipeline$1$1.accept(Unknown Source)
    java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Unknown Source)
    java.base/java.util.Spliterator$OfInt.forEachRemaining(Unknown Source)
    java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
    java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
    java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source)
    java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source)
    java.base/java.util.stream.AbstractTask.compute(Unknown Source)
    java.base/java.util.concurrent.CountedCompleter.exec(Unknown Source)
    java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
    java.base/java.util.concurrent.ForkJoinTask.doInvoke(Unknown Source)
    java.base/java.util.concurrent.ForkJoinTask.invoke(Unknown Source)
    java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(Unknown Source)
    java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
    java.base/java.util.stream.ReferencePipeline.reduce(Unknown Source)
    org.icij.datashare.tasks.ScanIndexTask.call(ScanIndexTask.java:53)
    org.icij.datashare.tasks.ScanIndexTask.call(ScanIndexTask.java:30)
    java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    java.base/java.lang.Thread.run(Unknown Source)
```
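The leak warning points at Netty's reference-counted buffers: a `ByteBuf` allocated in `org.icij.extract.redis.ResultEncoder.encode` apparently never gets its matching `release()` call. As a rough illustration of the contract involved (a minimal stand-alone sketch, not Netty's actual `ByteBuf` implementation):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a reference-counted buffer, for illustration
// only: allocation starts the count at 1, and whoever holds the buffer
// must call release(), or the leak detector flags it when the object is
// garbage-collected.
final class RefCountedBuf {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    int refCnt() { return refCnt.get(); }

    RefCountedBuf retain() {      // take an additional reference
        refCnt.incrementAndGet();
        return this;
    }

    boolean release() {           // drop a reference; true == deallocated
        return refCnt.decrementAndGet() == 0;
    }
}

public class LeakSketch {
    public static void main(String[] args) {
        RefCountedBuf buf = new RefCountedBuf();
        try {
            // ... encode the value into buf and hand the bytes off ...
        } finally {
            // The step the stack trace suggests is missing: without this
            // release(), the buffer is only reclaimed by GC and Netty logs
            // "LEAK: ByteBuf.release() was not called".
            System.out.println("deallocated=" + buf.release());
        }
    }
}
```

Running the sketch prints `deallocated=true`, since the single initial reference is dropped in the `finally` block; skipping the `release()` is what produces the kind of report shown above.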
**To Reproduce**
Described above.
My docker compose file is the following; I'm using Datashare 13.5.0:

```yaml
version: "3.7"
services:
  datashare_web:
    image: icij/datashare:13.5.0
    hostname: datashare
    ports:
    volumes:
      - type: bind
        source: /data/datashare/datadir
        target: /home/datashare/Datashare
    depends_on:
      postgresql:
        condition: service_healthy
      redis:
        condition: service_healthy
      elasticsearch:
        condition: service_healthy
    command: >-
      --mode SERVER
      --dataDir /home/datashare/Datashare
      --authFilter org.icij.datashare.session.YesCookieAuthFilter
      --busType REDIS
      --batchQueueType REDIS
      --dataSourceUrl jdbc:postgresql://postgresql/datashare?user=datashare\&password=password
      --defaultProject secret-project
      --elasticsearchAddress http://elasticsearch:9200
      --messageBusAddress redis://redis:6379
      --queueType REDIS
      --redisAddress redis://redis:6379
      --rootHost http://localhost:8080
      --sessionStoreType REDIS
      --authFilter org.icij.datashare.session.BasicAuthAdaptorFilter
      --authUsersProvider org.icij.datashare.session.UsersInDb
      --sessionTtlSeconds 43200
      --tcpListenPort 8080

  datashare_create_project:
    image: icij/datashare:13.5.0
    restart: no
    depends_on:
      elasticsearch:
        condition: service_healthy
    command: >-
      --defaultProject secret-project
      --mode CLI
      --stage INDEX
      --elasticsearchAddress http://elasticsearch:9200

  datashare_batch_searches:
    image: icij/datashare:13.5.0
    depends_on:
      - datashare_web
    command: >-
      --mode BATCH_SEARCH
      --batchQueueType REDIS
      --batchThrottleMilliseconds 500
      --busType REDIS
      --dataSourceUrl jdbc:postgresql://postgresql/datashare?user=datashare\&password=password
      --defaultProject secret-project
      --elasticsearchAddress http://elasticsearch:9200
      --queueType REDIS
      --redisAddress redis://redis:6379
      --scrollSize 100

  datashare_batch_downloads:
    image: icij/datashare:13.5.0
    depends_on:
    volumes:
      - type: volume
        source: datashare-batchdownload-dir
        target: /home/datashare/app/tmp
        read_only: false
    command: >-
      --mode BATCH_DOWNLOAD
      --dataDir /home/datashare/Datashare
      --batchDownloadTimeToLive 336
      --batchQueueType REDIS
      --batchThrottleMilliseconds 500
      --busType REDIS
      --dataSourceUrl jdbc:postgresql://postgresql/datashare?user=datashare\&password=password
      --defaultProject secret-project
      --elasticsearchAddress http://elasticsearch:9200
      --queueType REDIS
      --redisAddress redis://redis:6379
      --scrollSize 100

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.9.1
    restart: on-failure
    volumes:
    environment:
      - "http.cors.allow-methods=OPTIONS, HEAD, GET, POST, PUT, DELETE"
    healthcheck:
      test: ["CMD-SHELL", "curl --silent --fail elasticsearch:9200/_cluster/health || exit 1"]

  postgresql:
    image: postgres:12-alpine
    environment:
    volumes:
      - type: volume
        source: postgresql-data
        target: /var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready", "-U", "datashare", "-d", "datashare"]

  redis:
    image: redis:4.0.1-alpine
    restart: on-failure
    volumes:

volumes:
  datashare-batchdownload-dir:
  elasticsearch-data:
  postgresql-data:
  redis-data:
```