allegroai / clearml-server

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs

Exception: Error starting server: failed connecting to ElasticSearch service #208


AhmadShaik commented 11 months ago

I am trying to run clearml-server with one change from the defaults: since my system already uses port 8080, I changed that port to 9080 in the docker-compose.yml file. When I run the application, the UI shows a "server not available" error, and in the terminal I get the error below.

clearml-apiserver | Traceback (most recent call last):
clearml-apiserver |   File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
clearml-apiserver |     return _run_code(code, main_globals, None,
clearml-apiserver |   File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
clearml-apiserver |     exec(code, run_globals)
clearml-apiserver |   File "/opt/clearml/apiserver/server.py", line 10, in <module>
clearml-apiserver |     AppSequence(app).start(request_handlers=RequestHandlers())
clearml-apiserver |   File "/opt/clearml/apiserver/server_init/app_sequence.py", line 42, in start
clearml-apiserver |     self._init_dbs()
clearml-apiserver |   File "/opt/clearml/apiserver/server_init/app_sequence.py", line 101, in _init_dbs
clearml-apiserver |     raise Exception(
clearml-apiserver | Exception: Error starting server: failed connecting to ElasticSearch service

Here is my docker-compose.yml file.

version: "3.6"
services:

  apiserver:
    command:
    - apiserver
    container_name: clearml-apiserver
    image: allegroai/clearml:latest
    restart: unless-stopped
    volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/config:/opt/clearml/config
    - /opt/clearml/data/fileserver:/mnt/fileserver
    depends_on:
      - redis
      - mongo
      - elasticsearch
      - fileserver
    environment:
      CLEARML_ELASTIC_SERVICE_HOST: elasticsearch
      CLEARML_ELASTIC_SERVICE_PORT: 9200
      CLEARML_ELASTIC_SERVICE_PASSWORD: ${ELASTIC_PASSWORD}
      CLEARML_MONGODB_SERVICE_HOST: mongo
      CLEARML_MONGODB_SERVICE_PORT: 27017
      CLEARML_REDIS_SERVICE_HOST: redis
      CLEARML_REDIS_SERVICE_PORT: 6379
      CLEARML_SERVER_DEPLOYMENT_TYPE: ${CLEARML_SERVER_DEPLOYMENT_TYPE:-linux}
      CLEARML__apiserver__pre_populate__enabled: "true"
      CLEARML__apiserver__pre_populate__zip_files: "/opt/clearml/db-pre-populate"
      CLEARML__apiserver__pre_populate__artifacts_path: "/mnt/fileserver"
      CLEARML__services__async_urls_delete__enabled: "true"
      CLEARML__services__async_urls_delete__fileserver__url_prefixes: "[${CLEARML_FILES_HOST:-}]"
    ports:
    - "8008:8008"
    networks:
      - backend
      - frontend

  elasticsearch:
    networks:
      - backend
    container_name: clearml-elastic
    environment:
      ES_JAVA_OPTS: -Xms2g -Xmx2g -Dlog4j2.formatMsgNoLookups=true
      ELASTIC_PASSWORD: ${ELASTIC_PASSWORD}
      bootstrap.memory_lock: "true"
      cluster.name: clearml
      cluster.routing.allocation.node_initial_primaries_recoveries: "500"
      cluster.routing.allocation.disk.watermark.low: 500mb
      cluster.routing.allocation.disk.watermark.high: 500mb
      cluster.routing.allocation.disk.watermark.flood_stage: 500mb
      discovery.zen.minimum_master_nodes: "1"
      discovery.type: "single-node"
      http.compression_level: "7"
      node.ingest: "true"
      node.name: clearml
      reindex.remote.whitelist: '*.*'
      xpack.monitoring.enabled: "false"
      xpack.security.enabled: "false"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.7
    restart: unless-stopped
    volumes:
      - /opt/clearml/data/elastic_7:/usr/share/elasticsearch/data
      - /usr/share/elasticsearch/logs

  fileserver:
    networks:
      - backend
      - frontend
    command:
    - fileserver
    container_name: clearml-fileserver
    image: allegroai/clearml:latest
    environment:
      CLEARML__fileserver__delete__allow_batch: "true"
    restart: unless-stopped
    volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/data/fileserver:/mnt/fileserver
    - /opt/clearml/config:/opt/clearml/config
    ports:
    - "8081:8081"

  mongo:
    networks:
      - backend
    container_name: clearml-mongo
    image: mongo:4.4.9
    restart: unless-stopped
    command: --setParameter internalQueryMaxBlockingSortMemoryUsageBytes=196100200
    volumes:
    - /opt/clearml/data/mongo_4/db:/data/db
    - /opt/clearml/data/mongo_4/configdb:/data/configdb

  redis:
    networks:
      - backend
    container_name: clearml-redis
    image: redis:5.0
    restart: unless-stopped
    volumes:
    - /opt/clearml/data/redis:/data

  webserver:
    command:
    - webserver
    container_name: clearml-webserver
    # environment:
    #  CLEARML_SERVER_SUB_PATH : clearml-web # Allow Clearml to be served with a URL path prefix.
    image: allegroai/clearml:latest
    restart: unless-stopped
    depends_on:
      - apiserver
    ports:
    - "9080:80"
    networks:
      - backend
      - frontend

  async_delete:
    depends_on:
      - apiserver
      - redis
      - mongo
      - elasticsearch
      - fileserver
    container_name: async_delete
    image: allegroai/clearml:latest
    networks:
      - backend
    restart: unless-stopped
    environment:
      CLEARML_ELASTIC_SERVICE_HOST: elasticsearch
      CLEARML_ELASTIC_SERVICE_PORT: 9200
      CLEARML_ELASTIC_SERVICE_PASSWORD: ${ELASTIC_PASSWORD}
      CLEARML_MONGODB_SERVICE_HOST: mongo
      CLEARML_MONGODB_SERVICE_PORT: 27017
      CLEARML_REDIS_SERVICE_HOST: redis
      CLEARML_REDIS_SERVICE_PORT: 6379
      PYTHONPATH: /opt/clearml/apiserver
      CLEARML__services__async_urls_delete__fileserver__url_prefixes: "[${CLEARML_FILES_HOST:-}]"
    entrypoint:
      - python3
      - -m
      - jobs.async_urls_delete
      - --fileserver-host
      - http://fileserver:8081
    volumes:
      - /opt/clearml/logs:/var/log/clearml

  agent-services:
    networks:
      - backend
    container_name: clearml-agent-services
    image: allegroai/clearml-agent-services:latest
    deploy:
      restart_policy:
        condition: on-failure
    privileged: true
    environment:
      CLEARML_HOST_IP: ${CLEARML_HOST_IP}
      CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
      CLEARML_API_HOST: http://apiserver:8008
      CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-}
      CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-}
      CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY:-}
      CLEARML_AGENT_GIT_USER: ${CLEARML_AGENT_GIT_USER}
      CLEARML_AGENT_GIT_PASS: ${CLEARML_AGENT_GIT_PASS}
      CLEARML_AGENT_UPDATE_VERSION: ${CLEARML_AGENT_UPDATE_VERSION:->=0.17.0}
      CLEARML_AGENT_DEFAULT_BASE_DOCKER: "ubuntu:18.04"
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-}
      AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION:-}
      AZURE_STORAGE_ACCOUNT: ${AZURE_STORAGE_ACCOUNT:-}
      AZURE_STORAGE_KEY: ${AZURE_STORAGE_KEY:-}
      GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS:-}
      CLEARML_WORKER_ID: "clearml-services"
      CLEARML_AGENT_DOCKER_HOST_MOUNT: "/opt/clearml/agent:/root/.clearml"
      SHUTDOWN_IF_NO_ACCESS_KEY: 1
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/clearml/agent:/root/.clearml
    depends_on:
      - apiserver
    entrypoint: >
      bash -c "curl --retry 10 --retry-delay 10 --retry-connrefused 'http://apiserver:8008/debug.ping' && /usr/agent/entrypoint.sh"

networks:
  backend:
    driver: bridge
  frontend:
    driver: bridge

Here is the Docker log from the clearml-elastic container:

docker logs clearml-elastic
{"type": "server", "timestamp": "2023-07-18T04:33:27,094Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "clearml", "node.name": "clearml", "message": "version[7.17.7], pid[6], build[default/docker/78dcaaa8cee33438b91eca7f5c7f56a70fec9e80/2022-10-17T15:29:54.167373105Z], OS[Linux/5.15.0-71-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/19/19+36-2238]" }
{"type": "server", "timestamp": "2023-07-18T04:33:27,098Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "clearml", "node.name": "clearml", "message": "JVM home [/usr/share/elasticsearch/jdk], using bundled JDK [true]" }
{"type": "server", "timestamp": "2023-07-18T04:33:27,098Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "clearml", "node.name": "clearml", "message": "JVM arguments [-Xshare:auto, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j2.formatMsgNoLookups=true, -Djava.locale.providers=SPI,COMPAT, --add-opens=java.base/java.io=ALL-UNNAMED, -Djava.security.manager=allow, -XX:+UseG1GC, -Djava.io.tmpdir=/tmp/elasticsearch-7602434488233012655, -XX:+HeapDumpOnOutOfMemoryError, -XX:+ExitOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Des.cgroups.hierarchy.override=/, -Xms2g, -Xmx2g, -Dlog4j2.formatMsgNoLookups=true, -XX:MaxDirectMemorySize=1073741824, -XX:G1HeapRegionSize=4m, -XX:InitiatingHeapOccupancyPercent=30, -XX:G1ReservePercent=15, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=docker, -Des.bundled_jdk=true]" }
{"type": "server", "timestamp": "2023-07-18T04:34:39,890Z", "level": "INFO", "component": "o.e.p.PluginsService", "cluster.name": "clearml", "node.name": "clearml", "message": "loaded module [x-pack-sql]" }
{"type": "server", "timestamp": "2023-07-18T04:34:39,890Z", "level": "INFO", "component": "o.e.p.PluginsService", "cluster.name": "clearml", "node.name": "clearml", "message": "loaded module [x-pack-stack]" }
{"type": "server", "timestamp": "2023-07-18T04:34:39,890Z", "level": "INFO", "component": "o.e.p.PluginsService", "cluster.name": "clearml", "node.name": "clearml", "message": "loaded module [x-pack-text-structure]" }
{"type": "server", "timestamp": "2023-07-18T04:34:39,890Z", "level": "INFO", "component": "o.e.p.PluginsService", "cluster.name": "clearml", "node.name": "clearml", "message": "loaded module [x-pack-voting-only-node]" }
{"type": "server", "timestamp": "2023-07-18T04:34:39,890Z", "level": "INFO", "component": "o.e.p.PluginsService", "cluster.name": "clearml", "node.name": "clearml", "message": "loaded module [x-pack-watcher]" }
{"type": "server", "timestamp": "2023-07-18T04:34:39,891Z", "level": "INFO", "component": "o.e.p.PluginsService", "cluster.name": "clearml", "node.name": "clearml", "message": "no plugins loaded" }
{"type": "server", "timestamp": "2023-07-18T04:34:39,967Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "clearml", "node.name": "clearml", "message": "uncaught exception in thread [main]", 
"stacktrace": ["org.elasticsearch.bootstrap.StartupException: ElasticsearchException[failed to bind service]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes];",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:173) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:160) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112) ~[elasticsearch-cli-7.17.7.jar:7.17.7]",
"at org.elasticsearch.cli.Command.main(Command.java:77) ~[elasticsearch-cli-7.17.7.jar:7.17.7]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:125) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80) ~[elasticsearch-7.17.7.jar:7.17.7]",
"Caused by: org.elasticsearch.ElasticsearchException: failed to bind service",
"at org.elasticsearch.node.Node.<init>(Node.java:1088) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.node.Node.<init>(Node.java:309) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:234) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:234) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:434) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:169) ~[elasticsearch-7.17.7.jar:7.17.7]",
"... 6 more",
"Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes",
"at sun.nio.fs.UnixException.translateToIOException(UnixException.java:90) ~[?:?]",
"at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]",
"at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]",
"at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:397) ~[?:?]",
"at java.nio.file.Files.createDirectory(Files.java:700) ~[?:?]",
"at java.nio.file.Files.createAndCheckIsDirectory(Files.java:807) ~[?:?]",
"at java.nio.file.Files.createDirectories(Files.java:793) ~[?:?]",
"at org.elasticsearch.env.NodeEnvironment.lambda$new$0(NodeEnvironment.java:300) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:224) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:298) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.node.Node.<init>(Node.java:429) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.node.Node.<init>(Node.java:309) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:234) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:234) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:434) ~[elasticsearch-7.17.7.jar:7.17.7]",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:169) ~[elasticsearch-7.17.7.jar:7.17.7]",
"... 6 more"] }
uncaught exception in thread [main]
ElasticsearchException[failed to bind service]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes];
Likely root cause: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes
    at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at java.base/sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:397)
    at java.base/java.nio.file.Files.createDirectory(Files.java:700)
    at java.base/java.nio.file.Files.createAndCheckIsDirectory(Files.java:807)
    at java.base/java.nio.file.Files.createDirectories(Files.java:793)
    at org.elasticsearch.env.NodeEnvironment.lambda$new$0(NodeEnvironment.java:300)
    at org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:224)
    at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:298)
    at org.elasticsearch.node.Node.<init>(Node.java:429)
    at org.elasticsearch.node.Node.<init>(Node.java:309)
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:234)
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:234)
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:434)
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:169)
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:160)
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77)
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112)
    at org.elasticsearch.cli.Command.main(Command.java:77)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:125)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80)
For complete error details, refer to the log at /usr/share/elasticsearch/logs/clearml.log
{"type": "server", "timestamp": "2023-07-18T04:34:39,978Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "${sys:es.logs.cluster_name}", "node.name": "clearml", "message": "uncaught exception in thread [process reaper (pid 221)]", 
"stacktrace": ["java.security.AccessControlException: access denied (\"java.lang.RuntimePermission\" \"modifyThread\")",
"at java.security.AccessControlContext.checkPermission(AccessControlContext.java:485) ~[?:?]",
"at java.security.AccessController.checkPermission(AccessController.java:1068) ~[?:?]",
"at java.lang.SecurityManager.checkPermission(SecurityManager.java:411) ~[?:?]",
"at org.elasticsearch.secure_sm.SecureSM.checkThreadAccess(SecureSM.java:160) ~[?:7.17.7]",
"at org.elasticsearch.secure_sm.SecureSM.checkAccess(SecureSM.java:120) ~[?:7.17.7]",
"at java.lang.Thread.checkAccess(Thread.java:2360) ~[?:?]",
"at java.lang.Thread.setDaemon(Thread.java:2308) ~[?:?]",
"at java.lang.ProcessHandleImpl.lambda$static$0(ProcessHandleImpl.java:103) ~[?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:637) ~[?:?]",
"at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:928) ~[?:?]",
"at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1021) ~[?:?]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1158) ~[?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]",
"at java.lang.Thread.run(Thread.java:1589) [?:?]",
"at jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:186) ~[?:?]"] }
uncaught exception in thread [process reaper (pid 221)]
java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "modifyThread")
    at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:485)
    at java.base/java.security.AccessController.checkPermission(AccessController.java:1068)
    at java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:411)
    at org.elasticsearch.secure_sm.SecureSM.checkThreadAccess(SecureSM.java:160)
    at org.elasticsearch.secure_sm.SecureSM.checkAccess(SecureSM.java:120)
    at java.base/java.lang.Thread.checkAccess(Thread.java:2360)
    at java.base/java.lang.Thread.setDaemon(Thread.java:2308)
    at java.base/java.lang.ProcessHandleImpl.lambda$static$0(ProcessHandleImpl.java:103)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:637)
    at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:928)
    at java.base/java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1021)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1158)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1589)

The system has 25 GB of RAM and 2.5 TB of free storage.

How can I solve this error?

jkhenning commented 11 months ago

Hi @AhmadShaik, the error you're seeing is not related to the port you changed; rather, ES cannot start for some reason. I believe this is because the data folder does not have the right permissions: specifically, the ES data folder should be owned by 1000:1000 (as explained in step 9 here).
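
For reference, a minimal sketch of that ownership fix on the host, assuming the default data path used in the compose file above (adjust the path if yours differs):

# create the Elasticsearch data directory if needed and give it to UID/GID 1000
sudo mkdir -p /opt/clearml/data/elastic_7
sudo chown -R 1000:1000 /opt/clearml/data/elastic_7

After fixing the ownership, restart the stack (e.g. docker compose up -d) so Elasticsearch can create /usr/share/elasticsearch/data/nodes.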

niemiaszek commented 10 months ago

@jkhenning I'm facing the same problem. Our docker group is 999 instead of 1000, so I swapped the GID to 999. I couldn't find a docker user in /etc/passwd, and 1000 is some organization admin user. At the moment I have a root:docker configuration. Should I have a docker user configured?

Note that everything works when the data folder has +w permissions for all users (see the sketch below).
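
For the record, that workaround amounts to something like the following (a hypothetical, less strict alternative to the 1000:1000 ownership fix; a world-writable data directory is generally discouraged):

# make the Elasticsearch data directory writable by any user
sudo chmod -R a+w /opt/clearml/data/elastic_7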

jkhenning commented 9 months ago

In general, the group 1000 issue is not something that can easily be changed (it's unrelated to ClearML in this case), since as far as I know Docker/ES depends on it.
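
For context, the official Elasticsearch image runs as the elasticsearch user with UID 1000, which is why a bind-mounted data directory must be writable by that ID. A quick way to confirm this (a sketch, not from this thread):

# bypass the image entrypoint and print the container's user/group IDs
docker run --rm --entrypoint id docker.elastic.co/elasticsearch/elasticsearch:7.17.7
# should report the container user as elasticsearch with uid 1000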

a1xs commented 5 months ago

I have a similar problem.

jkhenning commented 5 months ago

@a1xs as far as I know this is an ES issue when running inside Docker.

a1xs commented 5 months ago

> @a1xs as far as I know this is an ES issue when running inside Docker.

Yes, this is a problem with Elasticsearch. I went back to the old working version.

a1xs commented 5 months ago

I changed the path from /usr/share/elasticsearch/data to /var/lib/elasticsearch/data and it works!
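
Presumably that means remapping the bind mount to the new container-side path and pointing Elasticsearch at it. A hypothetical sketch of such a change to the elasticsearch service (with path.data passed as an environment setting, the same way discovery.type is set in the compose file above; untested):

  elasticsearch:
    environment:
      path.data: /var/lib/elasticsearch/data
    volumes:
      - /opt/clearml/data/elastic_7:/var/lib/elasticsearch/data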