elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Elasticsearch fails to start in Docker when `elasticsearch.yml` is bind mounted #85463

Open jkakavas opened 2 years ago

jkakavas commented 2 years ago

Elasticsearch Version

8.0.0

Installed Plugins

No response

Java Version

bundled

OS Version

N/A

Problem Description

Elasticsearch fails to start when elasticsearch.yml is bind mounted to a file on the host, with a "Device or resource busy" error. This was possibly introduced with the changes for the auto-configuration of the security features, and it triggers when we attempt to write the configuration to the elasticsearch.yml file (AutoConfigureNode#fullyWriteFile).
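
A minimal sketch of that write pattern (an assumption about what fullyWriteFile does, based on the stack trace: write the full new config to a temporary file, then atomically rename it over the target). The final `mv` is a rename(2), and renaming over a file that is itself a bind mount point fails with EBUSY, which is the "Device or resource busy" seen here:

```shell
# Demonstrated in a normal directory, where the rename succeeds; on a
# single-file bind mount the same mv is the step that fails with EBUSY.
config=$(mktemp -d)/elasticsearch.yml
echo 'cluster.name: old' > "$config"

tmp="$config.$$.tmp"                          # write the complete new config to a temp file
echo 'cluster.name: docker-cluster' > "$tmp"
mv -f "$tmp" "$config"                        # atomic replace of the target file

cat "$config"                                 # prints: cluster.name: docker-cluster
```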

Steps to Reproduce

docker run --name oh-noes-this-fails -p 9200:9200 -v /absolute/path/to/a/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml -it docker.elastic.co/elasticsearch/elasticsearch:8.0.0

or

docker run --name  oh-noes-this-fails-too -p 9200:9200 --mount type=bind,source=/absolute/path/to/a/elasticsearch.yml,target=/usr/share/elasticsearch/config/elasticsearch.yml -it docker.elastic.co/elasticsearch/elasticsearch:8.0.0

fails with

Exception in thread "main" java.nio.file.FileSystemException: /usr/share/elasticsearch/config/elasticsearch.yml.R0_9BZ4hRx-v8zK3F0U-Bw.tmp -> /usr/share/elasticsearch/config/elasticsearch.yml: Device or resource busy
    at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
    at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:416)
    at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:267)
    at java.base/java.nio.file.Files.move(Files.java:1432)
    at org.elasticsearch.xpack.security.cli.AutoConfigureNode.fullyWriteFile(AutoConfigureNode.java:1136)
    at org.elasticsearch.xpack.security.cli.AutoConfigureNode.fullyWriteFile(AutoConfigureNode.java:1148)
    at org.elasticsearch.xpack.security.cli.AutoConfigureNode.execute(AutoConfigureNode.java:687)
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77)
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112)
    at org.elasticsearch.cli.Command.main(Command.java:77)
    at org.elasticsearch.xpack.security.cli.AutoConfigureNode.main(AutoConfigureNode.java:157)

Logs (if relevant)

No response

elasticmachine commented 2 years ago

Pinging @elastic/es-security (Team:Security)

justincr-elastic commented 2 years ago

Clarification: From the stack trace, AutoConfigureNode CLI is experiencing the error, not Elasticsearch.

Startup: Container => /usr/local/bin/docker-entrypoint.sh => /usr/share/elasticsearch/bin/elasticsearch

Looking at /usr/share/elasticsearch/bin/elasticsearch, it seems like the variable ATTEMPT_SECURITY_AUTO_CONFIG=true triggers a call to AutoConfigureNode CLI before Elasticsearch. The stack trace is for AutoConfigureNode CLI, not Elasticsearch.

Excerpt of the AutoConfigure CLI command:

ES_MAIN_CLASS=org.elasticsearch.xpack.security.cli.AutoConfigureNode \
ES_ADDITIONAL_SOURCES="x-pack-env;x-pack-security-env" \
ES_ADDITIONAL_CLASSPATH_DIRECTORIES=lib/tools/security-cli \
bin/elasticsearch-cli "${ARG_LIST[@]}" <<<"$KEYSTORE_PASSWORD"

Excerpt of the Elasticsearch daemon command:

    "$JAVA" \
    "$XSHARE" \
    $ES_JAVA_OPTS \
    -Des.path.home="$ES_HOME" \
    -Des.path.conf="$ES_PATH_CONF" \
    -Des.distribution.flavor="$ES_DISTRIBUTION_FLAVOR" \
    -Des.distribution.type="$ES_DISTRIBUTION_TYPE" \
    -Des.bundled_jdk="$ES_BUNDLED_JDK" \
    -cp "$ES_CLASSPATH" \
    org.elasticsearch.bootstrap.Elasticsearch \
    "${ARG_LIST[@]}" \
    <<<"$KEYSTORE_PASSWORD" &
justincr-elastic commented 2 years ago

Reproduce original issue by executing

> docker run --name elastic1 -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -v C:\Docker\elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml --rm -it docker.elastic.co/elasticsearch/elasticsearch:8.0.0 
Exception in thread "main" java.nio.file.FileSystemException: /usr/share/elasticsearch/config/elasticsearch.yml.Occjcc_mS06vpoRLwlpUwA.tmp -> /usr/share/elasticsearch/config/elasticsearch.yml: Device or resource busy
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
        at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:416)
        at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:267)
        at java.base/java.nio.file.Files.move(Files.java:1432)
        at org.elasticsearch.xpack.security.cli.AutoConfigureNode.fullyWriteFile(AutoConfigureNode.java:1136)
        at org.elasticsearch.xpack.security.cli.AutoConfigureNode.fullyWriteFile(AutoConfigureNode.java:1148)
        at org.elasticsearch.xpack.security.cli.AutoConfigureNode.execute(AutoConfigureNode.java:687)
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77)
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112)
        at org.elasticsearch.cli.Command.main(Command.java:77)
        at org.elasticsearch.xpack.security.cli.AutoConfigureNode.main(AutoConfigureNode.java:157)

Extract interesting files from the container (Prerequisite: add C:\Docker to the file sharing accept list)

> docker run --name elastic1 -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -v "C:\Docker":/mnt/local --rm -it docker.elastic.co/elasticsearch/elasticsearch:8.0.0 bash
elasticsearch@9d37e1eb7777:~$ cp /usr/share/elasticsearch/config/elasticsearch.yml /mnt/local/elasticsearch.yml
elasticsearch@9d37e1eb7777:~$ cp /usr/share/elasticsearch/config/elasticsearch.yml /mnt/local/elasticsearch2.yml
elasticsearch@9d37e1eb7777:~$ cp /usr/local/bin/docker-entrypoint.sh               /mnt/local/docker-entrypoint.sh
elasticsearch@9d37e1eb7777:~$ cp /usr/share/elasticsearch/bin/elasticsearch        /mnt/local/elasticsearch

Start bash as the root user, switch to the elasticsearch user, and manually run docker-entrypoint.sh to reproduce the original error

> docker run -u root --name elastic1 -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -v C:\Docker\elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml -v C:\Docker\elasticsearch2.yml:/usr/share/elasticsearch/config/elasticsearch2.yml --rm -it docker.elastic.co/elasticsearch/elasticsearch:8.0.0 bash

root@62b736fca663:/usr/share/elasticsearch# ls -l /usr/share/elasticsearch/config/elasticsearch*.yml
-rw-rw-r-- 1 root root 1042 Feb  3 16:47 /usr/share/elasticsearch/config/elasticsearch-plugins.example.yml
-rwxr-xr-x 1 root root   53 Mar 29 19:01 /usr/share/elasticsearch/config/elasticsearch.yml
-rwxr-xr-x 1 root root   53 Mar 29 19:01 /usr/share/elasticsearch/config/elasticsearch2.yml

root@62b736fca663:/usr/share/elasticsearch# df -a | grep elasticsearch
grpcfuse       998896636 190624520 808272116  20% /usr/share/elasticsearch/config/elasticsearch.yml
grpcfuse       998896636 190624520 808272116  20% /usr/share/elasticsearch/config/elasticsearch2.yml

root@62b736fca663:/usr/share/elasticsearch# su - elasticsearch

elasticsearch@62b736fca663:~$ /usr/local/bin/docker-entrypoint.sh
Exception in thread "main" java.nio.file.FileSystemException: /usr/share/elasticsearch/config/elasticsearch.yml.JrtBhUSPQ4eNKgiJ3atKQQ.tmp -> /usr/share/elasticsearch/config/elasticsearch.yml: Device or resource busy
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
        at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:416)
        at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:267)
        at java.base/java.nio.file.Files.move(Files.java:1432)
        at org.elasticsearch.xpack.security.cli.AutoConfigureNode.fullyWriteFile(AutoConfigureNode.java:1136)
        at org.elasticsearch.xpack.security.cli.AutoConfigureNode.fullyWriteFile(AutoConfigureNode.java:1148)
        at org.elasticsearch.xpack.security.cli.AutoConfigureNode.execute(AutoConfigureNode.java:687)
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77)
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112)
        at org.elasticsearch.cli.Command.main(Command.java:77)
        at org.elasticsearch.xpack.security.cli.AutoConfigureNode.main(AutoConfigureNode.java:157)

elasticsearch@62b736fca663:~$ ls -l /usr/share/elasticsearch/config/elasticsearch*.yml
-rw-rw-r-- 1 root          root          1042 Feb  3 16:47 /usr/share/elasticsearch/config/elasticsearch-plugins.example.yml
-rwxr-xr-x 1 elasticsearch elasticsearch   53 Mar 29 19:01 /usr/share/elasticsearch/config/elasticsearch.yml
-rwxr-xr-x 1 root          root            53 Mar 29 19:01 /usr/share/elasticsearch/config/elasticsearch2.yml
justincr-elastic commented 2 years ago

Check elasticsearch.yml ownership and permissions before and after manually running docker-entrypoint.sh.

>docker run -u root --name elastic1 -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --rm -it docker.elastic.co/elasticsearch/elasticsearch:8.0.0 bash

root@40b71bc4c3ae:/usr/share/elasticsearch# ls -l /usr/share/elasticsearch/config/elasticsearch.yml
-rw-rw-r-- 1 root root 53 Feb  3 22:53 /usr/share/elasticsearch/config/elasticsearch.yml

root@40b71bc4c3ae:/usr/share/elasticsearch# su - elasticsearch

elasticsearch@40b71bc4c3ae:~$ /usr/local/bin/docker-entrypoint.sh > /dev/null 2> /dev/null &
[1] 18

elasticsearch@40b71bc4c3ae:~$ ls -l /usr/share/elasticsearch/config/elasticsearch.yml
-rw-rw-r-- 1 elasticsearch elasticsearch 1106 Mar 29 20:47 /usr/share/elasticsearch/config/elasticsearch.yml
justincr-elastic commented 2 years ago

If the operator does not mount elasticsearch.yml, I assume they want elasticsearch.yml autoconfiguration. If the operator mounts elasticsearch.yml, I assume they don't want elasticsearch.yml autoconfiguration.

From looking at the startup scripts, I don't see an option to skip autoconfiguration. The only way seems to be if ENROLLMENT_TOKEN is set.

linghengqian commented 2 years ago

Note that, in addition to Elasticsearch, Kibana also overwrites its configuration file to write content. So should the initialization file be separated from the actual configuration file, in the style of a .conf.d directory, for example by introducing something like an elasticsearch-d.yml that is responsible for initialization?

albertzaharovits commented 2 years ago

If the operator does not mount elasticsearch.yml, I assume they want elasticsearch.yml autoconfiguration. If the operator mounts elasticsearch.yml, I assume they don't want elasticsearch.yml autoconfiguration.

If you're proposing this should be the logic we use in the auto-configuration, I concur.

Should the same logic extend to the config directory?

jkakavas commented 2 years ago

If the operator mounts elasticsearch.yml, I assume they don't want elasticsearch.yml autoconfiguration.

I'd just like to add that this is not always the case. Whether we should accept that as a limitation and work with it is another topic (one I probably also agree with), but for instance, in both cases where this was reported in the forums, the users wanted to set a specific value (i.e. network.host, to affect the SANs of the HTTP certificate) but still take advantage of the security features.

albertzaharovits commented 2 years ago

We briefly discussed this today in our weekly sync. There was consensus that mounting only the elasticsearch.yml file, but leaving the rest of the config directory on the docker container, is not a configuration that works well with Security auto-configuration (primarily because persisting only the generated yml file, without the associated keystore and certs, is not useful for subsequent container runs).
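
Given that consensus, one arrangement that keeps the generated yml, keystore, and certs together across container runs is to bind mount the whole config directory rather than the single file. A sketch (the host path and image tag are illustrative, and this is an inference from the consensus above, not a confirmed recommendation):

```shell
# Persist the entire config directory so elasticsearch.yml, the keystore, and
# the generated certs all land on the same (host) mount.
docker run --name es01 -p 9200:9200 \
  -v /absolute/path/to/config:/usr/share/elasticsearch/config \
  -it docker.elastic.co/elasticsearch/elasticsearch:8.0.0
```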

I have taken an action item to investigate what the consistent way to react to such a configuration is, whether that means starting without security auto-configuration or not starting at all. I'll assign this to myself.

mister-good-deal commented 2 years ago

Got this issue with version 8.1.2.

Having a specific configuration file elasticsearch.yml is simpler to handle than defining all the env variables in docker-compose.yaml, which can get very verbose when using multiple Docker services.

greenscar commented 2 years ago

This is due to the container using elasticsearch:elasticsearch as the user. Docker containers are intended to run everything via root:root.

All you need to do is set the owner ID and group ID of the directories being mounted to 1000:1000.

For example:


- name: Create elk directory if it does not exist
  ansible.builtin.file:
    path: /opt/elk/{{ item.name }}
    state: directory
    mode: '0755'
    owner: "{{ item.oid }}"
    group: "{{ item.gid }}"
  with_items:
    - { name: "elasticsearch/config", oid: 1000, gid: 1000}
    - { name: "elasticsearch/data", oid: 1000, gid: 1000}
    - { name: "kibana/config", oid: 1000, gid: 1000}
    - { name: "kibana/data", oid: 1000, gid: 1000}
  become: yes
Milana-Gelman-PX commented 2 years ago

Hi all, is there any progress on this bug? I got this issue with version 8.3.2. Setting the owner ID and group ID of the mounted directories to 1000:1000 does not resolve the issue.

chance2021 commented 2 years ago

I am using env vars instead of mounting elasticsearch.yml. For example, I add ELASTICSEARCH_FS_SNAPSHOT_REPO_PATH=/mnt/backup in order to set up a snapshot repo.
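
As a sketch of this approach: the official image can forward environment variables that look like Elasticsearch setting names into the configuration, so many settings can be supplied without touching elasticsearch.yml at all. (`ELASTICSEARCH_FS_SNAPSHOT_REPO_PATH` above appears to be a deployment-specific variable; `path.repo` and the paths below are illustrative.)

```shell
# Settings passed as env vars instead of a bind-mounted elasticsearch.yml.
docker run -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "path.repo=/mnt/backup" \
  -v /mnt/backup:/mnt/backup \
  -it docker.elastic.co/elasticsearch/elasticsearch:8.0.0
```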

tvernum commented 2 years ago

I have taken an action item to investigate what is the consistent way to react to such a configuration, from starting without security auto-conf, or not starting at all.

@albertzaharovits did you get anywhere with this?

My feeling is that we should do something like (if we determine auto-configuration is needed)

  1. Try to write a temporary file to the config directory. If that fails, then we know we won't be successful with auto-configuration, and we should skip it
  2. Check whether that temporary file has the same mount point as elasticsearch.yml and elasticsearch.keystore. If not, then we can assume that auto-configuration will do the wrong thing (that is, it would write files to 2 or more different mount points, leading to one or both being orphaned). In that case we should clean up the temp file and skip the rest of auto-configuration. We can probably just check that the output from `findmnt --noheadings --output TARGET --target ${file}` is the same for all 3 files (temp, yml, keystore)
  3. Otherwise, remove the temp file and proceed with auto-configuration

We should talk about whether to do that for all packaging types, or just for docker.
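
The steps above can be sketched as follows (the `findmnt` invocation is the one suggested; the file locations are illustrative, with everything placed in one temp directory so the check passes here):

```shell
# Auto-configuration should only proceed when the temp file, elasticsearch.yml,
# and elasticsearch.keystore all resolve to the same mount point.
mount_of() {
    findmnt --noheadings --output TARGET --target "$1"
}

dir=$(mktemp -d)
touch "$dir/elasticsearch.yml" "$dir/elasticsearch.keystore"
tmpfile=$(mktemp -p "$dir")                    # step 1: can we even write here?

if [ "$(mount_of "$tmpfile")" = "$(mount_of "$dir/elasticsearch.yml")" ] &&
   [ "$(mount_of "$tmpfile")" = "$(mount_of "$dir/elasticsearch.keystore")" ]; then
    echo "auto-configuration can proceed"      # step 3: clean up and continue
else
    echo "skipping auto-configuration"         # step 2: mount points differ
fi
rm -f "$tmpfile"
```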

psychogun commented 2 years ago

Bumping this as this causes issues when trying to run the elasticsearch container as a rootless container using systemd.

I have tried to copy some files (the certs and .yml and .keystore) and bind mount them, and then adding -e ATTEMPT_SECURITY_AUTO_CONFIG=false to podman run, but I could not get the correct enrollment token.

I would very much like to have all the security bells and whistles autoconfigured for me + persistent storage :)

psychogun commented 2 years ago

I think I got it working: rootless containers running at boot without the user having to log in. Here is a little write-up. Hopefully you guys can make this a bit easier!

cat /etc/*-release
Rocky Linux release 8.6 (Green Obsidian)
NAME="Rocky Linux"
VERSION="8.6 (Green Obsidian)"

Initial start to generate some files:

podman run --name es01 --net elastic -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -v /podman/elasticsearch/data:/usr/share/elasticsearch/data:Z -it docker.elastic.co/elasticsearch/elasticsearch:8.3.3

Ctrl + C to quit

cd ~/podman/elasticsearch/config
podman cp es01:/usr/share/elasticsearch/config/elasticsearch.yml .
podman cp es01:/usr/share/elasticsearch/config/elasticsearch.keystore .
mkdir ~/podman/elasticsearch/config/certs
cd certs
podman cp es01:/usr/share/elasticsearch/config/certs/http.p12 .
podman cp es01:/usr/share/elasticsearch/config/certs/transport.p12 .
podman cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .

podman stop es01
podman rm es01

rm -rf /podman/elasticsearch/data/*

Let us bind mount:

podman run --name es01 --net elastic -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e ATTEMPT_SECURITY_AUTO_CONFIG=false -v ~/podman/elasticsearch/config/certs/http.p12:/usr/share/elasticsearch/config/certs/http.p12:Z -v ~/podman/elasticsearch/config/certs/transport.p12:/usr/share/elasticsearch/config/certs/transport.p12:Z -v ~/podman/elasticsearch/config/certs/http_ca.crt:/usr/share/elasticsearch/config/certs/http_ca.crt:Z -v ~/podman/elasticsearch/config/elasticsearch.keystore:/usr/share/elasticsearch/config/elasticsearch.keystore:Z -v ~/podman/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:Z -v /podman/elasticsearch/data:/usr/share/elasticsearch/data:Z -dt docker.elastic.co/elasticsearch/elasticsearch:8.3.3

Let us get the enrollment token (a lot of errors here, but it spits out the code in the end):

podman exec -it es01 /usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s kibana

Let us start Kibana and use our enrollment procedure, visiting the site on port 5601 and grabbing the code from the terminal:

podman run --name kib-01 --net elastic -p 5601:5601 -v ~/podman/kibana/data/:/usr/share/kibana/data/:Z docker.elastic.co/kibana/kibana:8.3.3

Ctrl + C to stop Kibana.

But, let us start it again so we can grab the kibana.yml configuration file:

podman start kib-01

mkdir ~/podman/kibana/config
cd ~/podman/kibana/config
podman cp kib-01:/usr/share/kibana/config/kibana.yml . 

Stop it, using podman stop kib-01.

Let us remove this file:

rm ~/podman/kibana/data/uuid

This will be our final run command for Kibana:

podman run --name kib-01 --net elastic -p 5601:5601 -v ~/podman/kibana/config/kibana.yml:/usr/share/kibana/config/kibana.yml:Z -v ~/podman/kibana/data/:/usr/share/kibana/data:Z -e SERVER_PUBLICBASEURL=http://192.168.10.44 -dt docker.elastic.co/kibana/kibana:8.3.3

Now I have working persistent configuration and I can generate systemd unit files (??). Let us also stop the containers and remove them:

cd ~/.config/systemd/user
podman generate systemd --new --files --name es01
podman generate systemd --new --files --name kib-01 

podman stop kib-01
podman rm kib-01
podman stop es01
podman rm es01

Using systemctl for now:

systemctl --user enable --now container-es01.service
systemctl --user enable --now container-kib-01.service

But hey, what about passwords? This throws a lot of errors; although in the end it works and gives me a valid password for the elastic user:

podman exec -it es01 /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic

I can reboot the host computer and everything works without having to log in (loginctl enable-linger).

The transport is now SSL encrypted, and I have all the bells and whistles offered by the auto-configuration?

jakelandis commented 2 years ago

I found that if I explicitly set xpack.security.enabled: true and bind mount a keystore that has a bootstrap.password set, then bind mounting the elasticsearch.yml works fine. I haven't dug into the details of why or if that is correct behavior, but that is what I have observed.

Here is a very simple single-node cluster with a bind-mounted elasticsearch.yml and keystore: https://github.com/jakelandis/es-docker-simple
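
The keystore half of this workaround can be sketched as follows; `elasticsearch-keystore` is the stock Elasticsearch CLI, while the password value is a placeholder. Run this inside an Elasticsearch install (or a throwaway container), then bind mount the resulting keystore alongside an elasticsearch.yml that sets `xpack.security.enabled: true`:

```shell
# Create a keystore and seed it with a bootstrap.password, so auto-configuration
# is not needed on startup.
bin/elasticsearch-keystore create
echo 'my-bootstrap-password' | bin/elasticsearch-keystore add --stdin bootstrap.password
```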

martijnvdp commented 2 years ago

I found that if I explicitly set xpack.security.enabled: true and bind mount a keystore that has a bootstrap.password set, then bind mounting the elasticsearch.yml works fine. I haven't dug into the details of why or if that is correct behavior, but that is what I have observed.

Here is very simple single node cluster with a bind mounted elasticsearch.yml and keystore : https://github.com/jakelandis/es-docker-simple

Setting xpack.security.enabled: true in the custom elasticsearch.yml fixed it for me; now it gets mounted.

ywangd commented 2 years ago

I found that if I explicitly set xpack.security.enabled: true and bind mount a keystore that has a bootstrap.password set, then bind mounting the elasticsearch.yml works fine. I haven't dug into the details of why or if that is correct behavior, but that is what I have observed.

This is expected, because enabling security explicitly makes the startup process skip security auto-configuration. The original error was thrown during security auto-configuration; since it is skipped, the error no longer happens. But I believe the intention of this issue is whether we could either (1) detect the original bind mount situation and automatically skip auto-configuration (IIUC, this is our preference) or (2) have auto-configuration work if the bind mount meets certain requirements.

z3r0101 commented 2 years ago

So.. no fix for this yet?

Anyways, try my workaround...

On your docker command line you have: `-v /absolute/path/to/a/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml`. Make sure the host file /absolute/path/to/a/elasticsearch.yml exists and is writable.

Also, elasticsearch.yml should not be empty.

My example configuration:

cluster.name: "docker-cluster"
network.host: 0.0.0.0
xpack.license.self_generated.type: trial
xpack.security.enabled: true
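
A quick pre-flight check for the conditions above (the path is an example): the file must exist, be writable, and be non-empty before it is bind mounted.

```shell
# Create a non-empty, writable elasticsearch.yml on the host before running
# the container with -v "$f":/usr/share/elasticsearch/config/elasticsearch.yml
f=$(mktemp -d)/elasticsearch.yml
printf 'cluster.name: "docker-cluster"\nnetwork.host: 0.0.0.0\n' > "$f"
chmod u+rw "$f"
[ -s "$f" ] && [ -w "$f" ] && echo "ok to bind mount"
```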

jyxjjj commented 1 year ago

Since I only use this for localhost:

version: '3.9'
services:
    elasticsearch:
        container_name: elasticsearch
        image: elasticsearch:8.5.2
        environment:
            - TZ=Etc/GMT-8
            - discovery.type=single-node
            - ES_JAVA_OPTS=-Xmx256M
        deploy:
            restart_policy:
                condition: on-failure
                delay: 5s
                max_attempts: 3
                window: 5s
            resources:
                limits:
                    cpus: '1'
                    memory: 2G
        ulimits:
            nofile:
                soft: 65535
                hard: 65535
        sysctls:
            - net.ipv6.conf.all.disable_ipv6=1
            - net.ipv6.conf.default.disable_ipv6=1
            - net.ipv6.conf.lo.disable_ipv6=1
            - net.ipv4.conf.all.rp_filter=0
            - net.ipv4.conf.default.rp_filter=0
            - net.ipv4.conf.default.arp_announce=2
            - net.ipv4.conf.lo.arp_announce=2
            - net.ipv4.conf.all.arp_announce=2
            - net.ipv4.tcp_max_tw_buckets=5000
            - net.ipv4.tcp_syncookies=1
            - net.ipv4.tcp_max_syn_backlog=2048
            - net.core.somaxconn=51200
            - net.ipv4.tcp_synack_retries=2
            - net.ipv4.tcp_fastopen=3
        dns:
            - 223.5.5.5
            - 223.6.6.6
            - 1.1.1.1
            - 1.0.0.1
            - 8.8.8.8
            - 8.8.4.4
        ports:
            -   target: 9200
                published: 9200
                protocol: tcp
                mode: host
        volumes:
            -   type: bind
                source: /www/server/elasticsearch/config/elasticsearch.yml
                target: /usr/share/elasticsearch/config/elasticsearch.yml
            -   type: bind
                source: /www/server/elasticsearch/data
                target: /usr/share/elasticsearch/data
            -   type: bind
                source: /www/server/elasticsearch/plugins
                target: /usr/share/elasticsearch/plugins
        healthcheck:
            disable: true
networks:
    default:
        name: podman
        external: true

then the config

cluster.name: docker-cluster
network.host: 0.0.0.0
xpack.security.enabled: false

works

but the config

cluster.name: docker-cluster
network.host: 0.0.0.0

hits this error.

So I think https://github.com/elastic/elasticsearch/issues/85463#issuecomment-1229264396 is correct!

joshxyzhimself commented 1 year ago

God, who broke this? On 8.6.1 we aren't getting these errors.

elasticsearch-1 | Could not rename log file 'logs/gc.log' to 'logs/gc.log.03' (Permission denied). elasticsearch-1 | {"@timestamp":"2023-06-06T09:41:21.503Z", "log.level":"ERROR", "message":"fatal exception while booting Elasticsearch", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"9a8a73e358ed","elasticsearch.cluster.name":"elasticsearch","error.type":"java.lang.IllegalStateException","error.message":"failed to obtain node locks, tried [/usr/share/elasticsearch/data]; maybe these locations are not writable or multiple nodes were started on the same data path?","error.stack_trace":"java.lang.IllegalStateException: failed to obtain node locks, tried [/usr/share/elasticsearch/data]; maybe these locations are not writable or multiple nodes were started on the same data path?\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.env.NodeEnvironment.(NodeEnvironment.java:291)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.node.Node.(Node.java:483)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.node.Node.(Node.java:327)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.bootstrap.Elasticsearch$2.(Elasticsearch.java:216)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:216)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:67)\nCaused by: java.io.IOException: failed to obtain lock on /usr/share/elasticsearch/data\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.env.NodeEnvironment$NodeLock.(NodeEnvironment.java:236)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.env.NodeEnvironment$NodeLock.(NodeEnvironment.java:204)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.env.NodeEnvironment.(NodeEnvironment.java:283)\n\t... 
5 more\nCaused by: java.nio.file.NoSuchFileException: /usr/share/elasticsearch/data/node.lock\n\tat java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\tat java.base/sun.nio.fs.UnixPath.toRealPath(UnixPath.java:833)\n\tat org.apache.lucene.core@9.6.0/org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:94)\n\tat org.apache.lucene.core@9.6.0/org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:43)\n\tat org.apache.lucene.core@9.6.0/org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:44)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.env.NodeEnvironment$NodeLock.(NodeEnvironment.java:229)\n\t... 7 more\n\tSuppressed: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/node.lock\n\t\tat java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)\n\t\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)\n\t\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\t\tat java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:261)\n\t\tat java.base/java.nio.file.Files.newByteChannel(Files.java:379)\n\t\tat java.base/java.nio.file.Files.createFile(Files.java:657)\n\t\tat org.apache.lucene.core@9.6.0/org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:84)\n\t\t... 10 more\n"}

coding-bunny commented 1 year ago

Running into the same problems. It was working fine the whole time with docker-compose, and suddenly, after killing the container and restarting it, I'm getting these errors:

bm-elasticsearch-poc-elastic-1  | {"@timestamp":"2023-07-26T08:04:29.450Z", "log.level":"ERROR", "message":"fatal exception while booting Elasticsearch", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsea
rch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"elastic-0","elasticsearch.cluster.name":"biz","error.type":"java.lang.IllegalStateException","error.mes
sage":"failed to obtain node locks, tried [/usr/share/elasticsearch/data]; maybe these locations are not writable or multiple nodes were started on the same data path?","error.stack_trace":"java.lang.IllegalStateException: faile
d to obtain node locks, tried [/usr/share/elasticsearch/data]; maybe these locations are not writable or multiple nodes were started on the same data path?\n\tat org.elasticsearch.server@8.7.1/org.elasticsearch.env.NodeEnvironme
nt.<init>(NodeEnvironment.java:291)\n\tat org.elasticsearch.server@8.7.1/org.elasticsearch.node.Node.<init>(Node.java:480)\n\tat org.elasticsearch.server@8.7.1/org.elasticsearch.node.Node.<init>(Node.java:324)\n\tat org.elastics
earch.server@8.7.1/org.elasticsearch.bootstrap.Elasticsearch$2.<init>(Elasticsearch.java:216)\n\tat org.elasticsearch.server@8.7.1/org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:216)\n\tat org.elasticsea
rch.server@8.7.1/org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:67)\nCaused by: java.io.IOException: failed to obtain lock on /usr/share/elasticsearch/data\n\tat org.elasticsearch.server@8.7.1/org.elasticsearc
h.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:236)\n\tat org.elasticsearch.server@8.7.1/org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:204)\n\tat org.elasticsearch.server@8.7.1/org.elasti
csearch.env.NodeEnvironment.<init>(NodeEnvironment.java:283)\n\t... 5 more\nCaused by: java.nio.file.NoSuchFileException: /usr/share/elasticsearch/data/node.lock\n\tat java.base/sun.nio.fs.UnixException.translateToIOException(Un
ixException.java:92)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\tat java.base/sun.nio.fs.UnixPath
.toRealPath(UnixPath.java:833)\n\tat org.apache.lucene.core@9.5.0/org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:94)\n\tat org.apache.lucene.core@9.5.0/org.apache.lucene.store.FSLockFactory.obt
ainLock(FSLockFactory.java:43)\n\tat org.apache.lucene.core@9.5.0/org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:44)\n\tat org.elasticsearch.server@8.7.1/org.elasticsearch.env.NodeEnvironment$NodeLock.<init>
(NodeEnvironment.java:229)\n\t... 7 more\n\tSuppressed: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/node.lock\n\t\tat java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)\n\t\ta
t java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)\n\t\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\t\tat java.base/sun.nio.fs.UnixFileSystemProvider.newByt
eChannel(UnixFileSystemProvider.java:261)\n\t\tat java.base/java.nio.file.Files.newByteChannel(Files.java:379)\n\t\tat java.base/java.nio.file.Files.createFile(Files.java:657)\n\t\tat org.apache.lucene.core@9.5.0/org.apache.luce
ne.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:84)\n\t\t... 10 more\n"}

This is the docker-compose:

version: '2.2'
services:
  elastic:
    build:
      context: ./
      dockerfile: docker/elasticsearch/Dockerfile
    privileged: true
    environment:
      - cluster.name=biz
      - node.name=elastic-0
      - xpack.security.enabled=true
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    mem_limit: 4g
    cap_add:
      - IPC_LOCK
    volumes:
      - ./docker/_data/elasticsearch:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
    healthcheck:
      test: ["CMD", "curl","-s" ,"-f", "http://localhost:9200/_cat/health"]
      retries: 10
    networks:
      - biz 

  kibana:
    image: docker.elastic.co/kibana/kibana:8.7.1
    container_name: kibana
    privileged: true
    ports:
      - "5601:5601"
    healthcheck:
      test: ["CMD", "curl", "-s", "-f", "http://localhost:5601/"]
      retries: 10
    depends_on:
      elastic:
        condition: service_healthy
    environment:
      - "ELASTICSEARCH_HOSTS=http://elastic:9200"
    networks:
      - biz

  app:
    build:
      context: .
      dockerfile: docker/app/Dockerfile
      args:
        - WITH_XDEBUG=true
    environment:
      - DEBUG=true
      - PHP_IDE_CONFIG=serverName=app
      # - XDEBUG_CONFIG=remote_host=172.32.0.1 remote_port=9001
    ports:
      - '80:80'
    volumes:
      - './:/var/www/html'
    networks:
      - biz

networks:
  biz:
    name: biz
    driver: bridge

Nothing has changed whatsoever, and this was working just fine. Even rebuilding the images doesn't solve the issue. I have no idea why it can't create the lock file, even though I can see my local _data/elasticsearch folder being created by the running container.
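A common cause of the AccessDeniedException on node.lock above is host-directory ownership: the official images run Elasticsearch as uid/gid 1000, so a bind-mounted data directory owned by another user is not writable from inside the container. A minimal pre-start sketch, assuming the ./docker/_data/elasticsearch path from the compose file above and sufficient privileges:

```shell
# The official images run Elasticsearch as uid/gid 1000; the bind-mounted
# host directory must be writable by that user (run as root or via sudo).
mkdir -p ./docker/_data/elasticsearch
chown -R 1000:1000 ./docker/_data/elasticsearch
```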

zakhaev26 commented 7 months ago

Facing the same issue. Has anyone got a solution for this?

linghengqian commented 7 months ago

Facing the same issue. Has anyone got a solution for this?

zakhaev26 commented 7 months ago

Where can I find a docker-compose and its related config files that actually work? I followed the one in Elasticsearch's official installation guide, but I get logs like:

elasticsearch_container  | {"@timestamp":"2024-03-27T13:06:55.893Z", "log.level": "WARN",  "data_stream.dataset":"deprecation.elasticsearch","data_stream.namespace":"default","data_stream.type":"logs","elasticsearch.event.category":"settings","event.code":"xpack.monitoring.collection.enabled","message":"[xpack.monitoring.collection.enabled] setting was deprecated in Elasticsearch and will be removed in a future release." , "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"deprecation.elasticsearch","process.thread.name":"main","log.logger":"org.elasticsearch.deprecation.common.settings.Settings","elasticsearch.node.name":"12f9b075322d","elasticsearch.cluster.name":"docker-cluster"}
elasticsearch_container  | {"@timestamp":"2024-03-27T13:06:55.910Z", "log.level":"ERROR", "message":"fatal exception while booting Elasticsearch", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"12f9b075322d","elasticsearch.cluster.name":"docker-cluster","error.type":"java.lang.IllegalStateException","error.message":"failed to obtain node locks, tried [/usr/share/elasticsearch/data]; maybe these locations are not writable or multiple nodes were started on the same data path?","error.stack_trace":"java.lang.IllegalStateException: failed to obtain node locks, tried [/usr/share/elasticsearch/data]; maybe these locations are not writable or multiple nodes were started on the same data path?\n\tat org.elasticsearch.server@8.11.0/org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:294)\n\tat org.elasticsearch.server@8.11.0/org.elasticsearch.node.Node.<init>(Node.java:499)\n\tat org.elasticsearch.server@8.11.0/org.elasticsearch.node.Node.<init>(Node.java:344)\n\tat org.elasticsearch.server@8.11.0/org.elasticsearch.bootstrap.Elasticsearch$2.<init>(Elasticsearch.java:236)\n\tat org.elasticsearch.server@8.11.0/org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:236)\n\tat org.elasticsearch.server@8.11.0/org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:73)\nCaused by: java.io.IOException: failed to obtain lock on /usr/share/elasticsearch/data\n\tat org.elasticsearch.server@8.11.0/org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:239)\n\tat org.elasticsearch.server@8.11.0/org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:206)\n\tat org.elasticsearch.server@8.11.0/org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:286)\n\t... 
5 more\nCaused by: java.nio.file.NoSuchFileException: /usr/share/elasticsearch/data/node.lock\n\tat java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\tat java.base/sun.nio.fs.UnixPath.toRealPath(UnixPath.java:834)\n\tat org.apache.lucene.core@9.8.0/org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:94)\n\tat org.apache.lucene.core@9.8.0/org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:43)\n\tat org.apache.lucene.core@9.8.0/org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:44)\n\tat org.elasticsearch.server@8.11.0/org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:232)\n\t... 7 more\n\tSuppressed: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/node.lock\n\t\tat java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)\n\t\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)\n\t\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\t\tat java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:261)\n\t\tat java.base/java.nio.file.Files.newByteChannel(Files.java:379)\n\t\tat java.base/java.nio.file.Files.createFile(Files.java:657)\n\t\tat org.apache.lucene.core@9.8.0/org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:84)\n\t\t... 10 more\n"}
elasticsearch_container  | ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/docker-cluster.log
elasticsearch_container  | 
elasticsearch_container  | 
elasticsearch_container  | ERROR: Elasticsearch exited unexpectedly, with exit code 1
itbill commented 7 months ago

Don't mount only the elasticsearch.yml file from the host. Mount the whole config directory. E.g.

docker run --name oh-noes-this-fails -p 9200:9200 -v /absolute/path/to/config:/usr/share/elasticsearch/config -it docker.elastic.co/elasticsearch/elasticsearch:8.0.0

You will need to copy all the files under the /usr/share/elasticsearch/config folder to the host and customize them:

$ docker container run -it --rm elasticsearch:8.13.0 bash
elasticsearch@d9d90482601a:~$ cd /usr/share/elasticsearch/config/
elasticsearch@d9d90482601a:~/config$ ls -al
total 68
drwxrwxr-x 1 elasticsearch root  4096 Mar 26 18:49 .
drwxrwxr-x 1 root          root  4096 Mar 26 18:49 ..
-rw-rw-r-- 1 root          root  1042 Mar 22 03:34 elasticsearch-plugins.example.yml
-rw-rw-r-- 1 root          root    53 Mar 26 18:49 elasticsearch.yml
-rw-rw-r-- 1 root          root  2727 Mar 22 03:34 jvm.options
drwxrwxr-x 1 elasticsearch root  4096 Mar 22 03:37 jvm.options.d
-rw-rw-r-- 1 root          root 17969 Mar 22 03:40 log4j2.file.properties
-rw-rw-r-- 1 root          root 12549 Mar 26 18:49 log4j2.properties
-rw-rw-r-- 1 root          root   473 Mar 22 03:40 role_mapping.yml
-rw-rw-r-- 1 root          root   197 Mar 22 03:40 roles.yml
-rw-rw-r-- 1 root          root     0 Mar 22 03:40 users
-rw-rw-r-- 1 root          root     0 Mar 22 03:40 users_roles
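One way to seed the host directory is to copy the defaults out of a throwaway container. A sketch (the container name es-config-tmp and the host path ./config are arbitrary choices, not from the thread):

```shell
# Create (but don't start) a container from the image, copy its default
# config directory to the host, then discard the container.
docker create --name es-config-tmp docker.elastic.co/elasticsearch/elasticsearch:8.13.0
docker cp es-config-tmp:/usr/share/elasticsearch/config ./config
docker rm es-config-tmp
```

After customizing ./config/elasticsearch.yml, bind-mount ./config over /usr/share/elasticsearch/config as shown above.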

The hint of the solution came from the error itself:

Exception in thread "main" java.nio.file.FileSystemException: /usr/share/elasticsearch/config/elasticsearch.yml.R0_9BZ4hRx-v8zK3F0U-Bw.tmp -> /usr/share/elasticsearch/config/elasticsearch.yml: Device or resource busy at java.base/java.nio.file.Files.move(Files.java:1432)

The server staged its changes in a temporary file, then tried to rename it over a destination that was bind-mounted from the host. Because that destination file is itself a mount point, the rename cannot replace it, and the kernel returns "Device or resource busy".
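A minimal shell sketch of the same pattern (run in /tmp; the filename mirrors the stack trace, and the setting written is purely illustrative). The final mv is the step that fails when the destination is a single-file bind mount:

```shell
# Write-then-rename, as in AutoConfigureNode#fullyWriteFile: stage the new
# contents in a temp file, then atomically rename it over the destination.
cd /tmp
printf 'xpack.security.enrollment.enabled: true\n' > elasticsearch.yml.tmp
mv elasticsearch.yml.tmp elasticsearch.yml
# On a regular directory this succeeds; when elasticsearch.yml is itself a
# bind mount, rename(2) cannot replace the mount point and fails with EBUSY.
```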

kgrozdanovski commented 5 months ago

Bumping this issue since it seems fairly essential for containerized environments yet 2 years later no realistic solution or workaround has been identified.

ibaraki-douji commented 5 months ago

@kgrozdanovski what @itbill wrote works. Also, on the first launch Elasticsearch will create the default config files, so you don't need to copy all the files from the container to the host volume.

kgrozdanovski commented 5 months ago

@kgrozdanovski what @itbill wrote works.

Also, on the first launch Elasticsearch will create the default config files, so you don't need to copy all the files from the container to the host volume.

itbill suggested a workaround, which is not a clean solution, nor is it documented anywhere. Furthermore, what you are suggesting contradicts his comment, since he notes you must copy all config files into the directory you are binding.

TLDR; there is still no real solution.

ibaraki-douji commented 5 months ago

After reading the docs, here is the fix for the busy error (they talk about the keystore, but it's the same): https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#_elasticsearch_keystore_device_or_resource_busy

eliphatfs commented 2 months ago

After reading the docs, here is the fix for the busy error (they talk about the keystore, but it's the same): https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#_elasticsearch_keystore_device_or_resource_busy

It's not quite the same; Elasticsearch requires other files in the config directory to run. If I mount an empty config directory for it to generate files in, it gives me another error: ERROR: Missing logging config file at /usr/share/elasticsearch/config/log4j2.properties, with exit code 78

josh-i386g commented 2 months ago

The following works for me.

#!/bin/bash
# Recreate the bind-mounted data/logs directories from scratch and make
# them writable by the container's elasticsearch user (uid/gid 1000).

sudo rm -rf ./elastic/data/
sudo rm -rf ./elastic/logs/

sudo mkdir -p ./elastic/data
sudo mkdir -p ./elastic/logs
sudo chmod -R 777 ./elastic/data/
sudo chmod -R 777 ./elastic/logs/
sudo chown -R 1000:1000 ./elastic/data/
sudo chown -R 1000:1000 ./elastic/logs/

And the compose file:
services:

  # https://hub.docker.com/_/elasticsearch
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.14.2
    restart: unless-stopped
    ports:
      - 127.0.0.1:9200:9200
    environment:
      network.host: 0.0.0.0
      discovery.type: single-node
      bootstrap.memory_lock: true
      xpack.security.enabled: false
      ingest.geoip.downloader.enabled: false
      logger.org.elasticsearch: ERROR
      logger.com.azure.core: ERROR
      logger.org.apache: ERROR
      ES_JAVA_OPTS: -Xms1g -Xmx1g
      ELASTIC_PASSWORD: ${ELASTIC_PASSWORD:?error}
    volumes:
      - ./elastic/data:/usr/share/elasticsearch/data
      - ./elastic/logs:/usr/share/elasticsearch/logs
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nproc:
        soft: 65536
        hard: 65536
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl --fail --silent http://localhost:9200/_cluster/health",
        ]
      interval: 10s
      timeout: 10s
      retries: 120
    networks:
      - my-network

networks:
  my-network:
    name: my-network
ehsanafter commented 1 month ago

Hey, do you think this is going to get fixed anytime soon?