elastic / ansible-elasticsearch

Ansible playbook for Elasticsearch

Elasticsearch does not start: '${ES_TMPDIR}' does not exist #791

Closed sourcecode-glitch closed 2 years ago

sourcecode-glitch commented 3 years ago

Elasticsearch version: Version: 6.1.2, Build: 5b1fea5/2018-01-10T02:35:59.208Z

Role version: v7.12.0

JVM version (java -version): 1.8.0_275

OS version (uname -a if on a Unix-like system): Linux nosql1 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u2 (2019-05-13) x86_64 GNU/Linux

Description of the problem including expected versus actual behaviour: The role times out at the "Wait for elasticsearch to startup" task instead of starting Elasticsearch. It seems that the ${ES_TMPDIR} placeholder in jvm.options is not resolved, so Elasticsearch looks for a directory literally named "${ES_TMPDIR}".

I am running this via molecule on a vagrant VM (though I don't expect this to be important for this issue).

Playbook:

---
- name: Install elasticsearch
  hosts: all
  serial: 1
  remote_user: root
  roles:
    - role: 'elasticsearch'
      es_instance_name: "{{ ansible_nodename }}"
      es_data_dirs: "/var/lib/elasticsearch"
      es_config:
        cluster.name: es-cluster
        discovery.zen.ping.unicast.hosts: "{{ unicast_hosts | join(',') }}"
        network.host: "['{{ ansible_eth0.ipv4.address }}', '_local_']"
        discovery.zen.minimum_master_nodes: 2
        script.max_compilations_rate: "1000/1m"
      es_api_host: "{{ ansible_nodename }}"
      es_major_version: "6.x"
      es_version: "6.1.2"
      es_heap_size: '1g'

Provide logs from Ansible:

TASK [elasticsearch : Make sure elasticsearch is started] **********************
ok: [nosql1]

TASK [elasticsearch : Wait for elasticsearch to startup] ***********************
fatal: [nosql1]: FAILED! => {"changed": false, "elapsed": 300, "msg": "Timeout when waiting for nosql1:9200"}

These are only the last lines. Please see the full log in case you need more details.

ES Logs if relevant:

vagrant@nosql1:~$ sudo journalctl -u elasticsearch
-- Logs begin at Thu 2021-04-01 09:44:41 GMT, end at Thu 2021-04-01 10:03:09 GMT. --
Apr 01 09:48:55 nosql1 systemd[1]: Started Elasticsearch.
Apr 01 09:49:01 nosql1 elasticsearch[6265]: JNA Warning: IOException removing temporary files: JNA temporary directory '${ES_TMPDIR}' does not exist
Apr 01 09:49:02 nosql1 systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
Apr 01 09:49:02 nosql1 systemd[1]: elasticsearch.service: Unit entered failed state.
Apr 01 09:49:02 nosql1 systemd[1]: elasticsearch.service: Failed with result 'exit-code'.

When I start the elasticsearch binary directly from the command line, I get a much more detailed Java stack trace.

It includes the line Caused by: java.nio.file.AccessDeniedException: /home/vagrant/${ES_TMPDIR}, so the environment variable does not appear to be resolved, even though it is defined as /tmp for the elasticsearch user:

elasticsearch@nosql1:/home/vagrant$ echo $ES_TMPDIR
/tmp
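
One way to check whether a placeholder such as ${ES_TMPDIR} is present in the options file at all is to list the non-comment lines that still contain a ${...} reference. This is only a minimal sketch: the function name list_placeholders is hypothetical, and the sample file below stands in for the real jvm.options, whose location on the host is not stated in this report.

```shell
#!/bin/sh
# Hypothetical helper: print every non-comment line of a jvm.options-style
# file that contains a literal "${" placeholder.
list_placeholders() {
    # First drop comment lines, then keep lines containing the fixed string "${".
    grep -v '^[[:space:]]*#' "$1" | grep -F '${'
}

# Demo on a throwaway sample file (not the real jvm.options).
sample=$(mktemp)
printf '%s\n' \
    '# -Djava.io.tmpdir=${OLD}' \
    '-Xms1g' \
    '-Djava.io.tmpdir=${ES_TMPDIR}' > "$sample"
list_placeholders "$sample"   # prints: -Djava.io.tmpdir=${ES_TMPDIR}
rm -f "$sample"
```

If a line like the one printed above shows up in the deployed file and the startup log still mentions the literal '${ES_TMPDIR}', the placeholder is likely being passed to the JVM without expansion.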

sourcecode-glitch commented 3 years ago

I wonder whether this is because this repo uses a single jvm.options file regardless of the ES version. This may be related to #738.
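
If the placeholder really is passed through verbatim for this ES version, one untested workaround would be to render the options file with the placeholder replaced by a concrete directory. The sketch below only prints the substituted content and does not modify any file; the function name expand_tmpdir is hypothetical, and using /tmp as the directory is an assumption taken from the echo $ES_TMPDIR output in this report.

```shell
#!/bin/sh
# Hypothetical helper: print a jvm.options-style file with the literal
# ${ES_TMPDIR} placeholder replaced by a concrete directory.
#   $1 = path to the options file
#   $2 = directory to substitute (must not contain '|', '&' or '\')
expand_tmpdir() {
    # \$ matches a literal dollar sign; the braces are literal in basic regex.
    sed 's|\${ES_TMPDIR}|'"$2"'|g' "$1"
}
```

For example, expand_tmpdir path/to/jvm.options /tmp would turn -Djava.io.tmpdir=${ES_TMPDIR} into -Djava.io.tmpdir=/tmp and leave all other lines unchanged. Whether that is enough to let this ES version start has not been verified here.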

jmlrt commented 3 years ago

Hi @sourcecode-glitch, without digging too much, I think this could be related to deploying the playbook as user root (remote_user: root). Did you try using become: yes instead, which is the recommended way?

sourcecode-glitch commented 3 years ago

Thanks for the suggestion, but it did not work with become. I get exactly the same result.

Just for completeness, this is the diff compared with the previous version:

@@ -2,7 +2,7 @@
 - name: Install elasticsearch
   hosts: all
   serial: 1
-  remote_user: root
+  become: yes
   roles:
     - role: 'elasticsearch'
       es_instance_name: "{{ ansible_nodename }}"

botelastic[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jmlrt commented 3 years ago

still valid

botelastic[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

botelastic[bot] commented 2 years ago

This issue has been automatically closed because it has not had recent activity since being marked as stale.