elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

ES Service init script intermittently fails to start on Ubuntu #1942

Closed vijayakumark closed 11 years ago

vijayakumark commented 12 years ago

The Elasticsearch service init script intermittently fails to start on Ubuntu. The status shows "failed"; however, nothing is written to the log.

Possibly relevant input

I believe it could be related to memory allocation, since the same config starts fine at other times.

kimchy commented 12 years ago

Nothing in the OS log files regarding why it failed?

vijayakumark commented 12 years ago

Yes. Except for the auth log having an entry about switching user to elasticsearch there is no other entry in the OS logs at all. At times, it starts just fine.

When it doesn't start, running the script at /usr/share/elasticsearch/bin/elasticsearch manually with the same parameters works just fine.

oleiade commented 11 years ago

+1, but in my case elasticsearch never starts at all using the init.d script.

After a bit of digging, I discovered by launching via sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch that there might be an ownership/permissions problem on /usr/share/elasticsearch, as it threw:

{0.20.6}: Initialization Failed ...
- ElasticSearchIllegalStateException[Failed to obtain node lock, is the following location writable?: [/usr/share/elasticsearch/data/elasticsearch]]
    IOException[failed to obtain lock on /usr/share/elasticsearch/data/elasticsearch/nodes/49]
        IOException[Cannot create directory: /usr/share/elasticsearch/data/elasticsearch/nodes/49]

Once I chown -R elasticsearch:elasticsearch /usr/share/elasticsearch the error goes away, but the service still does not start using the init.d script, and nothing appears in the logs.

Any further ideas? Thanks

oleiade commented 11 years ago

Okay, I figured it out, and I'm writing it down here as it might be useful to someone else. When I tried to launch elasticsearch as the elasticsearch user with the same command the init.d script uses, I got this error:

$ sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch -p /var/run/elasticsearch.pid -Des.default.config=/etc/elasticsearch/elasticsearch.yml -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch

leiade@gesicht:~$ {0.20.6}: Initialization Failed ...
- ElasticSearchIllegalStateException[Failed to obtain node lock, is the following location writable?: [/var/elasticsearch/data/pluto]]
    IOException[failed to obtain lock on /var/elasticsearch/data/pluto/nodes/49]
        IOException[Cannot create directory: /var/elasticsearch/data/pluto/nodes/49]

Yet another rights problem. After mkdir /var/elasticsearch && chown -R elasticsearch:elasticsearch /var/elasticsearch the init.d script works fine.
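A quick way to confirm this kind of problem before starting the service is to test the data directory directly. The following is only a minimal sketch: check_data_dir is a hypothetical helper, not something shipped with the Elasticsearch package, and it tests access for whichever user runs it.

```shell
#!/bin/sh
# Sketch: report whether a directory exists and is writable by the
# user running this code. check_data_dir is a hypothetical helper,
# not part of the Elasticsearch packaging.
check_data_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 2
    fi
    echo "ok: $dir"
}
```

To test access for the service account rather than your own, the function would have to run as that user, for example from a script started with sudo -u elasticsearch, and be called with the configured data path such as /var/elasticsearch/data.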

spinscale commented 11 years ago

hey,

Why do you have /var/elasticsearch as your data dir? It should be /var/lib/elasticsearch. Did you reconfigure it?

Anyway it should work (if the directory was specified in your configuration). In the current init script, we are setting the permissions and the ownership for those directories in /etc/init.d/elasticsearch - maybe you can add set -x to the init script and see if the calls changing the permissions happen.
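If editing the init script is inconvenient, the same trace can be obtained by running it under the shell's -x option and filtering for the ownership calls. This is a rough sketch, assuming the script runs under sh (swap in bash if it needs it); trace_ownership_calls is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: trace a script with the shell's -x option and keep only the
# chown/chmod calls. trace_ownership_calls is a hypothetical helper.
trace_ownership_calls() {
    script=$1
    shift
    # -x writes the trace to stderr, so merge it into stdout before filtering.
    # The pattern accepts one or more leading "+" to cover nested trace levels.
    sh -x "$script" "$@" 2>&1 | grep -E '^\++ (chown|chmod) '
}
```

Run as root, a call such as trace_ownership_calls /etc/init.d/elasticsearch start would print only the chown and chmod lines from the trace, or nothing if the script never reaches them.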

oleiade commented 11 years ago

Hey!

thanks for this quick answer!

Yes, I've reconfigured the data dir to point to /var/elasticsearch/data, that's why it points there. No problem for me as I'm deploying elasticsearch via puppet, and can easily set the right permissions myself.

I've added set -x to the init script, and here's the result:

+ PATH=/bin:/usr/bin:/sbin:/usr/sbin
+ NAME=elasticsearch
+ DESC='ElasticSearch Server'
+ DEFAULT=/etc/default/elasticsearch
++ id -u
+ '[' 0 -ne 0 ']'
+ . /lib/lsb/init-functions
++ FANCYTTY=
++ '[' -e /etc/lsb-base-logging.sh ']'
++ . /etc/lsb-base-logging.sh
+++ LOG_DAEMON_MSG=
+ '[' -r /etc/default/rcS ']'
+ . /etc/default/rcS
++ TMPTIME=0
++ SULOGIN=no
++ DELAYLOGIN=no
++ UTC=yes
++ VERBOSE=no
++ FSCKFIX=yes
+ ES_USER=elasticsearch
+ ES_GROUP=elasticsearch
+ JDK_DIRS='/usr/lib/jvm/java-7-oracle /usr/lib/jvm/java-7-openjdk /usr/lib/jvm/java-7-openjdk-amd64/ /usr/lib/jvm/java-7-openjdk-i386/ /usr/lib/jvm/java-6-sun /usr/lib/jvm/java-6-openjdk'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/java-7-oracle/bin/java -a -z '' ']'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/java-7-openjdk/bin/java -a -z '' ']'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/java-7-openjdk-amd64//bin/java -a -z '' ']'
+ JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/java-7-openjdk-i386//bin/java -a -z /usr/lib/jvm/java-7-openjdk-amd64/ ']'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/java-6-sun/bin/java -a -z /usr/lib/jvm/java-7-openjdk-amd64/ ']'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/java-6-openjdk/bin/java -a -z /usr/lib/jvm/java-7-openjdk-amd64/ ']'
+ export JAVA_HOME
+ ES_HOME=/usr/share/elasticsearch
+ MAX_OPEN_FILES=65535
+ LOG_DIR=/var/log/elasticsearch
+ DATA_DIR=/var/lib/elasticsearch
+ WORK_DIR=/tmp/elasticsearch
+ CONF_DIR=/etc/elasticsearch
+ CONF_FILE=/etc/elasticsearch/elasticsearch.yml
+ '[' -f /etc/default/elasticsearch ']'
+ . /etc/default/elasticsearch
+ PID_FILE=/var/run/elasticsearch.pid
+ DAEMON=/usr/share/elasticsearch/bin/elasticsearch
+ DAEMON_OPTS='-p /var/run/elasticsearch.pid -Des.default.config=/etc/elasticsearch/elasticsearch.yml -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch'
+ export ES_HEAP_SIZE
+ export ES_HEAP_NEWSIZE
+ export ES_DIRECT_SIZE
+ export ES_JAVA_OPTS
+ test -x /usr/share/elasticsearch/bin/elasticsearch
+ case "$1" in
+ '[' -z /usr/lib/jvm/java-7-openjdk-amd64/ ']'
+ '[' -n '' -a -z '' ']'
+ log_daemon_msg 'Starting ElasticSearch Server'
+ '[' -z 'Starting ElasticSearch Server' ']'
+ log_use_fancy_output
+ TPUT=/usr/bin/tput
+ EXPR=/usr/bin/expr
+ '[' -t 1 ']'
+ FANCYTTY=0
+ case "$FANCYTTY" in
+ false
+ echo ' * Starting ElasticSearch Server'
 * Starting ElasticSearch Server
+ COL=
+ start-stop-daemon --test --start --pidfile /var/run/elasticsearch.pid --user elasticsearch --exec /usr/lib/jvm/java-7-openjdk-amd64//bin/java
+ log_progress_msg '(already running)'
+ :
+ log_end_msg 0
+ '[' -z 0 ']'
+ '[' '' ']'
+ '[' 0 -eq 0 ']'
+ echo '   ...done.'
   ...done.
+ return 0
+ exit 0

Grep can't find any mention of chmod or chown in it, so it doesn't seem like the script is modifying permissions or ownership in any way; but I'm not a shell script rock star, so maybe I've missed something :) Just to be sure, I've reset the ownership of /var/elasticsearch to root:root to see whether the permission change happens, and it doesn't.

spinscale commented 11 years ago

That output is not from a real startup, because elasticsearch is already running (see the last ten lines). Or at least the pid file still exists.
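To tell a running instance apart from a leftover pid file, something along these lines can help. It is only a sketch: pid_file_status is a hypothetical helper, and it uses kill -0 as the liveness probe, which is not the same check start-stop-daemon performs.

```shell
#!/bin/sh
# Sketch: classify a pid file as absent, live, or stale.
# pid_file_status is a hypothetical helper. kill -0 sends no signal; it
# only checks that the process exists and may be signalled, so run this
# as root or as the process owner to avoid a false "stale" result.
pid_file_status() {
    pid_file=$1
    if [ ! -f "$pid_file" ]; then
        echo "no pid file"
        return 0
    fi
    pid=$(cat "$pid_file")
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running (pid $pid)"
    else
        echo "stale pid file"
    fi
}
```

For the setup in this thread the call would be pid_file_status /var/run/elasticsearch.pid. A "stale pid file" result suggests the file can be removed before starting the service again.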

oleiade commented 11 years ago

Woops, my bad, sorry :D Actually, once starting correctly with set -x it seems that, yes, it chowns the directories, but not those I've set up in the config.

$ sudo /etc/init.d/elasticsearch &> out
$ cat out | grep "chown"
+ chown elasticsearch:elasticsearch /var/log/elasticsearch /var/lib/elasticsearch /tmp/elasticsearch
+ chown elasticsearch:elasticsearch /var/run/elasticsearch.pid

spinscale commented 11 years ago

Just to make sure: did you set DATA_DIR=/var/elasticsearch in /etc/default/elasticsearch, or did you set it in elasticsearch.yml? When /etc/default/elasticsearch is read, it should override the default DATA_DIR in the init script, which does not seem to be the case judging from your last output.

oleiade commented 11 years ago

I've just set it up in elasticsearch.yml, here it is:

### MANAGED BY PUPPET ###
---
cluster: 
  name: pluto
node: 
  name: gesicht
path: 
  data: /var/elasticsearch/data
  logs: /var/log/elasticsearch

and here's my /etc/default/elasticsearch:

# Run ElasticSearch as this user ID and group ID
#ES_USER=elasticsearch
#ES_GROUP=elasticsearch

# Heap Size (defaults to 256m min, 1g max)
#ES_HEAP_SIZE=2g

# Heap new generation
#ES_HEAP_NEWSIZE=

# max direct memory
#ES_DIRECT_SIZE=

# Maximum number of open files, defaults to 65535.
#MAX_OPEN_FILES=65535

# Maximum locked memory size. Set to "unlimited" if you use the
# bootstrap.mlockall option in elasticsearch.yml. You must also set
# ES_HEAP_SIZE.
#MAX_LOCKED_MEMORY=unlimited

# ElasticSearch log directory
#LOG_DIR=/var/log/elasticsearch

# ElasticSearch data directory
#DATA_DIR=/var/lib/elasticsearch

# ElasticSearch work directory
#WORK_DIR=/tmp/elasticsearch

# ElasticSearch configuration directory
#CONF_DIR=/etc/elasticsearch

# ElasticSearch configuration file (elasticsearch.yml)
#CONF_FILE=/etc/elasticsearch/elasticsearch.yml

# Additional Java OPTS
#ES_JAVA_OPTS=

Should I make sure puppet sets DATA_DIR in /etc/default/elasticsearch too then?

spinscale commented 11 years ago

This depends on your setup. If you handle chmod/chown yourself, there is no need for updating /etc/default/elasticsearch - if you want to have it handled by the init script, you should add it there as well and distribute the file correctly.
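A mismatch between the two files can be detected with a small check like the one below. This is a sketch under several assumptions: the helper names are hypothetical, the YAML parsing is a naive awk pass that only handles the nested "path:" / "data:" layout and the flat "path.data:" form with values that contain no spaces, and the fallback of /var/lib/elasticsearch is taken from the init script trace earlier in this thread.

```shell
#!/bin/sh
# Sketch: compare path.data in elasticsearch.yml with the DATA_DIR the
# init script would use. All function names here are hypothetical.

# Print path.data from a YAML file (naive parser, see assumptions above).
yaml_path_data() {
    awk '
        /^path\.data:/           { print $2; exit }
        /^path:[ \t]*$/          { in_path = 1; next }
        /^[^ \t#]/               { in_path = 0 }
        in_path && $1 == "data:" { print $2; exit }
    ' "$1"
}

# Print the effective DATA_DIR: the init script default unless the
# defaults file overrides it. Sourced in a subshell to avoid side effects.
default_data_dir() {
    (
        DATA_DIR=/var/lib/elasticsearch
        if [ -f "$1" ]; then
            . "$1"
        fi
        echo "$DATA_DIR"
    )
}

# Report whether both files agree on the data directory.
check_data_dir_match() {
    yml_dir=$(yaml_path_data "$1")
    def_dir=$(default_data_dir "$2")
    if [ -n "$yml_dir" ] && [ "$yml_dir" != "$def_dir" ]; then
        echo "mismatch: yml=$yml_dir defaults=$def_dir"
        return 1
    fi
    echo "ok: $def_dir"
}
```

With the files from this thread, check_data_dir_match /etc/elasticsearch/elasticsearch.yml /etc/default/elasticsearch would report a mismatch, which corresponds to the init script changing ownership of a directory that elasticsearch does not actually use.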

As the original report is 11 months old and no useful information has been added and your problem looks fixed to me, I will close this one.

haf commented 11 years ago

The problem was probably that the puppet module didn't set the CONF_DIR variable in /etc/{default,sysconf}/elasticsearch to /etc/elasticsearch.


  $service_settings = merge_hashes({
    'CONF_DIR' => '/etc/elasticsearch'
  },$elasticsearch::service_settings)