elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Can't free up space because there's not enough space? ;) #9260

Closed darkpixel closed 9 years ago

darkpixel commented 9 years ago

One of my nodes is 'low' on space, down to ~16 GB free.

So I ran curator to delete older log indices, and Elasticsearch logged the following:

[2015-01-12 11:53:50,250][INFO ][cluster.metadata         ] [tetrad] [logstash-2014.12.13] deleting index
[2015-01-12 11:53:50,251][DEBUG][action.admin.indices.delete] [tetrad] [logstash-2014.12.13] failed to delete index
java.lang.IllegalStateException: Free bytes [4518450191893] cannot be less than 0 or greater than total bytes [4509977353216]
        at org.elasticsearch.cluster.DiskUsage.<init>(DiskUsage.java:36)
        at org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDecider.canRemain(DiskThresholdDecider.java:439)
        at org.elasticsearch.cluster.routing.allocation.decider.AllocationDeciders.canRemain(AllocationDeciders.java:105)
        at org.elasticsearch.cluster.routing.allocation.AllocationService.moveShards(AllocationService.java:257)
        at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:223)
        at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:160)
        at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:146)
        at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService$2.execute(MetaDataDeleteIndexService.java:130)
        at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:329)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
[2015-01-12 11:53:56,767][INFO ][cluster.routing.allocation.decider] [tetrad] low disk watermark [15%] exceeded on [Pc_MAIWOQVe6qKNtKVIYpw][zefram] free: 16.3gb[14.5%], replicas will not be assigned to this node
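The exception comes from a sanity check in the `DiskUsage` constructor (see the first stack frame, `DiskUsage.java:36`): reported free bytes must lie between 0 and the reported total. A minimal Python sketch of that check, purely to illustrate why the numbers from the log above abort the delete (the real check is Java, inside Elasticsearch):

```python
def disk_usage(total_bytes, free_bytes):
    """Sketch of the sanity check in org.elasticsearch.cluster.DiskUsage:
    free space must be between 0 and the reported total, otherwise the
    cluster-state update (here, the index deletion) fails."""
    if free_bytes < 0 or free_bytes > total_bytes:
        raise ValueError(
            "Free bytes [%d] cannot be less than 0 or greater than "
            "total bytes [%d]" % (free_bytes, total_bytes)
        )
    return {"total": total_bytes, "free": free_bytes}

# The exact values from the log: free exceeds total, so the check throws.
try:
    disk_usage(total_bytes=4509977353216, free_bytes=4518450191893)
except ValueError as e:
    print(e)
```

Note the catch-22 the issue title describes: the check runs during the reroute triggered by deleting an index, so the very operation that would free space is rejected because the disk stats look invalid.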

The actual error from curator is:

root@tetrad:~# /root/.virtualenvs/curator/bin/curator --host localhost delete --older-than 10;
2015-01-12 11:53:50,238 INFO      Job starting...
2015-01-12 11:53:50,241 INFO      Deleting indices...
Traceback (most recent call last):
  File "/root/.virtualenvs/curator/bin/curator", line 9, in <module>
    load_entry_point('elasticsearch-curator==2.1.1', 'console_scripts', 'curator')()
  File "/root/.virtualenvs/curator/local/lib/python2.7/site-packages/curator/curator_script.py", line 364, in main
    arguments.func(client, **argdict)
  File "/root/.virtualenvs/curator/local/lib/python2.7/site-packages/curator/curator.py", line 1025, in delete
    _op_loop(client, matching_indices, op=delete_index, dry_run=dry_run, **kwargs)
  File "/root/.virtualenvs/curator/local/lib/python2.7/site-packages/curator/curator.py", line 767, in _op_loop
    skipped = op(client, item, **kwargs)
  File "/root/.virtualenvs/curator/local/lib/python2.7/site-packages/curator/curator.py", line 610, in delete_index
    client.indices.delete(index=index_name)
  File "/root/.virtualenvs/curator/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 68, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/root/.virtualenvs/curator/local/lib/python2.7/site-packages/elasticsearch/client/indices.py", line 188, in delete
    params=params)
  File "/root/.virtualenvs/curator/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 301, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/root/.virtualenvs/curator/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 82, in perform_request
    self._raise_error(response.status, raw_data)
  File "/root/.virtualenvs/curator/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 102, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.TransportError: TransportError(500, u'IllegalStateException[Free bytes [4518450191893] cannot be less than 0 or greater than total bytes [4509977353216]]')
root@tetrad:~# 
darkpixel commented 9 years ago

One thing I forgot to mention: the nodes use ZFS for their storage. Could the error about free bytes exceeding total bytes be related to ZFS compression?

darkpixel commented 9 years ago

The workaround:

dakrone commented 9 years ago

Related to #9249; the workaround there will work for this as well until the underlying bug is fixed.
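The exact workaround text from #9249 isn't reproduced here, but given that the stack trace fails inside `DiskThresholdDecider.canRemain`, a plausible stopgap on ES 1.x is to temporarily disable the disk-threshold allocation decider via the cluster settings API, then re-enable it once the fix is deployed. A sketch, assuming a node listening on `localhost:9200`:

```shell
# Sketch of a stopgap: turn off disk-based allocation decisions so the
# reroute triggered by the index deletion no longer constructs DiskUsage.
# Remember to flip this back to true after cleaning up.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disk.threshold_enabled": false
  }
}'
```

Using a `transient` setting means the override disappears on full cluster restart, which is usually what you want for a temporary workaround.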

darkpixel commented 9 years ago

Thanks for the pointer!