elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

DiskUsages with negative total or free space are unserialisable #48413

Open DaveCTurner opened 5 years ago

DaveCTurner commented 5 years ago

Today DiskUsage uses writeVLong() to write its sizes in bytes, so serialising a negative size results in an IllegalStateException.

One possible source of negative sizes was the DiskThresholdDecider subtracting more space for relocating shards than the disk had free, which is being addressed in #48392. That case was benign because we never serialised the resulting DiskUsage objects. However, according to DiskUsageTests, we can get negative sizes in other ways too. I'm not sure this is true in Elasticsearch today, but it warrants a more careful investigation. Maybe we can fix the tests and then add assertions that the sizes are always non-negative, or maybe we need to fix the serialisation to deal with negative sizes.
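
To illustrate the two options, here is a minimal standalone sketch of a variable-length long encoding that rejects negative values, next to a zigzag variant that accepts them. The class `VarLongSketch` and its helpers are hypothetical and written only for this illustration. The method names echo the ones discussed here, but this is not Elasticsearch's actual stream code, and the exception message is made up.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not Elasticsearch's StreamOutput/StreamInput.
final class VarLongSketch {

    // Writes 7 payload bits per byte, setting the high bit on every byte except the last.
    // The value is treated as unsigned, so this helper never rejects anything itself.
    private static byte[] writeUnsigned(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Variable-length encoding for non-negative values only.
    // A negative size fails here, which is the failure mode described above.
    static byte[] writeVLong(long value) {
        if (value < 0) {
            throw new IllegalStateException("negative values unsupported: " + value);
        }
        return writeUnsigned(value);
    }

    // Zigzag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small negatives stay small.
    static long zigZagEncode(long value) {
        return (value << 1) ^ (value >> 63);
    }

    static long zigZagDecode(long encoded) {
        return (encoded >>> 1) ^ -(encoded & 1);
    }

    // Variable-length encoding that tolerates negative values via zigzag.
    static byte[] writeZLong(long value) {
        return writeUnsigned(zigZagEncode(value));
    }

    // Reads back a value produced by writeZLong.
    static long readZLong(byte[] bytes) {
        long accumulated = 0;
        int shift = 0;
        for (byte b : bytes) {
            accumulated |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return zigZagDecode(accumulated);
    }
}
```

In this sketch `writeVLong(-1L)` throws, while `readZLong(writeZLong(-1L))` round-trips. The trade-off is that a zigzag-style encoding changes the bytes on the wire, so a real fix along those lines would presumably need to account for compatibility between versions, whereas asserting non-negativity leaves the format alone.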

elasticmachine commented 5 years ago

Pinging @elastic/es-core-features (:Core/Features/Stats)

RicardoGralhoz commented 3 years ago

Hey Dave,

I hit this related exception on v7.8.1 while running the diagnostics tool to debug an unassigned shard issue with an underlying CircuitBreakingException (indices:data/write/bulk). Restarting the cluster fixed the original issue, but I'm still monitoring it because it seems to recur.

Would you like me to share the support diagnostics file or should I open a new discussion on the forum, instead? Thanks!

cluster_stats.json :

Bad Request. Rejected
{
  "error" : { 
    "root_cause" : [ 
      {   
        "type" : "illegal_argument_exception",
        "reason" : "Values less than -1 bytes are not supported: -9223372036787056125b"
      }   
    ],  
    "type" : "illegal_argument_exception",
    "reason" : "Values less than -1 bytes are not supported: -9223372036787056125b",
    "suppressed" : [ 
      {   
        "type" : "illegal_state_exception",
        "reason" : "Failed to close the XContentBuilder",
        "caused_by" : { 
          "type" : "i_o_exception",
          "reason" : "Unclosed object or array found"
        }
      }   
    ]   
  },  
  "status" : 400 
}

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-data-management (Team:Data Management)