TomonoriSoejima / Tejun

notes related to working cases

recovery fail error 1 in 7.4.1 #27

Open TomonoriSoejima opened 4 years ago

TomonoriSoejima commented 4 years ago

[2020-06-24T14:10:02,478][WARN ][o.e.c.r.a.AllocationService] [xxx] failing shard [failed shard, shard [index-2020.06.20-000115][7], node[aaa], relocating [BVBS4C1uR0SRCGBEGWTTPg], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=oeYkS4F6QjmWkRBvo-x4Gw, rId=ByuTpxo_S5-zXnVH1QvAIg], expected_shard_size[38786827396], message [failed recovery], failure [RecoveryFailedException[[ipfixdata-2020.06.20-000115][7]: Recovery failed from {ccc}{t1vqzdlNQnyCSolqHhYFAw}{i1-SgYkTTgyn_kFuxG-dkg}{172.25.178.85}{172.25.178.85:9301}{dl}{ml.machine_memory=540447649792, rack=r3, ml.max_open_jobs=20, xpack.installed=true} into {aaa}{DQs9Qvg2TYa__C6_l924MQ}{5oIy4EgJQ0OIWiA0A9mN5A}{172.19.226.76}{172.19.226.76:9301}{dl}{ml.machine_memory=540447649792, rack=r8, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[ccc][172.25.178.85:9301][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [30665603222/28.5gb], which is larger than the limit of [30079536332/28gb], real usage: [30664554016/28.5gb], new bytes reserved: [1049206/1mb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=2098436/2mb, accounting=0/0b]]; ], markAsStale [true]]
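The innermost cause is a parent circuit breaker trip. The arithmetic below checks the numbers in that message. It assumes the real-memory parent breaker, since that is what prints "real usage": "would be" is real heap usage plus the bytes reserved for the new transport request. ParentBreakerCheck is only a worked example, not Elasticsearch code.

```java
// Worked check of the parent-breaker numbers in the CircuitBreakingException above.
// Assumption: with the real-memory parent breaker (the message prints "real usage"),
// "would be" = current real heap usage + bytes reserved by the new request.
class ParentBreakerCheck {
    static final long REAL_USAGE = 30_664_554_016L; // "real usage: [30664554016/28.5gb]"
    static final long NEW_BYTES = 1_049_206L;       // "new bytes reserved: [1049206/1mb]"
    static final long LIMIT = 30_079_536_332L;      // "limit of [30079536332/28gb]"

    // Total the breaker compares against the limit.
    static long wouldBe() {
        return REAL_USAGE + NEW_BYTES;
    }

    // The breaker rejects the request when the total exceeds the limit.
    static boolean trips() {
        return wouldBe() > LIMIT;
    }

    // True if heap usage was already above the limit before this ~1 MB request arrived.
    static boolean overLimitBeforeRequest() {
        return REAL_USAGE > LIMIT;
    }

    public static void main(String[] args) {
        System.out.println("would be: " + wouldBe());
        System.out.println("over by:  " + (wouldBe() - LIMIT) + " bytes");
        System.out.println("trips:    " + trips());
        System.out.println("over before request: " + overLimitBeforeRequest());
    }
}
```

wouldBe() gives 30665603222, matching the log. Real usage alone (30664554016) was already about 585 MB over the 30079536332 limit, so the ~1 MB start_recovery request only triggered the breaker; heap usage was already above the limit. Because the exception is nested under RemoteTransportException[[ccc]...], the check appears to have failed on the source node ccc rather than on aaa.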

TomonoriSoejima commented 4 years ago
[failed shard, shard [index-2020.06.20-000115][7], node[aaa], relocating [BVBS4C1uR0SRCGBEGWTTPg], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=oeYkS4F6QjmWkRBvo-x4Gw, rId=ByuTpxo_S5-zXnVH1QvAIg], expected_shard_size[38786827396], message [failed recovery], failure [RecoveryFailedException[[ipfixdata-2020.06.20-000115][7]: Recovery failed from {ccc}{t1vqzdlNQnyCSolqHhYFAw}{i1-SgYkTTgyn_kFuxG-dkg}{172.25.178.85}{172.25.178.85:9301}{dl}{ml.machine_memory=540447649792, rack=r3, ml.max_open_jobs=20, xpack.installed=true} into {aaa}{DQs9Qvg2TYa__C6_l924MQ}{5oIy4EgJQ0OIWiA0A9mN5A}{172.19.226.76}{172.19.226.76:9301}{dl}{ml.machine_memory=540447649792, rack=r8, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[ccc][172.25.178.85:9301][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [30665603222/28.5gb], which is larger than the limit of [30079536332/28gb], real usage: [30664554016/28.5gb], new bytes reserved: [1049206/1mb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=2098436/2mb, accounting=0/0b]]; ], markAsStale [true]]

The message above is produced by FailedShard#toString() in 7.4.1:

https://github.com/elastic/elasticsearch/blob/v7.4.1/server/src/main/java/org/elasticsearch/cluster/routing/allocation/FailedShard.java#L44


    @Override
    public String toString() {
        return "failed shard, shard " + routingEntry + ", message [" + message + "], failure [" +
                   ExceptionsHelper.detailedMessage(failure) + "], markAsStale [" + markAsStale + "]";
    }
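Since toString() joins fixed labels, a logged line can be split back into routingEntry, message, failure and markAsStale. A minimal sketch with a regular expression; FailedShardLine is a hypothetical helper, not an Elasticsearch class, and it assumes the outer "failing shard [ ... ]" wrapper seen in the log line has been removed first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an Elasticsearch class: splits a FailedShard#toString()
// string back into the four values it concatenates. Assumes the outer
// "failing shard [...]" wrapper from the log line has already been stripped.
class FailedShardLine {
    // routingEntry and message are lazy (stop at the first ", message [" / "], failure [");
    // failure is greedy so it runs to the last "], markAsStale [" despite its own brackets.
    private static final Pattern LINE = Pattern.compile(
            "failed shard, shard (.*?), message \\[(.*?)\\], failure \\[(.*)\\], markAsStale \\[(true|false)\\]");

    final String routingEntry;
    final String message;
    final String failure;
    final boolean markAsStale;

    private FailedShardLine(String routingEntry, String message, String failure, boolean markAsStale) {
        this.routingEntry = routingEntry;
        this.message = message;
        this.failure = failure;
        this.markAsStale = markAsStale;
    }

    static FailedShardLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) { // matches() requires the whole string to fit the pattern
            throw new IllegalArgumentException("not a FailedShard#toString() string");
        }
        return new FailedShardLine(m.group(1), m.group(2), m.group(3), Boolean.parseBoolean(m.group(4)));
    }

    public static void main(String[] args) {
        String line = "failed shard, shard [idx][7], node[aaa], [R], s[INITIALIZING], "
                + "message [failed recovery], failure [RecoveryFailedException[[idx][7]: Recovery failed]; "
                + "nested: CircuitBreakingException[[parent] Data too large]; ], markAsStale [true]";
        FailedShardLine p = parse(line);
        System.out.println("routingEntry = " + p.routingEntry);
        System.out.println("message      = " + p.message);
        System.out.println("failure      = " + p.failure);
        System.out.println("markAsStale  = " + p.markAsStale);
    }
}
```

The greedy failure group matters here: the nested exception text contains its own "]" characters, so only the last "], markAsStale [" can safely end it.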

routingEntry (the ShardRouting of the failed copy, in its string form)

[index-2020.06.20-000115][7], node[aaa], relocating [BVBS4C1uR0SRCGBEGWTTPg], [R], 
recovery_source[peer recovery], 
s[INITIALIZING], a[id=oeYkS4F6QjmWkRBvo-x4Gw, rId=ByuTpxo_S5-zXnVH1QvAIg], expected_shard_size[38786827396]
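To label each field, here is a sketch that rebuilds a string of the same shape. It is modelled on how ShardRouting formats itself in 7.x but is written from this log line, not copied from the source; RoutingEntrySketch and its parameter names are hypothetical, and fields that do not appear here (such as unassigned info) are left out.

```java
// Sketch, NOT the Elasticsearch source: rebuilds a string shaped like the routingEntry
// above so each bracketed field can be labelled. All names here are hypothetical.
class RoutingEntrySketch {
    static String shortSummary(String index, int shardId, String currentNodeId,
                               String relocatingNodeId, boolean primary, String recoverySource,
                               String state, String allocationId, String relocationId,
                               long expectedShardSize) {
        StringBuilder sb = new StringBuilder();
        sb.append('[').append(index).append("][").append(shardId).append(']'); // [index][shard]
        sb.append(", node[").append(currentNodeId).append("], ");              // node holding this copy
        if (relocatingNodeId != null) {
            // on an INITIALIZING relocation target: the node the copy is moving from
            sb.append("relocating [").append(relocatingNodeId).append("], ");
        }
        sb.append(primary ? "[P]" : "[R]");                                    // primary or replica
        if (recoverySource != null) {
            sb.append(", recovery_source[").append(recoverySource).append(']');
        }
        sb.append(", s[").append(state).append(']');                           // shard state
        if (allocationId != null) {
            sb.append(", a[id=").append(allocationId);                         // allocation id
            if (relocationId != null) {
                sb.append(", rId=").append(relocationId);                      // relocation id
            }
            sb.append(']');
        }
        if (expectedShardSize >= 0) { // simplification: negative means "unknown" here
            sb.append(", expected_shard_size[").append(expectedShardSize).append(']'); // bytes
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(shortSummary("index-2020.06.20-000115", 7, "aaa",
                "BVBS4C1uR0SRCGBEGWTTPg", false, "peer recovery", "INITIALIZING",
                "oeYkS4F6QjmWkRBvo-x4Gw", "ByuTpxo_S5-zXnVH1QvAIg", 38786827396L));
    }
}
```

Called with the values from the log, it reproduces the routingEntry above. Note that relocating [BVBS4C1uR0SRCGBEGWTTPg] names the node this replica is moving away from, while a replica's peer recovery copies from the primary, which the failure message shows on ccc.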

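breakdown of routingEntry: [R] marks a replica copy, and recovery_source[peer recovery] corresponds to PeerRecoverySource in the RecoverySource javadoc (recovery from a primary on another node):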

/**
 * Represents the recovery source of a shard. Available recovery types are:
 *
 * - {@link EmptyStoreRecoverySource} recovery from an empty store
 * - {@link ExistingStoreRecoverySource} recovery from an existing store
 * - {@link PeerRecoverySource} recovery from a primary on another node
 * - {@link SnapshotRecoverySource} recovery from a snapshot
 * - {@link LocalShardsRecoverySource} recovery from other shards of another index on the same node
 */

message

breakdown of message (here: failed recovery)

https://github.com/elastic/elasticsearch/blob/v7.4.1/server/src/main/java/org/elasticsearch/cluster/action/shard/ShardStateAction.java#L392

ExceptionsHelper.detailedMessage(failure)

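breakdown of failure: ExceptionsHelper.detailedMessage(failure) prints each exception in the cause chain as SimpleName[message], separated by "; nested: ", which is why the log reads RecoveryFailedException[...]; nested: RemoteTransportException[...]; nested: CircuitBreakingException[...].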

https://github.com/elastic/elasticsearch/blob/v7.4.1/server/src/main/java/org/elasticsearch/ExceptionsHelper.java#L106
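A minimal sketch of the cause-chain walk that ExceptionsHelper.detailedMessage performs, written to match the output format seen in the log rather than copied from the linked file, so check details against the source. The three exception classes are local stand-ins so the sketch compiles without Elasticsearch on the classpath.

```java
// Sketch of the cause-chain flattening behind "failure [...]" in the log.
// Not the Elasticsearch source; reconstructed from the logged output format.
class DetailedMessageSketch {
    static String detailedMessage(Throwable t) {
        if (t == null) {
            return "Unknown"; // assumption: placeholder text for a missing failure
        }
        if (t.getCause() == null) {
            // no cause chain: just SimpleName[message]
            return t.getClass().getSimpleName() + "[" + t.getMessage() + "]";
        }
        StringBuilder sb = new StringBuilder();
        while (t != null) {
            sb.append(t.getClass().getSimpleName());
            if (t.getMessage() != null) {
                sb.append('[').append(t.getMessage()).append(']');
            }
            sb.append("; ");          // emitted after every link, including the last one
            t = t.getCause();
            if (t != null) {
                sb.append("nested: ");
            }
        }
        return sb.toString();
    }

    // Local stand-ins for the Elasticsearch exception types.
    static class RecoveryFailedException extends RuntimeException {
        RecoveryFailedException(String msg, Throwable cause) { super(msg, cause); }
    }
    static class RemoteTransportException extends RuntimeException {
        RemoteTransportException(String msg, Throwable cause) { super(msg, cause); }
    }
    static class CircuitBreakingException extends RuntimeException {
        CircuitBreakingException(String msg) { super(msg); }
    }

    public static void main(String[] args) {
        Throwable chain = new RecoveryFailedException("[idx][7]: Recovery failed from {ccc} into {aaa}",
                new RemoteTransportException("[ccc][172.25.178.85:9301][internal:index/shard/recovery/start_recovery]",
                        new CircuitBreakingException("[parent] Data too large")));
        System.out.println(detailedMessage(chain));
    }
}
```

The "; " appended after the last cause is why the failure text in the log ends with "; ]" just before ", markAsStale".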