apache / accumulo

Apache Accumulo
https://accumulo.apache.org
Apache License 2.0

Could add stateful checks to WAL recovery code #542

Open keith-turner opened 6 years ago

keith-turner commented 6 years ago

Data is written to WALs in temporal order. Mutations are written to a WAL with per-tablet sequence numbers. A tablet's sequence number does not change until a minor compaction occurs, and each minor compaction is itself recorded in the WAL.
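To make the numbering concrete, here is a toy model of one tablet's entries. It is only a sketch, not Accumulo code: the class and method names are made up, and the rule that a compaction start is logged at seq + 1 and the finish at seq + 2 is an assumption read off the example WAL in this issue.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyTabletLog:
    """Hypothetical stand-in for one tablet's WAL entries (not Accumulo code)."""

    tablet_id: int
    seq: int = 1
    entries: List[Tuple[str, int, int]] = field(default_factory=list)

    def write_mutations(self) -> None:
        # Mutations reuse the tablet's current sequence number.
        self.entries.append(("MANY_MUTATIONS", self.tablet_id, self.seq))

    def minor_compact(self) -> None:
        # Assumption from the example WAL: start is logged at seq + 1,
        # finish at seq + 2, and later mutations use the finish value.
        self.entries.append(("COMPACTION_START", self.tablet_id, self.seq + 1))
        self.seq += 2
        self.entries.append(("COMPACTION_FINISH", self.tablet_id, self.seq))


log = ToyTabletLog(tablet_id=5)
log.write_mutations()
log.write_mutations()
log.minor_compact()
log.write_mutations()
seqs = [seq for _, _, seq in log.entries]
```

The point is just that mutation sequence numbers only move forward across a logged compaction.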

Below is an example of a WAL, shown in the order it was written.

DEFINE_TABLET 5 1 2<<

MANY_MUTATIONS 5 1
1 mutations:
  r1
      f1:q1 [system]:1529685833137 [] v1

MANY_MUTATIONS 5 1
1 mutations:
  r1
      f1:q2 [system]:1529685833149 [] v2

COMPACTION_START 5 2 hdfs://localhost:8020/accumulo/tables/2/default_tablet/F0000005.rf

COMPACTION_FINISH 5 3

MANY_MUTATIONS 5 3
1 mutations:
  r1
      f1:q1 [system]:1529685849576 [] v3

COMPACTION_START 5 4 hdfs://localhost:8020/accumulo/tables/2/default_tablet/F0000006.rf

COMPACTION_FINISH 5 5

MANY_MUTATIONS 5 5
1 mutations:
  r1
      f1:q1 [system]:1529685856321 [] v4

MANY_MUTATIONS 5 5
1 mutations:
  r1
      f1:q2 [system]:1529685867727 [] v5

Given the example above, it would be odd to see mutations in a WAL with sequence numbers X, X+2, and X+4 without seeing corresponding compaction events between the mutations. So we could add two types of sanity checks to the recovery code:

The retry behavior when writing to WALs, and its effect on sequence numbers, if any, needs to be looked into.
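As a rough illustration of the kind of stateful check this could be, the sketch below flags a tablet whose mutation sequence number increases with no compaction event seen in between. It is not the recovery code: the function name and the `(event, tablet id, seq)` tuple layout are hypothetical, and it ignores the open question about retries.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical simplified WAL entry: (event type, tablet id, sequence number).
Event = Tuple[str, int, int]

COMPACTION_EVENTS = {"COMPACTION_START", "COMPACTION_FINISH"}


def find_unexplained_jumps(events: Iterable[Event]) -> List[str]:
    """Scan events in written order and report mutation seq jumps
    that have no compaction event between them."""
    last_mutation_seq: Dict[int, int] = {}   # tablet id -> seq of last mutation
    compaction_since: Dict[int, bool] = {}   # tablet id -> compaction seen since then
    problems: List[str] = []
    for kind, tablet, seq in events:
        if kind in COMPACTION_EVENTS:
            compaction_since[tablet] = True
        elif kind == "MANY_MUTATIONS":
            prev = last_mutation_seq.get(tablet)
            if prev is not None and seq > prev and not compaction_since.get(tablet, False):
                problems.append(
                    f"tablet {tablet}: mutation seq jumped {prev} -> {seq} "
                    "with no compaction event in between"
                )
            last_mutation_seq[tablet] = seq
            compaction_since[tablet] = False
        # Other event types (e.g. DEFINE_TABLET) are ignored in this sketch.
    return problems


# First entries of the example WAL above, reduced to (event, tablet id, seq).
good = [
    ("MANY_MUTATIONS", 5, 1),
    ("MANY_MUTATIONS", 5, 1),
    ("COMPACTION_START", 5, 2),
    ("COMPACTION_FINISH", 5, 3),
    ("MANY_MUTATIONS", 5, 3),
]
# The odd case: X, X+2, X+4 with no compaction events.
bad = [("MANY_MUTATIONS", 5, 1), ("MANY_MUTATIONS", 5, 3), ("MANY_MUTATIONS", 5, 5)]

good_problems = find_unexplained_jumps(good)
bad_problems = find_unexplained_jumps(bad)
```

Here `good_problems` is empty and `bad_problems` has one entry per unexplained jump. A real check would need to decide whether to fail recovery or only log, and would have to account for whatever retries do to sequence numbers.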

keith-turner commented 6 years ago

I thought of this while working on #538 and #541. If something like this were added, it would be nice to also add user-level tools like those mentioned in #535 to help users figure things out if these stateful checks trigger.