Shopify / ghostferry

The swiss army knife of live data migrations
https://shopify.github.io/ghostferry
MIT License
747 stars 70 forks source link

Some TLA+ suggestions #14

Open hwayne opened 6 years ago

hwayne commented 6 years ago

I ran across your spec and thought I'd give some tips on using TLA+ here!

Multiple Initial Starting States

SourceTable = InitialTable means you have to create a separate model for each different initial state. What we could instead do is say

CONSTANT TableCapacity
InitialTables == [1..TableCapacity -> PossibleRecords]

(*--algorithm
variables SourceTable \in InitialTables;

This sets the initial SourceTable to some possible InitialTable, meaning that a single model can now explore every single table of size TableCapacity. This also has a performance bonus, since by wrapping everything in a single model TLC can skip checking symmetric states.

If you don't need to account for gaps in your table, IE NoRecordHere isn't part of your specs then we can replace our InitialTables with

InitialTables == UNION {[1..tc -> Records]: tc \in 0..TableCapacity}

This now generates all tables of size TableCapacity or less. This means instead of writing <<r0, r1, NoRecordHere>> we write <<r0, r1>>.

ASSUME

If you do need NoRecordHere, instead of defining it as something TLC can't check, try doing this:

CONSTANT NoRecord
ASSUME NoRecord \notin Records

That signals the intent more clearly, and also doesn't require you to override the module. Instead you just make NoRecord a model value.

Indenting PlusCal Labels

You current have this:

binlog_loop: while (...) {
      binlog_read:   if (...) {
                       ...
      binlog_write:    ...
      binlog_upkey:    ...
    };
}

It's hard to tell if the labels are sublabels or side-by-side or what. Instead, try this:

binlog_loop: 
  while (...) {
    binlog_read:   
      if (...) {
        ...
        binlog_write: 
          ...
        binlog_upkey:    
          ...
    };
}

Now it's clear that binlog_write and binlog_upkey are siblings under binlog_read.

Hope these tips help!

shuhaowu commented 6 years ago

Thanks for the tips! This is the first real TLA+ spec I wrote so I appreciate any tips. I'll take a look at running this at some point in the near future.

I'm not sure if I can discount holes in the database, I suspect I can but I don't know how to prove it.

About the labels, I found it more clear this way as then all the code is lined up and indented similar to how actual code would be. I can then trace each label back to the code, although I'm still not quite sure what style I should adopt.

hwayne commented 6 years ago

Happy to help!

I'm not sure if I can discount holes in the database, I suspect I can but I don't know how to prove it.

One thing you can do is create separate models, one where you have holes and one where you don't. Since the latter would have one fewer "record", you can check it over a larger set of inputs in the same amount of time. You'd check the former with a smaller model and the latter with a larger one.

If you go this route, you could make a CONSTANT TableGenerator(_) operator and choose in the model whether it generates all length-n tuples of PossibleRecords or all possible tuples up to length-n.

Please let me know if you have any questions!