brandur / sorg

A Go-based static site generator that compiles brandur.org.
MIT License

HTTP transactions: questions on performance and correctness #261

Open chuzhe-as-a-dev opened 4 years ago

chuzhe-as-a-dev commented 4 years ago

Hi Brandur,

I have just read your wonderful articles on making HTTP requests transactional and idempotent. I have two questions on this topic and would like to hear your thoughts.

  1. Isn’t it costly to use a single serializable transaction for the whole API?

I see that each transaction in the sample code is explicitly set to use the serializable isolation level. AFAIK, serializable carries a much higher coordination cost than weaker isolation levels such as snapshot isolation, and almost all RDBMSs default to a weaker isolation level, in my opinion mostly because of that performance cost. I understand that weaker isolation is hard to get right, but if an application needs the performance, what are its options?

  2. When dealing with foreign state mutations, does it matter that chopping the API into several transactions would make intermediate states visible to other APIs?

Intermediate states become visible to other requests as soon as each transaction commits, which may lead to unexpected app behavior. For example, suppose API A exposes an intermediate state to API B, and API B assumes that some action has already been taken. If API A then fails halfway through and the client never retries, the assumed action will never complete, and API B might do something it shouldn’t.

Please let me know your take. Thanks!

brandur commented 4 years ago

Hey @TerCZ, thanks for the kind words and sorry for the delayed response here!

Isn’t it costly to use a single serializable transaction for the whole API?

Yeah, it is more expensive, and it might be a good idea to design your code/DB to not be entirely dependent on SERIALIZABLE from the beginning so that it's not a problem later when you want to downgrade. I did get an email from one reader who tried using SERIALIZABLE everywhere and eventually did run into contention issues.

The good news is that you can get most of the useful guarantees of SERIALIZABLE at lower levels as well using other mechanisms. The easy example that comes to mind is two transactions that both read-then-write the same row and end up doing a double-insert. SERIALIZABLE makes this impossible, but a simple UNIQUE index pretty much solves the problem too, and for a program that's expecting to get a lot of traffic, the latter should probably be used.
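
To make the UNIQUE index idea concrete, here's a minimal sketch. It uses Python's built-in sqlite3 purely as a runnable stand-in for a real database at a lower isolation level, and the table, column, and function names (idempotency_keys, insert_key_once, and so on) are made up for illustration rather than taken from the article's code.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table; the UNIQUE index is what rejects a double-insert.
    conn.execute(
        "CREATE TABLE idempotency_keys ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL,"
        " idempotency_key TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX idempotency_keys_user_key"
        " ON idempotency_keys (user_id, idempotency_key)"
    )


def insert_key_once(conn: sqlite3.Connection, user_id: int, key: str) -> bool:
    """Insert the key; return False if another request already inserted it."""
    try:
        # "with conn" commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "INSERT INTO idempotency_keys (user_id, idempotency_key)"
                " VALUES (?, ?)",
                (user_id, key),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE index rejected the duplicate row.
        return False
```

The caller that gets False back knows it lost the race and can load the existing row instead of assuming its own insert went through.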

There's also the possibility that you could keep SERIALIZABLE for your most important paths, and use lower isolation elsewhere.
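
A rough sketch of what mixing levels could look like, assuming Postgres (where `BEGIN ISOLATION LEVEL ...` is valid and READ COMMITTED is the default). The path names and the mapping are hypothetical, and the helper only builds the statement that a request handler would send to open its transaction.

```python
# Hypothetical mapping: only the most important paths get SERIALIZABLE.
ISOLATION_BY_PATH = {
    "/charges": "SERIALIZABLE",
}

# Postgres's default isolation level, used for everything else.
DEFAULT_ISOLATION = "READ COMMITTED"


def begin_statement(path: str) -> str:
    """Return the SQL statement that opens a transaction for this path."""
    level = ISOLATION_BY_PATH.get(path, DEFAULT_ISOLATION)
    return f"BEGIN ISOLATION LEVEL {level}"
```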

I should amend the article to make this more clear, but I don't yet have fully realized advice on what people should do.

When dealing with foreign state mutations, does it matter that chopping the API into several transactions would make intermediate states visible to other APIs?

For sure. In practice, you'll probably need to add flags like is_complete or is_in_progress to rows that may represent partial state, and make sure that your program takes them into account when it's looking for only fully finished results.
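
Here's a minimal sketch of the flag approach, again using sqlite3 only so that it runs anywhere. The charges table and the function names are assumptions for illustration: the first transaction commits a row marked incomplete, a later transaction flips the flag, and readers filter on it.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table; is_complete marks rows whose request fully finished.
    conn.execute(
        "CREATE TABLE charges ("
        " id INTEGER PRIMARY KEY,"
        " amount_cents INTEGER NOT NULL,"
        " is_complete INTEGER NOT NULL DEFAULT 0)"
    )


def start_charge(conn: sqlite3.Connection, amount_cents: int) -> int:
    """First transaction: commit the row, visibly marked as partial state."""
    with conn:
        cur = conn.execute(
            "INSERT INTO charges (amount_cents, is_complete) VALUES (?, 0)",
            (amount_cents,),
        )
    return cur.lastrowid


def finish_charge(conn: sqlite3.Connection, charge_id: int) -> None:
    """Later transaction: runs only after the foreign mutation succeeded."""
    with conn:
        conn.execute(
            "UPDATE charges SET is_complete = 1 WHERE id = ?", (charge_id,)
        )


def completed_charge_ids(conn: sqlite3.Connection) -> list:
    """Other APIs read through this filter so they never see partial state."""
    rows = conn.execute(
        "SELECT id FROM charges WHERE is_complete = 1 ORDER BY id"
    )
    return [row[0] for row in rows]
```

If the request dies between start_charge and finish_charge and the client never retries, the row stays at is_complete = 0 and is simply never returned by completed_charge_ids.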