Shopify / ghostferry

The swiss army knife of live data migrations
https://shopify.github.io/ghostferry
MIT License

Added TotalRows and TotalBytes to the `/status` api for Ghostferry #303

Open EtienneBerubeShopify opened 2 years ago

EtienneBerubeShopify commented 2 years ago

Added TotalRows and TotalBytes to Ghostferry's /status API to allow better time estimates when logging with Splunk.

This included getting the data from the information_schema and adding the data into the StateTracker so that it could be serialized later to be sent via HTTP.
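A minimal sketch of the information_schema lookup described above, not the code from this PR. `TableStats`, `FetchTableStats`, and `Totals` are hypothetical names, and the schema and table names in `main` are made-up sample values. The `TABLE_ROWS` and `DATA_LENGTH` columns come from `information_schema.TABLES`; for InnoDB, `TABLE_ROWS` is only an estimate, so the totals are suitable for time estimation rather than exact accounting.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// TableStats holds the size estimates that information_schema reports for one table.
type TableStats struct {
	Schema string
	Table  string
	Rows   uint64
	Bytes  uint64
}

// statsQuery reads the estimates for every table in one schema.
// COALESCE guards against NULL values, which appear for views.
const statsQuery = `
SELECT TABLE_NAME, COALESCE(TABLE_ROWS, 0), COALESCE(DATA_LENGTH, 0)
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = ?`

// FetchTableStats is a hypothetical helper; it is not part of Ghostferry.
// It needs a *sql.DB opened with a MySQL driver, which this sketch does not import.
func FetchTableStats(ctx context.Context, db *sql.DB, schema string) ([]TableStats, error) {
	rows, err := db.QueryContext(ctx, statsQuery, schema)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var stats []TableStats
	for rows.Next() {
		s := TableStats{Schema: schema}
		if err := rows.Scan(&s.Table, &s.Rows, &s.Bytes); err != nil {
			return nil, err
		}
		stats = append(stats, s)
	}
	return stats, rows.Err()
}

// Totals sums the per-table estimates into the two numbers exposed by /status.
func Totals(stats []TableStats) (totalRows, totalBytes uint64) {
	for _, s := range stats {
		totalRows += s.Rows
		totalBytes += s.Bytes
	}
	return totalRows, totalBytes
}

func main() {
	// Sample values stand in for a real query result.
	sample := []TableStats{
		{Schema: "shop", Table: "orders", Rows: 1000, Bytes: 65536},
		{Schema: "shop", Table: "items", Rows: 250, Bytes: 16384},
	}
	totalRows, totalBytes := Totals(sample)
	fmt.Printf("TotalRows=%d TotalBytes=%d\n", totalRows, totalBytes)
}
```

`main` only exercises the aggregation on sample values; calling `FetchTableStats` requires a live MySQL connection.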

DEBATE (DEPRECATED): Should it fetch the data even when running from an interrupted state (reading from a serialized state) or should it only query the data when coming from a "clean" run?

Notes: The Go compiler was complaining about a few things and would not let me run any commands until it was happy, so I had to fix a few warnings. Here are the files with these "fixes":

shuhaowu commented 2 years ago

> This included getting the data from the information_schema and adding the data into the StateTracker so that it could be serialized later to be sent via HTTP.

StateTracker is used only for critical state that is needed to reconstruct Ghostferry after it is interrupted and resumed. Things that can be discovered should not be included in that struct. See the Progress struct.
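To illustrate the distinction, here is a rough sketch that keeps resume-critical state and discoverable progress in separate structs. `resumeState` and `progressReport` are hypothetical stand-ins; the real StateTracker and Progress structs in Ghostferry have different fields, and the JSON key names are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resumeState stands in for state that must survive an interruption.
// It cannot be rediscovered from the database, so it is dumped on interrupt.
type resumeState struct {
	LastCopiedKey map[string]uint64 `json:"last_copied_key"`
}

// progressReport stands in for values shown by /status.
// The totals can be re-read from information_schema at any time,
// so they do not belong in the serialized resume state.
type progressReport struct {
	TotalRows  uint64 `json:"TotalRows"`
	TotalBytes uint64 `json:"TotalBytes"`
}

// encode returns the JSON form of v, or an empty string on error.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	state := resumeState{LastCopiedKey: map[string]uint64{"orders": 42}}
	progress := progressReport{TotalRows: 1250, TotalBytes: 81920}

	fmt.Println("state dump:", encode(state))
	fmt.Println("status:", encode(progress))
}
```

With this split, the state dump stays limited to what is needed to resume, and the totals are rebuilt on each start.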

Manan007224 commented 2 years ago

> DEBATE: Should it fetch the data even when running from an interrupted state (reading from a serialized state) or should it only query the data when coming from a "clean" run?

IMO fetching the stats via the /status endpoint should have nothing to do with the fact that ghostferry is running from an interrupted or clean run. So the /status should return these stats regardless.
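A sketch of that behaviour, assuming a JSON response and a handler shape that are illustrative only: the totals are discovered on every start, and the `resumed` flag never gates the lookup. `ferry`, `newFerry`, `statusResponse`, and `statusBody` are hypothetical names, not Ghostferry's actual control server.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusResponse is an assumed payload shape for /status.
type statusResponse struct {
	TotalRows  uint64
	TotalBytes uint64
}

// ferry is a hypothetical stand-in for a running Ghostferry instance.
type ferry struct {
	resumed    bool // true when started from a serialized state
	totalRows  uint64
	totalBytes uint64
}

// newFerry gathers the totals on every start, clean or resumed.
func newFerry(resumed bool, discover func() (uint64, uint64)) *ferry {
	totalRows, totalBytes := discover()
	return &ferry{resumed: resumed, totalRows: totalRows, totalBytes: totalBytes}
}

// ServeHTTP writes the totals as JSON without consulting f.resumed.
func (f *ferry) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(statusResponse{TotalRows: f.totalRows, TotalBytes: f.totalBytes})
}

// statusBody issues an in-process GET /status and returns the response body.
func statusBody(f *ferry) string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/status", nil)
	f.ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	// discover stands in for the information_schema lookup.
	discover := func() (uint64, uint64) { return 1250, 81920 }

	fmt.Print("clean:   ", statusBody(newFerry(false, discover)))
	fmt.Print("resumed: ", statusBody(newFerry(true, discover)))
}
```

Both runs return the same body, since nothing in the handler depends on how the process was started.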

EtienneBerubeShopify commented 2 years ago

> > DEBATE: Should it fetch the data even when running from an interrupted state (reading from a serialized state) or should it only query the data when coming from a "clean" run?
>
> IMO fetching the stats via the /status endpoint should have nothing to do with the fact that ghostferry is running from an interrupted or clean run. So the /status should return these stats regardless.

This was an artifact of the first approach. I wanted to keep it for documentation, but I guess it just adds confusion.