Shopify / ghostferry

The swiss army knife of live data migrations
https://shopify.github.io/ghostferry
MIT License

Trouble with virtual generated columns #338

Open SpencerMalone opened 1 year ago

SpencerMalone commented 1 year ago

👋 We have a virtual generated column, but ghostferry seems to be trying to insert data into it when it should not, resulting in an error like:

during prepare query near paginationKey <redacted>: Error 3105: The value specified for generated column '<redacted>' in table '<redacted>' is not allowed.

Seems like we should just not insert any data into virtual generated columns.
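
To illustrate the idea of leaving generated columns out of the write path, here is a minimal sketch in Go. It is not ghostferry's code: the `Column` type and `BuildInsert` function are hypothetical names made up for this example. It assumes column metadata has been read from `information_schema.COLUMNS`, whose `EXTRA` field reports `VIRTUAL GENERATED` or `STORED GENERATED` for generated columns.

```go
package main

import (
	"fmt"
	"strings"
)

// Column is a hypothetical, simplified column description; it is not
// ghostferry's own type. Extra mirrors information_schema.COLUMNS.EXTRA.
type Column struct {
	Name  string
	Extra string
}

// IsGenerated reports whether MySQL computes the column's value itself.
// It matches the full phrases rather than the substring "GENERATED",
// because MySQL 8.0 also reports DEFAULT_GENERATED for ordinary columns
// that have an expression default.
func (c Column) IsGenerated() bool {
	extra := strings.ToUpper(c.Extra)
	return strings.Contains(extra, "VIRTUAL GENERATED") ||
		strings.Contains(extra, "STORED GENERATED")
}

// BuildInsert returns a parameterized INSERT statement whose column list
// omits generated columns, so no value is ever supplied for them.
func BuildInsert(table string, cols []Column) string {
	names := make([]string, 0, len(cols))
	marks := make([]string, 0, len(cols))
	for _, c := range cols {
		if c.IsGenerated() {
			continue
		}
		names = append(names, "`"+c.Name+"`")
		marks = append(marks, "?")
	}
	return fmt.Sprintf("INSERT INTO `%s` (%s) VALUES (%s)",
		table, strings.Join(names, ", "), strings.Join(marks, ", "))
}

func main() {
	cols := []Column{
		{Name: "id", Extra: "auto_increment"},
		{Name: "payload", Extra: ""},
		{Name: "payload_len", Extra: "VIRTUAL GENERATED"},
		{Name: "created_at", Extra: "DEFAULT_GENERATED"},
	}
	fmt.Println(BuildInsert("events", cols))
}
```

The sketch skips stored generated columns as well as virtual ones, since MySQL raises error 3105 for an explicit value written to either kind.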

shuhaowu commented 1 year ago

You might be able to ignore the column in this function by removing the virtual columns from TableSchema.Table.Columns: https://github.com/Shopify/ghostferry/blob/b3aaacf5d6ce5f14545befc4fe2b06c93e00c04e/table_schema_cache.go#L205

SpencerMalone commented 1 year ago

That's a good thought, lemme pull on that thread locally and check back in!

SpencerMalone commented 1 year ago

Sorry, I'm still struggling with this a bit, but hopefully will have an update in a few weeks!

SpencerMalone commented 1 year ago

Ignoring the columns in the table schema ended up being a bit of a dead end: it made the DML stuff pretty unhappy. In the end we settled on maintaining a ColumnsToSkip list in type TableSchema struct, using that to rebuild ColumnsToSelect in the cursor, and simply skipping the rows in the DML events. Would y'all be interested in a PR?
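
The fork's change is not shown in this thread, so the following is only a rough sketch of what a skip list along these lines might look like. The `TableSchema` here is a simplified, hypothetical stand-in for ghostferry's struct, and the `ColumnsToSelect` and `FilterRow` methods are assumptions made for illustration. It reads "skipping in the DML events" as dropping the skipped positions from each row image, which is one interpretation of the comment above. The column list itself stays intact, so positional row values still line up with it.

```go
package main

import "fmt"

// TableSchema is a hypothetical stand-in for ghostferry's TableSchema;
// only the fields needed for this sketch are modeled.
type TableSchema struct {
	Name          string
	Columns       []string        // all columns, in table order
	ColumnsToSkip map[string]bool // e.g. virtual generated columns
}

// ColumnsToSelect rebuilds the column list a cursor would read,
// leaving out every skipped column.
func (t *TableSchema) ColumnsToSelect() []string {
	cols := make([]string, 0, len(t.Columns))
	for _, name := range t.Columns {
		if !t.ColumnsToSkip[name] {
			cols = append(cols, name)
		}
	}
	return cols
}

// FilterRow drops the values at skipped positions from a full row image,
// such as one decoded from a DML event. It returns an error when the row
// length does not match the schema, since positions could not be trusted.
func (t *TableSchema) FilterRow(row []interface{}) ([]interface{}, error) {
	if len(row) != len(t.Columns) {
		return nil, fmt.Errorf("table %s: row has %d values, schema has %d columns",
			t.Name, len(row), len(t.Columns))
	}
	kept := make([]interface{}, 0, len(row))
	for i, name := range t.Columns {
		if !t.ColumnsToSkip[name] {
			kept = append(kept, row[i])
		}
	}
	return kept, nil
}

func main() {
	schema := &TableSchema{
		Name:          "events",
		Columns:       []string{"id", "payload", "payload_len"},
		ColumnsToSkip: map[string]bool{"payload_len": true},
	}
	fmt.Println(schema.ColumnsToSelect())

	row, err := schema.FilterRow([]interface{}{1, "abc", 3})
	fmt.Println(row, err)
}
```

A map is used here instead of a list purely for constant-time lookups; a slice of names would work the same way for tables with only a few skipped columns.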

milanatshopify commented 3 months ago

> Would y'all be interested in a PR?

Always interested in PRs - did you have some work in progress?

SpencerMalone commented 2 months ago

I can try to untangle our change to get it upstream. We've had it in production for ~a year now, but our fork is so divergent that it may not happen at this point D: