OctopusDeploy / Issues

Bug reports and known issues for Octopus Deploy and all related tools
https://octopus.com

Migrator is importing NotFound events and then breaks auto-deploy #3351

Open MarkSiedle opened 7 years ago

MarkSiedle commented 7 years ago

If you use Migrator (or presumably other tools we have for importing data into Octopus, such as Octo.exe?), the data in your Event table can become malformed if resources that existed when those events were created have since been deleted.

Example and steps to reproduce

Strap yourselves in, this is about to get interesting... :)

Think of the lifecycle of a machine. First we get a MachineAdded event. The machine lives for a while, then it gets deleted and we get a MachineDeleted event. At this point, the machine has also been deleted from the Machine table.

Now suppose you move servers and decide to use Migrator to export and import your data. When importing events, Migrator cannot find the machine that those earlier machine-related events belonged to, so it still imports each event record but leaves its RelatedDocumentIds column empty and adds NotFound to the JSON. This can then cause problems for anything that relies on our event-sourcing concept (auto-deploy, subscriptions), because those features may expect the RelatedDocumentIds to be present.
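To make the failure mode concrete, here is a minimal Python sketch of the import behaviour described above. It is not Migrator's actual code: the `Event` shape, the `import_event_as_described` function, the `"NotFound"` JSON key and the `Machines-1` style IDs are all hypothetical stand-ins, chosen only to show how a deleted machine leaves its events with no related documents.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Set


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified event row; not the real Octopus schema.
    id: str
    category: str
    related_document_ids: List[str]
    json: Dict[str, object] = field(default_factory=dict)


def import_event_as_described(event: Event, existing_ids: Set[str]) -> Event:
    """Mimic the reported behaviour: references that cannot be resolved in
    the target instance are dropped, and a NotFound marker is written into
    the event JSON instead of skipping the event."""
    resolved = [d for d in event.related_document_ids if d in existing_ids]
    missing = [d for d in event.related_document_ids if d not in existing_ids]
    new_json = dict(event.json)
    if missing:
        # The key name and value format are assumptions for illustration.
        new_json["NotFound"] = missing
    return replace(event, related_document_ids=resolved, json=new_json)


# Machines-1 was added and later deleted, so it no longer exists at import time.
added = Event("Events-1", "MachineAdded", ["Machines-1"])
deleted = Event("Events-2", "MachineDeleted", ["Machines-1"])
existing = {"Environments-1"}
imported = [import_event_as_described(e, existing) for e in (added, deleted)]
```

Both imported events end up with an empty `related_document_ids` list, which is exactly the shape that downstream event consumers may not expect.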

This screenshot sums up the sort of situations your data can get into:

[Screenshot: missing-machine-event-data]

Possible fixes

We could fix this problem in Migrator and Octo.exe (it's unconfirmed whether this happens in Octo.exe, but I assume it might) by not importing events whose RelatedDocumentIds cannot be found at the time of import. We could also choose not to export events with missing references in the first place.
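A minimal Python sketch of the "don't import what we can't resolve" option, assuming a hypothetical `Event` type and a set of document IDs known to exist in the target instance. `partition_importable` is an illustrative name, not an existing Migrator or Octo.exe function. It returns the skipped events rather than silently dropping them, so a tool could log or report them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified event row; not the real Octopus schema.
    id: str
    category: str
    related_document_ids: Tuple[str, ...]


def partition_importable(
    events: Iterable[Event], existing_ids: Set[str]
) -> Tuple[List[Event], List[Event]]:
    """Split events into those whose every related document exists
    (safe to import) and those with at least one missing reference."""
    importable: List[Event] = []
    skipped: List[Event] = []
    for event in events:
        # An event with no related documents at all is treated as importable.
        if all(doc_id in existing_ids for doc_id in event.related_document_ids):
            importable.append(event)
        else:
            skipped.append(event)
    return importable, skipped


events = [
    Event("Events-1", "MachineAdded", ("Machines-1",)),
    Event("Events-2", "MachineDeleted", ("Machines-1",)),
    Event("Events-3", "ProjectCreated", ("Projects-1",)),
]
importable, skipped = partition_importable(events, existing_ids={"Projects-1"})
```

The same predicate could be applied on the export side instead, passing the source instance's existing document IDs, which would keep broken events out of the export file entirely.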

Once someone's data is in this state, they have to run a DELETE statement in SQL to fix their event stream, so it might be pertinent to update the auto-deploy code to detect missing RelatedDocumentIds and not throw like it currently does :) The currently known problem in auto-deploy is here, where we (the royal we... meaning me) assumed that there will always be exactly one machine for a machine-related event. It would be nice if this assumption held, but when the data is borked, auto-deploy should fail more gracefully :)
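One way to relax the "exactly one machine per machine-related event" assumption is sketched below in Python. This is not the auto-deploy code (which is .NET); `Event`, `Machine` and `machine_for_event` are hypothetical names. Instead of throwing when the lookup does not find exactly one machine, the sketch logs a warning and returns `None` so the caller can skip the event.

```python
import logging
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

logger = logging.getLogger("auto_deploy_sketch")


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified event row; not the real Octopus schema.
    id: str
    category: str
    related_document_ids: Tuple[str, ...]


@dataclass(frozen=True)
class Machine:
    id: str
    name: str


def machine_for_event(event: Event, machines: Dict[str, Machine]) -> Optional[Machine]:
    """Return the single machine an event refers to, or None (with a
    warning) when the event references zero or several known machines."""
    # Related IDs may include non-machine documents; keep only known machines.
    candidates = [machines[d] for d in event.related_document_ids if d in machines]
    if len(candidates) != 1:
        logger.warning(
            "Event %s (%s) references %d known machines; expected 1, skipping.",
            event.id, event.category, len(candidates),
        )
        return None
    return candidates[0]


machines = {"Machines-2": Machine("Machines-2", "web-01")}
healthy = Event("Events-4", "MachineAdded", ("Machines-2", "Environments-1"))
broken = Event("Events-1", "MachineAdded", ())  # RelatedDocumentIds lost on import
```

Here `machine_for_event(healthy, machines)` returns the `web-01` machine, while the broken event yields `None` plus a warning instead of an exception. Treating "more than one machine" the same way is a conservative design choice; a real fix might handle that case differently.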

Things to consider

There may also be a valid reason to import those events even when their referenced documents no longer exist (perhaps for historical purposes). If so, we can leave Migrator alone and just fix auto-deploy (in which case maybe this belongs with the cloud team rather than the scale team? Up for discussion).

Source: https://secure.helpscout.net/conversation/339842461/14328/?folderId=801354

MJRichardson commented 6 years ago

Migrator is importing NotFound events and then breaks auto-deploy