airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Improve ease of running migration script #2581

Closed ChristopheDuong closed 3 years ago

ChristopheDuong commented 3 years ago

Tell us about the problem you're trying to solve

When we run the migration script, we have to specify the Airbyte version twice: once in the docker image tag and once in --target-version.

For example, from https://github.com/airbytehq/airbyte/issues/2578: docker run --rm -v /home/cooper:/config airbyte/migration:0.17.2-alpha -- --input /config/airbyte_archive.gz --output ./airbyte_migrated.tar.gz --target-version 0.17.2-alpha

However, the latest target migration version was actually 0.17.0-alpha (with some patch to make it work better in 0.17.1-alpha though) so the migration failed.

bumpversion automatically upgrades the version in both places on the docs page, so we ended up in a state where the documented command didn't work. The correct command would have been: docker run --rm -v /home/cooper:/config airbyte/migration:0.17.2-alpha -- --input /config/airbyte_archive.gz --output ./airbyte_migrated.tar.gz --target-version 0.17.0-alpha

Describe the solution you’d like

  1. Can we make a NoOpMigration with the latest target version that would always be the last migration to apply in the migration list? That way, the migration script to execute could always be: docker run --rm -v /home/cooper:/config airbyte/migration:0.17.2-alpha -- --input /config/airbyte_archive.gz --output ./airbyte_migrated.tar.gz --target-version latest
  2. Or make --target-version optional, so that when it is omitted the script automatically selects the version of the last migration in the migration list?
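
A minimal sketch of how both options could resolve the target version, written in Python for brevity rather than in the migration script's own code. The function name `resolve_target_version` and the version list are hypothetical and only illustrate the idea: an omitted value or the literal `latest` maps to the last migration in the list, while an explicit value still has to match a migration exactly.

```python
from typing import Optional, Sequence

# Hypothetical, illustrative list of migration versions, ordered oldest first.
# These are NOT taken from the Airbyte code base.
MIGRATION_VERSIONS = ["0.15.0-alpha", "0.16.0-alpha", "0.17.0-alpha"]


def resolve_target_version(
    requested: Optional[str],
    migrations: Sequence[str] = MIGRATION_VERSIONS,
) -> str:
    """Return the migration version to migrate to.

    An omitted value (None) or the literal "latest" selects the last
    migration in the list; any other value must match a migration exactly.
    """
    if not migrations:
        raise ValueError("no migrations are registered")
    if requested is None or requested == "latest":
        return migrations[-1]
    if requested not in migrations:
        raise ValueError(
            f"unknown target version {requested!r}; "
            f"known versions: {', '.join(migrations)}"
        )
    return requested
```

With this behavior the documented command would no longer need a second version number: the docker image tag alone decides which migrations are available.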

Describe the alternative you’ve considered or used

Don't auto-bump versions in the docs, and let users figure out which versions to use.

  1. They would need to run the migration script while figuring out the proper versions for docker-images and target versions (probably not a very good solution?).
  2. So when we write a new migration or make a release, we should always update docs to the proper versions manually?

Additional context

Another user ran into the same confusion here: https://airbytehq.slack.com/archives/C01MFR03D5W/p1616332978034800

cgardens commented 3 years ago

> However, the latest target migration version was actually 0.17.0-alpha (with some patch to make it work better in 0.17.1-alpha though) so the migration failed.

I don't understand the confusion or what was wrong. Can you share the command that someone was using that was wrong? Or were our docs showing a command that was wrong?

cgardens commented 3 years ago

Oh, is it that you can't specify a patch version as the target in the migration script?

ChristopheDuong commented 3 years ago

> Oh, is it that you can't specify a patch version as the target in the migration script?

You have to provide the exact version of the last migration class in the code.

The description of my issue, the linked issue, and the Slack thread all include the commands users ran and the resulting exception stack traces.

cgardens commented 3 years ago

OK. I think just defaulting to latest is fine. In addition, we could do the same thing we do for the current version, where we search backwards from the version we receive to find the last migration and then start from there. Maybe not worth doing now.
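
The backwards search mentioned above could look roughly like the following Python sketch. It is only an illustration under assumptions: the helper names and the version list are hypothetical, and comparing versions by their numeric major.minor.patch prefix is an assumed rule, not something confirmed in this thread.

```python
import re
from typing import Sequence, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract the numeric major.minor.patch prefix, ignoring suffixes like -alpha."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def closest_migration_at_or_before(requested: str, migrations: Sequence[str]) -> str:
    """Walk the migration list from newest to oldest and return the first
    migration whose version is not newer than the requested one."""
    target = parse_version(requested)
    for candidate in reversed(migrations):
        if parse_version(candidate) <= target:
            return candidate
    raise ValueError(f"no migration at or before {requested!r}")


# Hypothetical migration list, ordered oldest first.
migrations = ["0.15.0-alpha", "0.16.0-alpha", "0.17.0-alpha"]
resolved = closest_migration_at_or_before("0.17.2-alpha", migrations)
```

Under these assumptions, a patch version such as 0.17.2-alpha resolves to the 0.17.0-alpha migration instead of failing on an exact-match lookup.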