aws / amazon-mwaa-docker-images

Apache License 2.0
27 stars 11 forks source link

Fix issue 123: Repeated reserialization of DAGs #125

Closed rafidka closed 3 months ago

rafidka commented 3 months ago

Issue #, if available: #124

Description of changes

Issue #124 happens because we are always calling airflow db migrate to make sure the database is migrated. The assumption was that the operation is a no-op if the DB is already initialized. However, this is not the case, and a reserialization of DAGs is happening every time this command is called. To avoid this, I am changing the code for executing migration to check if migration is needed.

Testing

I did multiple tests using the Docker Compose setup. More testing by Amazon MWAA developers before this PR is merged. Suggested testing:

  1. Create an internal 2.9.2 environment and make sure it is functioning normally. Also check ECS logs to ensure there are no issues.
  2. Create a pre-2.9.2 environment and do an in-place upgrade to 2.9.2 and make sure the process is flawless.
  3. Run a 2.9.2 environment with auto scaling enabled (ensuring that workers are being created and terminated continuously) and make sure migrations are not causing any issues.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jaklan commented 3 months ago

When it's merged, will MWAA 2.9.2-based instances be automatically updated? We are also affected on DEV & UAT environments by https://github.com/apache/airflow/issues/40082 and waiting for the fix before we propagate upgrade from 2.8 to 2.9 on PRD

rafidka commented 3 months ago

When it's merged, will MWAA 2.9.2-based instances be automatically updated? We are also affected on DEV & UAT environments by apache/airflow#40082 and waiting for the fix before we propagate upgrade from 2.8 to 2.9 on PRD

Unfortunately no, you would need to trigger an environment update to have the Docker containers updated. Even a no-op update would do, though, so it should be pretty easy to do.

jaklan commented 3 months ago

@rafidka sure, that's not an issue. Are changes immediately propagated to MWAA after merging PR or is there any build- / deployment-related delay we have to take into consideration before we can trigger an update and test new containers?

rafidka commented 3 months ago

Are changes immediately propagated to MWAA after merging PR or is there any build- / deployment-related delay we have to take into consideration before we can trigger an update and test new containers?

After the PR is merged, we sync it internally and deploy it via internal pipelines where it goes through testing, and then becomes available to different regions around the world gradually. Usually, assuming no test failures or pipeline blockages, this takes a couple of days. I am currently out of office, but @dhegberg should be able to help you with status updates. Daniel, we need to have this PR merged asap, as it is a production issue for our customers.

jaklan commented 3 months ago

PR is merged, so asking for the status update then 😄 Can we proceed with the re-deployment or not yet?

shubham22 commented 3 months ago

@Mercury2699 - can you confirm if this has been deployed to all the waves/regions?

Mercury2699 commented 3 months ago

@jaklan Yes, this fix has been deployed to all regions. Customers can trigger environment update to receive the latest image.