Open chirichidi opened 4 months ago
I am curious to know how syncing DAGs from S3 works. Do we need to create a Kubernetes secret with our AWS access key and secret key, and how often will it poll for new DAG files/folders?
This would be useful to have in the Helm chart. Git sync is sometimes not the best option.
Just another bump on this PR. This seems like the best option for Airflow deployments in EKS.
And just to clarify about AWS credentials, in general we would be using IAM roles rather than user credentials, so there should be no need for additional k8s secrets, or anything like that.
@chirichidi thanks for the very interesting PR, I would love to get "s3-sync" as a concept into the chart (as it will help users migrate from MWAA).
The main thing we need to finalize is the "reconciliation loop", everything else is secondary and can be updated later.
If I understand your PR correctly, you have done the following:
- init-container, which runs the following command (to populate the dags folder as the pod starts):
  aws s3 cp --recursive s3://<BUCKET>/<PATH> ./path/to/dags
- container, which runs the following command on loop (to keep the dags folder up to date):
  aws s3 sync --delete s3://<BUCKET>/<PATH> ./path/to/dags
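For discussion, the two steps above can be sketched as a pair of shell functions. This is only a sketch and not the code from the PR: the function names, the polling interval, and the optional iteration limit are hypothetical, and the PR may structure its loop differently. Only the two `aws s3` commands are taken from the description above.

```shell
#!/bin/sh
# Sketch of the reconciliation loop discussed above (not the PR's actual code).
# Function names, the interval and the iteration limit are assumptions.

# One-off copy for the init-container: populates the dags folder as the pod starts.
# $1 = S3 URI (e.g. s3://<BUCKET>/<PATH>), $2 = local dags folder
initial_copy() {
  aws s3 cp --recursive "$1" "$2"
}

# Loop for the sidecar container: keeps the dags folder up to date.
# $1 = S3 URI, $2 = local dags folder,
# $3 = seconds between runs (assumed default: 60),
# $4 = maximum number of runs, 0 meaning "run forever" (assumed default: 0)
sync_loop() {
  src=$1
  dest=$2
  interval=${3:-60}
  max=${4:-0}
  count=0
  while :; do
    # A failed sync is reported on stderr but does not end the loop,
    # so a transient S3 error does not stop future updates.
    aws s3 sync --delete "$src" "$dest" ||
      echo "s3 sync failed, retrying in ${interval}s" >&2
    count=$((count + 1))
    if [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then
      break
    fi
    sleep "$interval"
  done
}
```

In this sketch the init-container would call `initial_copy "s3://<BUCKET>/<PATH>" ./path/to/dags` once, and the sidecar would call `sync_loop "s3://<BUCKET>/<PATH>" ./path/to/dags 60`. Because `--delete` removes local files that no longer exist in the bucket, the destination should be a folder that holds nothing but synced DAGs.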
My main concerns are:
- aws s3 sync for the init-container?
- aws sync parameters: --quiet, --only-show-errors
Are there any other things I have missed?
PS: if/when we merge this, I will update the values/docs in your PR to match the style of the chart.
What issues does your PR fix?
What does your PR do?
Overview
This Pull Request introduces a new feature, s3Sync, designed to enhance our application's ability to synchronize data with AWS S3. This addition aims to provide a more robust and flexible solution for managing cloud storage synchronization tasks.
Details
The s3Sync feature is fully compatible with our existing infrastructure and does not introduce any breaking changes or dependencies.
No Changes to Existing gitSync Functionality
s3Sync feature, special care was taken not to modify or affect the existing gitSync functionality. Our commitment was to add value without disrupting current operations or workflows.
gitSync remains unaffected and operates as expected.
Testing and Validation
gitSync, continue to operate without any issues.
Conclusion
This enhancement is a step forward in our ongoing efforts to provide a seamless and powerful toolset for our users. By introducing s3Sync, we are expanding our capabilities while ensuring the integrity and performance of our existing features remain intact.
I look forward to your feedback and any discussions regarding this PR. Thank you for considering these enhancements.
Checklist
For all Pull Requests
For releasing ONLY