aelzeiny / airflow-aws-executors

Airflow Executor for both AWS ECS & AWS Fargate
MIT License

Contributing Executors to Amazon Provider Package for Apache Airflow #24

Closed: o-nikolas closed this issue 1 year ago

o-nikolas commented 1 year ago

Hey @aelzeiny!

tl;dr:

I'm reaching out to see if you're interested in contributing these executors to the Amazon Provider Package.

Context:

You and I, along with @potiuk, actually met and chatted a while back, and some things have changed in Airflow since then: AIP-51 was proposed and is mostly completed, meaning it is now easier than ever to create and maintain third-party Airflow Executors. There is also an approved proposal to move the "core" Airflow Executors (e.g. Celery, Kubernetes, and their hybrids) into provider packages themselves (discussion thread and vote results). This sets a good precedent for the inclusion of Executors in Provider Packages.
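For context on what AIP-51 enables: a third-party executor is, at its core, a subclass of Airflow's `BaseExecutor`, and Airflow can be pointed at it via the `[core] executor` option using the class's full module path. Below is a minimal sketch; the class name and the in-memory bookkeeping are hypothetical and for illustration only, while the `start`/`execute_async`/`sync`/`end` hooks are the real `BaseExecutor` interface:

```python
from airflow.executors.base_executor import BaseExecutor


class SketchExecutor(BaseExecutor):
    """Illustrative only: 'runs' tasks by tracking them in memory."""

    def start(self):
        # Called once when the scheduler brings the executor up.
        self.active_jobs = {}

    def execute_async(self, key, command, queue=None, executor_config=None):
        # `command` is the Airflow CLI command that runs the task instance.
        # A real executor would submit it to a backend (ECS, Batch, ...) here.
        self.active_jobs[key] = command

    def sync(self):
        # Called on every scheduler heartbeat to reconcile task state.
        for key in list(self.active_jobs):
            # A real executor would poll the backend; here we pretend
            # everything succeeded immediately.
            self.success(key)
            del self.active_jobs[key]

    def end(self):
        # Drain any in-flight work before shutting down.
        self.sync()
```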

With the above two items in place, the AWS OSS Airflow team is preparing to start implementing some AWS Executors, many of which overlap with what you have here. I know you previously attempted to submit these executors to Airflow core (PR link), which was not approved/merged, so I'm wondering whether you'd be interested in and available to open a new PR to have these merged into the AWS Provider Package. We'd love to have you as a collaborator during this process! Let me know what you think :smiley:

aelzeiny commented 1 year ago

Hey o-nikolas,

Thanks for reaching out. I remember that conversation, and would be happy to move this package to its rightful place! Honestly, it would be a relief to know that this project would be better-maintained. What can we do to make this happen?

Best Regards, Ahmed

o-nikolas commented 1 year ago

> Hey o-nikolas,
>
> Thanks for reaching out. I remember that conversation, and would be happy to move this package to its rightful place! Honestly, it would be a relief to know that this project would be better-maintained.

That's great news! I'm looking forward to that collaboration :smiley:

> What can we do to make this happen?

I first need to build consensus with folks in the Airflow community on how exactly the executors in providers should look, e.g. the directory structure, how configuration is provided, where documentation should live, any boilerplate code needed, etc., since I want to ensure we're not implementing a divergent approach. But this can be done in parallel.
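To make one of those open questions concrete: "how configuration is provided" could, for example, mean reading executor settings from a dedicated section of the Airflow config. The section and option names below are purely hypothetical placeholders for whatever the community settles on:

```python
from airflow.configuration import conf

# Hypothetical section and option names, shown only to illustrate the
# open design question; nothing here is a decided interface.
region = conf.get("aws_ecs_executor", "region", fallback="us-east-1")
cluster = conf.get("aws_ecs_executor", "cluster", fallback=None)
```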

I'd also like to jump on a call to hash out with you how to contribute what you have now; I have some options in mind we can discuss. @shubham22 will likely want to join as well. I'll send an invite to the email you have configured on GitHub, if that's a good place to reach you (or let me know if there's a better option).

o-nikolas commented 1 year ago

Hey @aelzeiny!

I tried to schedule a Meet call for us to chat last week, but I don't think you received it; please let me know if there's a better email to reach you at.

For the time being I'll outline a plan here, and we can then follow up on another Meet (see my profile for an email address) or just continue the discussion here, whichever you prefer!

Context:

The rest of the AWS Airflow OSS team and I do our development out of a team fork (instead of each of us using individual forks). This allows us to collaborate on code more easily (we can just check out each other's branches) and to run some CI attached to the fork. NOTE: This is not a hard fork of Airflow; we don't release any code from this repo or merge any changes to main (in fact, we have automation that will stop or overwrite such changes if we try). We simply use this fork for development in dev branches, which we then open as PRs to upstream apache:airflow.

The Plan

I have created a new development branch on that fork, aws_executors/ecs_batch, for this project of introducing the AWS executors. I believe the easiest approach, and the one that will burn the least of your time, will be for you to open a PR against this branch contributing the relevant code* (more on that below) from your package as a starting point. We can then merge that PR into the development branch and take over from there, making any further commits (updates/bugfixes) or modifications required to get it up to spec for the AWS provider package (updating docs, tests, etc.) and then shepherding the final PR to upstream Airflow from this dev branch. This way GitHub will track you as a co-author on the eventual upstream PR to apache:airflow, but it will not take much of your personal time or iteration.
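Mechanically, and assuming the usual GitHub fork workflow (the repository URLs below are placeholders, not real remotes), contributing against that dev branch would look roughly like this:

```sh
# Clone your own fork, then add the team fork as a second remote.
# <you> and <team-fork> are placeholders for the actual GitHub orgs/users.
git clone https://github.com/<you>/airflow.git
cd airflow
git remote add team-fork https://github.com/<team-fork>/airflow.git
git fetch team-fork aws_executors/ecs_batch

# Branch off the team fork's dev branch and copy in the relevant code.
git checkout -b contribute-executors team-fork/aws_executors/ecs_batch
# ... add/commit the executor code ...
git push origin contribute-executors
# Then open a PR on GitHub with base branch aws_executors/ecs_batch.
```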

Let me know what you think of this approach!

*Files of interest from this package:

I'd like to hear your thoughts on what makes sense to contribute, since not all files in this package will be required or relevant. Some obvious examples of files that will not be needed are those related to install and packaging (e.g. setup.cfg, setup.py, the pylint config, deploy.sh) and the GitHub workflow files. I'm less sure about something like batch_ami_helper.py; I'm leaning towards no for that one, but I'd like to hear from you. Otherwise, I think the code in tests/ and airflow_aws_executors/ is the main material we'd like to transfer. Let me know what you think!

aelzeiny commented 1 year ago

Hey, sorry I missed your communication. I just responded. Be on the lookout for an email address following $firstname.$lastname@gmail. Thanks!

o-nikolas commented 1 year ago

Ahmed has contributed the executors, and we're working on getting the code ready to merge into the Airflow Amazon provider package soon! Closing this issue, as its main purpose is now complete. Thanks again @aelzeiny; looking forward to collaborating in the future!