Closed: shubham22 closed this issue 1 year ago
Cc: @rbiegacz
First of all, it's rather cool that you are asking about it, @shubham22 :). Also cc: @filipknapik from Google.
From the legal point of view, the licence of the project (Apache 2.0) allows anyone to do anything with the code, so you are free to do what you plan. No-one can prevent that. And reaching out here before publishing it is a sign of good will and good intentions :)
However, maybe this is a good opportunity for a joint effort with the Google team, and possibly a common repo where both the Composer and MWAA approaches would work.
This tool was developed for Airflow 1 and has not been touched since. Its support for the current Dataproc would also need refreshing. It also did not have a high success rate as a public open-source project (we had hoped to build some community around it), at least judging from the number of issues, community comments, etc. I do not remember a single user asking anything about the tool. It also had a number of "not yet implemented" features (as documented in the open issues). However, some corporate customers have likely used it, maybe even as inspiration for their own migrations.
But maybe a joint effort from both companies is the right approach: putting some "life" and freshness into the project, and especially making it Airflow 2 compatible, might be a good idea. Others might be interested as well: @vikramkoka from Astronomer, @alexott from Databricks. I also spoke about it a long time ago at ApacheCon with @gezapeti, the Oozie PMC member who was at Cloudera at the time. I know Cloudera was also potentially interested in migrating all the old Oozie workflows, since Oozie used to be the default way on Hortonworks/Cloudera years ago and some of those workflows are likely still around. Geza is no longer at Cloudera, but maybe he can give some insights and point us to the right person there.
I have no idea how many of those Oozie workflows are still out there, but I think our goal should be to help drive that number to 0. As @turbaszek once said in our presentation (quoting from memory): "This tool allows to make the world a better place - one XML less at a time".
I looked at the board reports and activity in Oozie, and by all accounts the activity there is asymptotically approaching 0, so sooner or later the project might end up in the Apache Attic. It is not there yet, and the project is formally active, but activity is rather low. I expect such a move might happen sooner rather than later.
Having such a tool be vendor-neutral (especially if it is cloud-agnostic and allows transitioning to multiple clouds) might also be an attractive way for the Oozie team to ease their "moving to the Attic" effort. The Oozie team could recommend it to people who still use Oozie, which would likely make the tool far more successful. BTW, this was actually the original idea for o2a: to become vendor-neutral if it caught on.
It sounds like it could be good for the whole community if we could pull off something like that as a joint effort.
It might be difficult to coordinate and set up governance for, but maybe it's worth considering.
One more thought: I think there might eventually be an easy path. If we have enough commitment from multiple companies to maintain it at least for a while, if the Oozie team supports it as a "way to drive the Attic move", and if the tool is vendor-neutral, we could attempt to donate it to apache-airflow to make governance easy. Success is not guaranteed, because a number of people from the PMC would have to agree to it, but I think it is a viable option.
@potiuk - as always, appreciate your thorough and thoughtful reply.
TL;DR I agree with your approach of making o2a vendor neutral (support multiple clouds) and have multiple companies coming together to keep the project alive and fresh until there are 0 Oozie workflows left.
> good opportunity to do a joint effort with Google team and possibly make a common repo where both Composer and MWAA approach would be working.
I am personally very supportive of this. Initial feedback from other AWS folks today has also been positive. Before I can completely commit to this, I need to follow some internal approval processes for creating a new OSS project, which I can kick off next week. I also need to make sure that there is a commitment from our (AWS) end to continue giving resources to it in the future. Meanwhile, I would wait to hear what folks at Google, Astronomer, et al. think about this.
> likely some corporate customers have used it maybe even as inspiration to their own migration.
I can attest that at least 3-4 customers have benefited from this. We are actually looking to pair the release of the AWS-compatible version with a customer reference, adding a testimonial on how it helped them.
> "This tool allows to make the world a better place - one XML less at a time".
Well said! This should be the headline of this tool :)
Thanks for introducing this OSS newbie to the “Apache Attic” and for providing so much helpful context. In this scenario, if and when multiple companies on this thread commit to maintaining this tool, what would be the best way to align with the Oozie team?
> we could attempt to donate it to apache-airflow to make governance easy
TBH, this would be the ideal scenario, as the people maintaining this tool would overlap heavily, if not 100%, with the Apache Airflow community.
My personal wishful thinking is not to stop at o2a, but to create and open-source tools for converting workflows from other popular legacy workflow applications to Apache Airflow. I'm still working on logistics and alignment for this internally at AWS.
I agree that having this tooling migrated/donated to apache-airflow would make governance a lot easier and would provide a neutral place for the code to live. I wonder if these approaches are mutually exclusive, though. We, as AWS, could publish our fork of this code, which works for our customers, in the short term, while we work with the rest of the community and other cloud providers (if they're interested) to create a vendor-neutral version, since I reckon that will be a much slower-moving process. Thoughts?
@rbiegacz @filipknapik ? WDYT?
Given that this is not a top priority for others at this point, I'm inclined to go with @o-nikolas's suggestion. How about we (AWS) publish our fork of this code for our customers for now? If there is enough interest from Google et al. in the future, we can always work on a shared repository to host a vendor-neutral version.
Fine for me if @rbiegacz @filipknapik do not want to follow up on it :D
Context
This Oozie-to-Airflow migration tool makes it easy to convert Oozie workflows to Apache Airflow workflows that can be run directly on GCP. Customers using AWS as their cloud execution environment are also interested in a similar AWS-compatible tool to make their migration to Apache Airflow easier.
Current Status
Based on this tool, @dgghosalaws et al. have written and tested a conversion tool that outputs Apache Airflow workflows that can be run directly on AWS. The primary changes were using Amazon EMR instead of Dataproc, and Amazon MWAA instead of Cloud Composer.
Ask/Question
The AWS team is looking to open-source the AWS-compatible conversion tool so that more customers can benefit from it and use it to migrate to Apache Airflow. The tool will (likely) be hosted in our Apache Airflow-focused repo, along with migration tools (WIP) for other workflow orchestrators. Full credit will be given to this project, since our tool is derived from it. We would like to know whether the authors of this project have any concerns about this plan. cc: @potiuk @turbaszek @mik-laj