dask / community

For general discussion and community planning. Discussion issues welcome.

Planned removal of the "daskexecutor" provider in Airflow #355

Closed by potiuk 10 months ago

potiuk commented 10 months ago

I would like to let the Dask community know that the Airflow community is about to start voting on the removal of the “daskexecutor” provider from active maintenance by the community.

The Dask Executor was added to Airflow years ago, and since then we have discussed removing it several times. We have pretty much no experience with Dask, and usage of the Dask Executor is (very likely) extremely low: we have the Local, Celery, Kubernetes and CeleryKubernetes executors, and in the very near future we will get some cloud-native executors such as Amazon ECS. We receive virtually no issues or questions about the Dask Executor. It has also happened in the past, and keeps on happening, that Dask tests are flaky and cause failures in our test harness, and that new Dask releases broke our tests when we had no expertise to fix them, so we had to reach out to the community here. You can find some past threads about this, and some of the historical discussion is in this thread on the Airflow devlist:

https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1

We discussed it before, but the process of removal was not easy or straightforward. The situation has recently changed:

The Dask Executor moved from the core to the new “daskexecutor” provider, because we implemented a clear Executor API that allowed the executor to be cleanly separated.

We introduced a complete lifecycle process description for our providers that clarifies how we can remove providers and describes the consequences of doing so. We have also tested and gone through the process by removing the Qubole provider from active maintenance (the Qubole service has been discontinued).

This opened the way to removing the daskexecutor provider. The initial discussion (see the link above) indicates that removal of daskexecutor is the preferred next step unless someone in the Dask community steps up and commits to maintaining it.

That also allowed us to get some stats on the usage of the Dask executor. From PyPI download stats it seems that for more than 250,000 daily downloads of Airflow 2.7+ (which is where the split happened) we have fewer than 1000 downloads of the daskexecutor provider a day. This indicates that even downloading daskexecutor happens in only roughly 0.3% (3 per mille) of Airflow downloads, and we do not know how much of that is actually “using” it, because there might be people who use the “all” extra on Airflow, which downloads all providers without meaning they are used. This confirms our anecdotal experience that the executor is hardly used. With other executors our users have viable alternatives, and they might still continue using the released provider for a long time. It will just stop being maintained.
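As a rough sanity check of the ratio, here is a minimal sketch that uses only the rounded bounds quoted above (1000 provider downloads against 250,000 Airflow downloads a day). The function name `download_share` is made up for illustration, and the inputs are the stated bounds, not exact PyPI counts.

```python
def download_share(provider_daily: int, airflow_daily: int) -> float:
    """Return provider downloads as a fraction of Airflow downloads."""
    return provider_daily / airflow_daily


# Rounded bounds from the comment above, not exact PyPI figures.
share = download_share(1_000, 250_000)

print(f"{share:.1%}")                 # share as a percentage
print(f"{share * 1000:.0f} per mille")  # same share in per mille
```

With these rounded bounds the share comes out at 0.4%, so the roughly 0.3% quoted above presumably reflects the exact counts rather than the bounds. Either way the share stays well under half a percent.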

There are three possible options for how this will go further:

1) (Most likely) We remove the provider. It won’t be maintained any more. Any future release of Dask might break it, and we will remove the code and tests from main, so we will not be actively testing it any more. The old provider releases will remain on PyPI and we are committed to fixing any security issues reported to us for them, but no new releases will happen. The Dask dependencies will also be removed from the Airflow dependencies and image, which means that future versions of Airflow might not allow you to install and use the executor. They might or might not work, but we will not check it.

2) Someone in the Dask community steps up and commits to keeping the Dask executor up to date. There is a way to show that the commitment is real: currently some of the Dask tests are flaky and we have quarantined them, so if someone in the Dask community would like to step up, diagnosing and fixing them, and committing to keep fixing them, would be a prerequisite for us to consider this a viable option. The issue is apache/airflow issue #32778, “Flaky dask backfill test in quarantine”. It has been appearing on and off for the last year or so, and we recently re-opened it because it started to affect main builds again after we sped up our test harness.

3) This is actually a next step after 1) is complete: the Dask community might fork the provider, and release and maintain their own. We actively encourage communities and companies behind less-used providers to release their own providers. Third-party providers are first-class citizens for Airflow, and there are some external registries where they can be discovered, so it’s a viable option for the Dask community to maintain such a provider if they wish. We are happy to provide guidance and documentation links for anyone in the Dask community who would like to do it.

I plan to start voting in about a week’s time. I just wanted to let the Dask community know that it is happening and give you some time to see whether you would like to pursue option 2) above. If so, that should be enough time for someone to step up and take the lead on fixing the issue I mentioned.

Please don’t treat this as a hostile move. It’s just a rational decision based on the facts we know and the problems we experience with it. We appreciate what the Dask community is doing; it’s just that for us, maintaining it is more trouble than benefit for our community as a whole.

I hope you understand and make the right call on your side, whether or not the Dask community steps up to maintain the executor or its own provider.

Representing the Airflow Community,

mrocklin commented 10 months ago

Hi @potiuk !

As another project that feels the inertia of lots of connections with other open source projects, I totally empathize with Airflow's situation. From my perspective you should totally rip out the DaskExecutor from the Airflow codebase. It's not something that I personally see used in practice, and so certainly not something that should weigh down the Airflow core maintainers. If I could vote in Airflow I'd vote to remove.

Thanks for spending the energy to communicate. I appreciate it.

Cheers, -matt!

potiuk commented 10 months ago

Thanks for the kind words and empathy, @mrocklin, and thanks to the others for their reactions. Much appreciated.

Seeing that and the overall reactions, I started a formal lazy consensus thread today: https://lists.apache.org/thread/fxv44cqqljrrhll3fdpdgc9h9fz5ghcy. We will continue with the removal process once consensus is reached (Tuesday next week). If it goes as planned, the next wave of providers (in ~2 weeks) will include the last release of the provider, with the "removed from maintenance" warning as part of the PyPI description and a documentation update.

Let me close the issue now, we can always re-open it if anything changes.