Closed potiuk closed 3 years ago
I am not sure that this is a good idea. I think it would be worthwhile for the user to pin a specific version so that they do not accidentally install a newer version that may contain regressions.
I think the http should be part of core, see discussion in https://github.com/apache/airflow/pull/12252
http (& even ftp) do seem like they should be part of core. At least for HTTP, everything it uses (internal hooks and requirements) is already part of Airflow core's requirements too.
The following should require explicitly installing them:
"apache.pig": [], "apache.sqoop": [], "dingding": [], "discord": [], "openfaas": [], "opsgenie": [], "sqlite": [],
Absolutely agree that http should be part of core. Strongly in favor of ftp as well being part of core, assuming no additional dependencies. Tempted with imap, but unsure on the dependencies.
Nothing else comes close IMHO
I like adding imap: essentially we're saying lower-level protocols are core (ftp, http), so imap fits into that list.
The following should require explicitly installing them:
"apache.pig": [], "apache.sqoop": [], "dingding": [], "discord": [], "openfaas": [], "opsgenie": [], "sqlite": [],
I agree with @kaxil, other than sqlite.
Personally I think sqlite should come together with Airflow core by default, without explicit extra installation.
Considering two examples:
sqlite3 as one of its built-in standard libraries.
Looks like ["http", "ftp", "sqlite", "imap"] is the winning set. They are all rather small, and they likely increase the size of the installation by less than 1%.
I am not sure that this is a good idea. I think it would be worthwhile for the user to pin a specific version so that they do not accidentally install a newer version that may contain regressions.
@mik-laj -> I do not think we have to move them to the "core". I can easily make those extras "enabled" by default, as extras that are always used implicitly. This means that they will be installed by default in their latest version: even a plain pip install airflow will also install those 4 providers. There will be no "constraints" for those, so the user will have to upgrade them explicitly and will keep the possibility of downgrading them. I will update the FAQs to explain this behavior.
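To make the "installed by default" idea concrete, here is a minimal sketch of how the four provider ids could be turned into unpinned package requirements. The names PREINSTALLED_PROVIDERS, provider_requirement and default_requirements are hypothetical and used for illustration only. The sketch assumes the apache-airflow-providers-<id> package naming, with dots in the id replaced by hyphens; it is not Airflow's actual setup code.

```python
# Hypothetical sketch: these names are illustrative, not Airflow's real setup code.
PREINSTALLED_PROVIDERS = ["http", "ftp", "sqlite", "imap"]


def provider_requirement(provider_id: str) -> str:
    """Map a provider id such as "apache.pig" to its assumed package name."""
    return "apache-airflow-providers-" + provider_id.replace(".", "-")


def default_requirements(core_requirements):
    """Append the default providers to the core requirements, without version pins."""
    extra = [provider_requirement(p) for p in PREINSTALLED_PROVIDERS]
    return list(core_requirements) + extra
```

Because no version specifier is attached, pip would pick the latest provider release at install time, and the user stays free to upgrade or downgrade each provider separately.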
One more comment: I also think it will be great to have a few providers installed from day zero. People might not fully realize that there are providers, and they might be surprised not to see those other integrations installed; seeing a few providers pre-installed will make this much more obvious. Simply running `pip freeze | grep apache-airflow` will show them what provider packages look like.
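The same check can be done from Python. This is a rough sketch using the standard library's importlib.metadata (Python 3.8+). The function names are made up for this example, and the apache-airflow-providers- prefix is an assumption about how provider packages are named.

```python
from importlib import metadata

# Assumed naming convention for provider distributions.
PROVIDER_PREFIX = "apache-airflow-providers-"


def provider_packages(names):
    """Keep only distribution names that look like Airflow provider packages."""
    return sorted(n for n in names if n and n.lower().startswith(PROVIDER_PREFIX))


def installed_provider_packages():
    """List provider packages installed in the current environment."""
    names = {dist.metadata["Name"] for dist in metadata.distributions()}
    return provider_packages(names)


if __name__ == "__main__":
    for name in installed_provider_packages():
        print(name)
```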
If there will be no more comments shortly, I will write this proposal to the devlist.
I do not think we have to move them to the "core".
@potiuk doesn't that mean that we keep them in core and make them available to all users, but they still have to refactor their DAGs (due to import changes)? Should we limit the number of changes required in users' DAGs?
I think moving them to core now is NOT a good idea. Most of the "core" operators were moved inside the core anyway, or at least had their module names changed to conform to AIP-21, so I do not think there is a big difference between moving them inside the core and moving them to providers.
http_operator -> http
contrib.ftp_operator -> ftp
etc
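For DAG authors the renames boil down to changed import paths. Below is a small sketch of how such a rename could be applied to a single import line. RENAMED_MODULES and rewrite_import are hypothetical names, and the two module paths are examples based on my understanding of the 1.10 and 2.0 layouts; verify them against the Airflow version you use.

```python
# Example old -> new module paths (illustrative, not an exhaustive or official list).
RENAMED_MODULES = {
    "airflow.operators.http_operator": "airflow.providers.http.operators.http",
    "airflow.contrib.hooks.ftp_hook": "airflow.providers.ftp.hooks.ftp",
}


def rewrite_import(line: str) -> str:
    """Rewrite one import line if it refers to a renamed module; otherwise return it unchanged."""
    for old, new in RENAMED_MODULES.items():
        if line.startswith(f"from {old} import ") or line.strip() == f"import {old}":
            return line.replace(old, new, 1)
    return line
```

The amount of DAG refactoring is the same whether the new path lives in core or in a provider package: only the target of the mapping differs.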
Anyone know how pip would cope with circular dependencies? I.e. could apache-airflow depend upon apache-airflow-providers-http (which in turn depends upon apache-airflow) without giving pip a heart attack?
That way we can have "batteries included" but still keep the advantages of smaller releases/easier updating of providers.
Edit: oh Jarek has a plan already. Cool
Yep. This is already happening with all providers when we specify extras, PIP is cool with that :)
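A toy illustration of why a dependency cycle need not be a problem: if the installer keeps track of what it has already scheduled, the walk stops once it comes back around. This is only a sketch of the general idea, not pip's actual resolver; install_order and REQUIRES are made-up names.

```python
def install_order(package, requires, seen=None):
    """Depth-first walk that skips already-scheduled packages, so cycles terminate."""
    if seen is None:
        seen = []
    if package in seen:
        return seen
    seen.append(package)
    for dep in requires.get(package, []):
        install_order(dep, requires, seen)
    return seen


# Hypothetical dependency graph containing a cycle.
REQUIRES = {
    "apache-airflow": ["apache-airflow-providers-http"],
    "apache-airflow-providers-http": ["apache-airflow"],
}

print(install_order("apache-airflow", REQUIRES))
```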
Description
When airflow 2.0 is installed from PyPI, providers are not installed by default. In order to install them, you should add an appropriate extra. While this behavior is identical to Airflow 1.10 for those "providers" that required additional packages, there were a few "providers" that did not require any extras to function (for example http, ftp). We have "http" and "ftp" extras for them now, but maybe some of those are popular enough to be included by default?
We have to make a decision now:
Use case / motivation
We want people to get a familiar experience when installing airflow. While we provide a familiar mechanism (with extras), and people will expect slightly different configuration and installation whose differences we can describe, maybe some of those providers are so popular that we should include them by default?
Related Issues
12685 - where we discuss which of the extras should be included in the Production Image of 2.0.
Additional info
Here is the list of all "providers" that were present in 1.10 and had no additional dependencies, so basically they would work out-of-the-box in 1.10, but they need an appropriate "extra" in 2.0.
Also here I appeal to the wisdom of the crowd: @ashb, @dimberman, @kaxil, @turbaszek, @mik-laj, @XD-DENG, @feluelle, @eladkal, @ryw, @vikramkoka, @KevinYang21 - let me know WDYT before I bring it to the devlist?