apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Replace `--package-filter` usage for docs breeze command with short package names #33876

Closed: potiuk closed this issue 1 year ago

potiuk commented 1 year ago

Body

The --package-filter option, while nice in theory for specifying which packages to build, has quite bad UX: specifying multiple packages means a lot of repetition, and the package names are long. In practice we never use the glob-pattern functionality of the filter, except for --package-filter apache-airflow-providers-*.
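For context, glob-style filtering of the kind --package-filter offers can be sketched with Python's standard `fnmatch` module. This is a minimal illustration, not Breeze's actual implementation; the `AVAILABLE` list and the `filter_packages` helper are hypothetical. It shows why the filter is awkward for everyday use: the one useful glob selects every provider, while picking specific packages means repeating long full names.

```python
from fnmatch import fnmatch

# Hypothetical sample of full documentation package names.
AVAILABLE = [
    "apache-airflow",
    "apache-airflow-providers-apache-hdfs",
    "apache-airflow-providers-cncf-kubernetes",
    "apache-airflow-providers-microsoft-azure",
    "helm-chart",
]


def filter_packages(patterns: list[str], available: list[str]) -> list[str]:
    """Return packages matching any glob pattern, keeping the order of `available`."""
    return [name for name in available if any(fnmatch(name, p) for p in patterns)]


# The glob pattern used in practice selects every provider package:
print(filter_packages(["apache-airflow-providers-*"], AVAILABLE))

# Selecting two specific providers requires two long, repetitive patterns:
print(filter_packages(
    ["apache-airflow-providers-microsoft-azure", "apache-airflow-providers-cncf-kubernetes"],
    AVAILABLE,
))
```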

It's much more practical to use "short" package names ("apache.hdfs" rather than --package-filter apache-airflow-providers-apache-hdfs), and we already use them in a few places in Breeze.
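The short-name convention maps mechanically onto full provider package names. A minimal sketch, assuming the rule is simply "replace dots with dashes and prepend `apache-airflow-providers-`" (which holds for the examples in this thread); `short_to_package_name` is a hypothetical helper, not an existing Breeze function:

```python
PROVIDER_PREFIX = "apache-airflow-providers-"


def short_to_package_name(short_name: str) -> str:
    """Map a short provider id such as "apache.hdfs" to its full package name.

    Hypothetical helper: assumes dots in the short id become dashes and the
    result is prefixed with "apache-airflow-providers-".
    """
    return PROVIDER_PREFIX + short_name.replace(".", "-")


for short in ["apache.hdfs", "microsoft.azure", "cncf.kubernetes"]:
    print(short, "->", short_to_package_name(short))
```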

We should probably replace all the places where we use --package-filter with those short names and add a special alias for all providers. This should help users who build documentation, as well as release managers, do their work faster and more conveniently.
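One way the proposed "all providers" alias could work is to expand it into the full list of known short provider ids before building. This is a sketch under assumptions: the alias spelling `all-providers`, the `AVAILABLE_PROVIDERS` sample, and the `expand_aliases` helper are all hypothetical. Duplicates are dropped while keeping the order in which names were first requested, and unknown names are rejected early.

```python
# Hypothetical sample of the short provider ids Breeze knows about.
AVAILABLE_PROVIDERS = ["amazon", "apache.hdfs", "cncf.kubernetes", "microsoft.azure"]

# Hypothetical spelling of the alias proposed in the issue.
ALL_PROVIDERS_ALIAS = "all-providers"


def expand_aliases(requested: tuple[str, ...], available: list[str]) -> list[str]:
    """Expand the all-providers alias and drop duplicates, keeping first-seen order."""
    result: list[str] = []
    for name in requested:
        if name != ALL_PROVIDERS_ALIAS and name not in available:
            raise ValueError(f"Unknown package: {name}")
        # The alias stands for every available provider; anything else is itself.
        names = available if name == ALL_PROVIDERS_ALIAS else [name]
        for n in names:
            if n not in result:
                result.append(n)
    return result


print(expand_aliases(("apache.hdfs", ALL_PROVIDERS_ALIAS), AVAILABLE_PROVIDERS))
```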

This would also allow us to remove the separate ./dev/provider_packages/publish_provider_documentation.sh bash script, which aims to do something similar in a "hacky" way.

Committer

potiuk commented 1 year ago

cc: @amoghrajesh -> maybe another task for you :) ?

amoghrajesh commented 1 year ago

Nice nice. Can you assign this to me?

I'd love to approach this!

eladkal commented 1 year ago

I would even prefer listing providers as: microsoft.azure cncf.kubernetes rather than apache-airflow-providers-microsoft-azure apache-airflow-providers-cncf-kubernetes

mainly because breeze release-management prepare-provider-documentation already lists them that way in its summary:

   Success:
alibaba amazon apache.beam apache.flink apache.hive apache.livy apache.pinot apache.spark arangodb 
celery cncf.kubernetes common.sql daskexecutor databricks datadog dbt.cloud docker elasticsearch 
exasol ftp github google grpc http imap influxdb microsoft.azure microsoft.psrp microsoft.winrm 
mysql openlineage oracle plexus presto salesforce sendgrid sftp singularity slack smtp snowflake 
ssh trino yandex
   Skipped:
apache.impala jdbc microsoft.mssql neo4j postgres qubole sqlite vertica
   Marked as doc-only (please commit those!):
airbyte apache.cassandra apache.drill apache.druid apache.hdfs apache.kafka apache.kylin apache.pig 
apache.sqoop apprise asana atlassian.jira cloudant dingding discord facebook hashicorp jenkins 
mongo odbc openfaas opsgenie pagerduty papermill redis samba segment tableau tabular telegram 
zendesk

potiuk commented 1 year ago

> I would even prefer listing providers as: microsoft.azure cncf.kubernetes rather than apache-airflow-providers-microsoft-azure apache-airflow-providers-cncf-kubernetes

Absolutely - this is what we already do in a few other places, and there is even a dedicated argument for it:

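import click

# BetterChoice and get_available_documentation_packages are Breeze's own helpers (imports omitted).
argument_packages = click.argument(
    "packages",
    nargs=-1,  # accept any number of package names, collected into a tuple
    required=False,
    type=BetterChoice(get_available_documentation_packages(short_version=True)),
)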

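The short_version=True flag is what produces these short names (e.g. "apache.hdfs" instead of "apache-airflow-providers-apache-hdfs").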
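The kind of conversion short_version implies (full provider package name to short id) can be sketched as below. This is not Breeze's code: `to_short_name` is a hypothetical helper, and it assumes no component of a provider id contains a dash, which holds for every name in the summary quoted earlier in this thread. Non-provider packages are returned unchanged.

```python
PROVIDER_PREFIX = "apache-airflow-providers-"


def to_short_name(package_name: str) -> str:
    """Return the short id for a provider package; other names pass through unchanged.

    Hypothetical helper: assumes dashes after the prefix separate id components,
    so they can be turned back into dots.
    """
    if package_name.startswith(PROVIDER_PREFIX):
        return package_name[len(PROVIDER_PREFIX):].replace("-", ".")
    return package_name


for full in ["apache-airflow-providers-microsoft-azure", "apache-airflow-providers-cncf-kubernetes"]:
    print(full, "->", to_short_name(full))
```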

potiuk commented 1 year ago

BTW, here's how I'd do it, @amoghrajesh: I'd keep the --package-filter option as is, but also add the argument above, and update the suggestions, documentation, etc. to use the argument instead of --package-filter for regular tasks (there are a few places where we print help and suggestions about --package-filter).
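Keeping both inputs means the command has to merge positional short names with any legacy --package-filter globs. A minimal sketch of that merge, written as a plain function so it runs without Breeze; `resolve_packages` and the sample package list are hypothetical. It assumes validation of short names already happened at parse time (as the BetterChoice type above would do) and that a short name equal to a full non-provider name such as "apache-airflow" is used as-is.

```python
from fnmatch import fnmatch

PROVIDER_PREFIX = "apache-airflow-providers-"


def resolve_packages(
    short_names: tuple[str, ...],
    package_filters: tuple[str, ...],
    available: list[str],
) -> list[str]:
    """Combine positional short names with legacy --package-filter glob patterns.

    `available` holds full package names; the result keeps its order.
    """

    def to_full(short: str) -> str:
        # Non-provider packages are already listed under their own names.
        return short if short in available else PROVIDER_PREFIX + short.replace(".", "-")

    wanted = {to_full(s) for s in short_names}
    return [
        name
        for name in available
        if name in wanted or any(fnmatch(name, p) for p in package_filters)
    ]


AVAILABLE = [
    "apache-airflow",
    "apache-airflow-providers-amazon",
    "apache-airflow-providers-apache-hdfs",
    "apache-airflow-providers-cncf-kubernetes",
]
print(resolve_packages(("cncf.kubernetes", "apache-airflow"), (), AVAILABLE))
print(resolve_packages(("apache.hdfs",), ("apache-airflow-providers-a*",), AVAILABLE))
```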