Closed eladkal closed 2 days ago
Hello. Could you please exclude the ydb provider from this release? It's too young for that ;) It has a known bug related to auth, and I'd like to add a new feature there later. Thank you in advance.
> Hello. Could you please exclude the ydb provider from this release? It's too young for that ;) It has a known bug related to auth, and I'd like to add a new feature there later. Thank you in advance.
In your PR you marked it as ready to be released https://github.com/apache/airflow/blob/6e5ae26382b328e88907e8301d4b2352ef8524c5/airflow/providers/ydb/provider.yaml#L24
Missing features are not a blocker for release. We can always add new features later on.
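For reference, readiness is declared in the provider's `provider.yaml`. A minimal illustrative excerpt (not the full file; field values are abbreviated, assuming the current schema where a `state` field marks a provider as releasable):

```yaml
# airflow/providers/ydb/provider.yaml (illustrative excerpt, not the complete file)
package-name: apache-airflow-providers-ydb
name: YDB
description: |
  YDB provider.
state: ready   # a provider marked "ready" is included in the next release wave
```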
> Hello. Could you please exclude the ydb provider from this release? It's too young for that ;) It has a known bug related to auth, and I'd like to add a new feature there later. Thank you in advance.

> In your PR you marked it as ready to be released.
> Missing features are not a blocker for release. We can always add new features later on.
To be honest, the yaml was just copied, and the bug was found two days ago ;) Auth info from a file does not work, but there is a workaround: provide the auth info inline.
Checked that all changes are there. The common.compat provider does not have any code yet, so we could skip releasing it, but there is no harm in doing so - having a 1.0.0 version released on PyPI is generally a good idea.
Tested #40287, working fine. Thanks
tested #40290, working fine 👍🏻
tested #39955, working fine, thanks !
Confirmed https://github.com/apache/airflow/pull/40206 works as expected
I have tested #39991 and it works as expected.
```
[2024-06-23, 04:47:48 UTC] {beam.py:119} WARNING - {"message":"org.apache.beam.runners.dataflow.DataflowRunner - Dataflow SDK version: 2.56.0","severity":"INFO"}
[2024-06-23, 04:47:49 UTC] {beam.py:119} WARNING - {"message":"org.apache.beam.runners.dataflow.DataflowRunner - To access the Dataflow monitoring console, please navigate to https://console.cloud.google.com/dataflow/jobs/<redacted>","severity":"INFO"}
[2024-06-23, 04:47:49 UTC] {beam.py:119} WARNING - {"message":"org.apache.beam.runners.dataflow.DataflowRunner - Submitted job: <redacted>","severity":"INFO"}
[2024-06-23, 04:47:49 UTC] {beam.py:119} WARNING - {"message":"org.apache.beam.runners.dataflow.DataflowRunner - To cancel the job using the \u0027gcloud\u0027 tool, run:\n\u003e gcloud dataflow jobs --project\u003d<redacted> cancel --region\u003<redacted> <redacted>","severity":"INFO"}
[2024-06-23, 04:47:49 UTC] {beam.py:172} INFO - Process exited with return code: 0
[2024-06-23, 04:47:49 UTC] {dataflow.py:461} INFO - Start waiting for done.
[2024-06-23, 04:47:49 UTC] {dataflow.py:403} INFO - Google Cloud DataFlow job <redacted> is state: JOB_STATE_PENDING
[2024-06-23, 04:47:49 UTC] {dataflow.py:464} INFO - Waiting for done. Sleep 10 s
[2024-06-23, 04:47:59 UTC] {dataflow.py:403} INFO - Google Cloud DataFlow job <redacted> is state: JOB_STATE_PENDING
[2024-06-23, 04:47:59 UTC] {dataflow.py:464} INFO - Waiting for done. Sleep 10 s
[2024-06-23, 04:48:09 UTC] {dataflow.py:403} INFO - Google Cloud DataFlow job <redacted> is state: JOB_STATE_RUNNING
[2024-06-23, 04:48:09 UTC] {taskinstance.py:1401} INFO - Marking task as SUCCESS. dag_id=<redacted>, task_id=start_streaming, map_index=0, execution_date=20240623T044637, start_date=20240623T044717, end_date=20240623T044809
[2024-06-23, 04:48:09 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 0
[2024-06-23, 04:48:09 UTC] {taskinstance.py:2781} INFO - 0 downstream tasks scheduled from follow-on schedule check
```
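The log above shows the hook polling the job state every 10 s until it leaves the pending state; for this streaming job, reaching RUNNING is the awaited state, which is why the task is then marked SUCCESS. A generic sketch of that wait-for-done pattern (not the actual `DataflowHook` code; the function name and state sets are illustrative):

```python
import time

def wait_for_done(get_state, poll_sleep=10, max_polls=120,
                  awaited_states=("JOB_STATE_RUNNING", "JOB_STATE_DONE"),
                  failed_states=("JOB_STATE_FAILED", "JOB_STATE_CANCELLED")):
    """Poll get_state() until the job reaches an awaited or failed state."""
    for _ in range(max_polls):
        state = get_state()
        print(f"Google Cloud DataFlow job is state: {state}")
        if state in awaited_states:
            return state
        if state in failed_states:
            raise RuntimeError(f"Job ended in state {state}")
        print(f"Waiting for done. Sleep {poll_sleep} s")
        time.sleep(poll_sleep)
    raise TimeoutError("Job did not leave the pending state in time")

# Canned state sequence mirroring the log above (poll_sleep=0 to avoid waiting):
states = iter(["JOB_STATE_PENDING", "JOB_STATE_PENDING", "JOB_STATE_RUNNING"])
print(wait_for_done(lambda: next(states), poll_sleep=0))  # → JOB_STATE_RUNNING
```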
Change https://github.com/apache/airflow/pull/40253 works as expected.
Tested #40041 and it works as expected.
Tested #38497, works as expected.
Can you please remove the Teradata PR https://github.com/apache/airflow/pull/40378 from this 2.3.0 release? We wanted to release the compute cluster functionality along with it. We are in the process of creating a new PR for the compute cluster, and we would like to release 2.3.0 with these two features as per our roadmap.
> Can you please remove the Teradata PR #40378 from this 2.3.0 release? We wanted to release the compute cluster functionality along with it. We are in the process of creating a new PR for the compute cluster, and we would like to release 2.3.0 with these two features as per our roadmap.
It's of course @eladkal's (the Release Manager's) decision - but no, @satish-chinthanippu, this is not how our providers' release process works. We release every changed provider from main, and we only exclude a provider (not individual PRs) from a release if a serious bug has been found. Manipulating and manually modifying stuff during the release is not easy, takes time and effort, might break various parts of the process (like documentation generation, or package preparation, publishing, signing and verification) and introduces serious overhead for the release process that we don't want.
We have more than 90 providers and we cannot afford individual treatment and such a "custom" approach.
If there is a bug / regression in existing functionality that is a blocker - we might remove the whole provider from the set of providers being voted on - but that's about all the flexibility there is, and unless there is a bug in the Teradata provider, it will be released as-is.
We do not look at others' roadmaps - this is a bit of a price to pay for making the provider a "community managed" one, and we were very clear about the process when we accepted Teradata - one of the reasons why Teradata could have chosen to release their own provider was that this would let them manage their own release roadmap and schedule. Once this is a community provider, we do expect Teradata to keep it updated, with dashboards, and to fix bugs (in their own interest), but this also means that anyone can contribute changes (and Airflow committers make decisions on what goes in and out) and also that we release it together with the other providers at the same cadence.
I hope that explains how it works :). This is not a complaint or being nasty (we appreciate all the work done, the system test dashboards, and all the new features you add). It's just that the way we manage releases for 90+ providers has to have some limitations and structure, and the governance requirements of the ASF are also very clear that once code is submitted to our repo it has to follow the rules of the ASF, where only Airflow committers can decide on code modifications. So I wanted to make that clear.
Hello, @potiuk. Thank you for the clarification. I'd like to ask how we can improve the quality of this release, or is that not our goal? Usually a release means something good and stable to use, but you say that improving it is a big overhead. That means that in some situations a release can be a bit rough, because it is easy to release. I suggest making this clear to users who install such a "release". It could be just a comment that particular providers have some known problems, or giving authors a release branch to merge fixes into. Or even excluding the provider from the release.
> Or even exclude the provider from the release.
Yes. This is how it works. Unit tests are the first line of defense, and we assume that when we have passing unit tests in main, the provider is ready to have a release candidate cut. This conversation here is then meant to check whether there are any bugs that should block certain providers and remove them from the release. But this is NOT for blocking certain features from being released; it is only for checking whether there are blocker bugs. What gets merged into main is assumed to be "ready for the next release candidate". If you do not want a PR to be merged yet, you should keep rebasing it and mark it as Draft until you feel it is ready to be released in the next release candidate (whenever that happens).
If there is a blocker bug / regression, we exclude the provider from the release. But it's a 0/1 decision based on the release manager's assessment of whether it's ok to release a particular provider or not, based on the descriptions from those who test the RC here. If a bug is found during the RC, those who find it should describe the scope and impact of the bug, and the release manager assesses it and decides what to do. This is at the sole discretion of the release manager (see https://www.apache.org/legal/release-policy.html#approving-a-release and the related documentation on the release process requirements of the Apache Software Foundation).
In our case we are ok to release new features even with minor, non-blocking bugs; the "strong" reason for excluding a provider is a major regression in already released features. Sometimes we decide to release providers even if new features are not complete, if some partial implementation works and further work is planned (it will then be released in the next wave).
We do not "hold" releases, we release everything that has been merged to main. Full stop. This has been working like that for ~ 4 years for 90+ providers of ours.
BTW. And just to clarify - as per the ASF's definition, the release manager's job is purely mechanical (plus single-handedly making the decision to exclude certain providers based on an assessment of the bug descriptions provided by those who test them).
See also here: https://infra.apache.org/release-publishing.html#releasemanager
The release manager releases whatever the community decided to merge as "ready to be released". In case of providers - "main" is the "ready for release" sign, so when you are marking your PR as "ready to be merged" and it passes all tests as an author you are saying "it's ready to be released".
That's why also in those "Status of the providers" issue we mark the authors, so that they can verify just before the release if there are no blocking bugs. See https://github.com/apache/airflow/issues/40382#issuecomment-2185793983 where @kacpermuda is still evaluating the impact for openlineage provider. But again - this is for bugs only. What goes into the next release is decided at merge time. See also https://github.com/apache/airflow/blob/main/PROVIDERS.rst#community-providers-release-process- where the release process and various aspects of it are explained.
So just to summarize it in short - the release manager is NOT responsible for the quality of the merged changes, nor for the set of changes that are being prepared as release candidates. In both cases the authors are responsible - both for what goes in and for the quality of what goes in. The release manager is a purely mechanical role to make the release happen, but the authors (with the approval of the committers who merge the changes) drive both the scope and quality of the release. No-one else. And the authors have a chance to verify their changes once the RC is out, and a chance to say "hey, there is a blocker bug, I will fix it for a future release but for now let's remove the provider from the release".
I think this is a very, very clear split of responsibilities here and I am explaining it here, so that it's crystal clear as different people might have different assumptions on what is the release manager's and author's role in the process, and when tests are done.
> Hello. Could you please exclude the ydb provider from this release? It's too young for that ;) It has a known bug related to auth, and I'd like to add a new feature there later. Thank you in advance.
Let's release the ydb provider. The known issues are minor.
https://github.com/apache/airflow/pull/39348 looks good. thank you!
@eladkal please exclude openlineage provider from this wave, and if possible, let's go for rc2.
Thank you @potiuk for your detailed information. Understood the process and considerations regarding the release. Actually, @eladkal suggested to raise individual PRs for each related functionality implemented in Airflow Teradata Provider to make the review process simpler. So, in line with this suggestion, we thought of raising individual PRs for the two features we have planned for this release. We're committed to aligning with the community standards and appreciate the governance framework outlined by ASF. Given this understanding, we'll proceed with the new PR for the compute cluster functionality alongside Teradata's provider updates as per the standard release cadence. We'll ensure that our contributions meet the necessary criteria and are compatible with the overall release process. Please let us know if there are any specific guidelines or additional steps we should follow as we prepare these updates with multiple PRs for a single release.
https://github.com/apache/airflow/pull/40378 working as expected.
Tested #40013, #39771 and #39941. All works fine. Thank you for the release efforts!
Tested all of my work (#38868, #39154, #40048, #40086, #40162) and they all work as expected!
Thanks! 🥂
> Actually, @eladkal suggested to raise individual PRs for each related functionality implemented in Airflow Teradata Provider to make the review process simpler.
Which we stand for. I do not understand the concern you raised. Merged PR = ready to release. What is the problem with releasing it as is?
> Which we stand for. I do not understand the concern you raised. Merged PR = ready to release. What is the problem with releasing it as is?
Yep. This is all good and I stand by it too. Raising a PR <> merging a PR. If you wish a PR to wait because it needs to be released together with another, related PR, the PR can be kept in draft or with an unresolved conversation explaining that it should not yet be merged, and the other PR can be added on top (based on the first PR) and both can be rebased until both are ready to be released. I believe there was a lack of understanding that "merged" = "ready to release" for providers, so I hope it's now clear.
@potiuk and @eladkal, understood. This clarifies the process to follow to release related features with multiple PRs in a single release. Thank you for providing the detailed information.
> Actually, @eladkal suggested to raise individual PRs for each related functionality implemented in Airflow Teradata Provider to make the review process simpler.

> Which we stand for. I do not understand the concern you raised. Merged PR = ready to release. What is the problem with releasing it as is?
Please release it. https://github.com/apache/airflow/pull/40378 Tested and working as expected.
Body
I have a kind request for all the contributors to the latest provider packages release. Could you please help us to test the RC versions of the providers?
The guidelines on how to test providers can be found in
Verify providers by contributors
Let us know in a comment whether the issue is addressed.
Those are providers that require testing as there were some substantial changes introduced:
Provider amazon: 8.25.0rc1
  - `importlib_metadata` import in aws utils (#40134): @eladkal
  - `RedshiftToS3Operator` (#40206): @jasonspeck
  - `importlib.metadata` for retrieve `botocore` package version (#40137): @Taragolis
Provider apache.drill: 2.7.2rc1
Provider apache.kafka: 1.5.0rc1
  - `delete_topic` to `KafkaAdminClientHook` and teardown logic to Kafka integration tests (#40142): @shahar1
Provider cncf.kubernetes: 8.3.2rc1
Provider common.compat: 1.0.0rc1
Provider common.sql: 1.14.1rc1
  - `BaseSQLOperator` and adds `database` as a templated field (#39826): @nyoungstudios
Provider databricks: 6.6.0rc1
  - `list_jobs` (#40178): @stephenpurcell-db
Provider dbt.cloud: 3.9.0rc1
  - `retry_from_failure` parameter to `DbtCloudRunJobOperator` (#38868): @boraberke
  - `DbtCloudRunJobOperator` to Use Correct Status Parameters for `reuse_existing_run` (#40048): @boraberke
Provider docker: 3.12.1rc1
Provider fab: 1.2.0rc1
  - `[webserver] update_fab_perms` to deprecated configs (#40317): @ephraimbuddy
Provider ftp: 3.10.0rc1
Provider google: 10.20.0rc1
  - `BigQueryUpdateTableSchemaOperator` (#40237): @shahar1
  - `CloudDataTransferServiceRunJobOperator` (#39154): @boraberke
  - `GCSToGCSOperator` behavior difference for moving single object (#40162): @boraberke
Provider http: 4.12.0rc1
  - `retry_args` parameter to `HttpOperator` (#40086): @boraberke
Provider microsoft.azure: 10.1.2rc1
Provider microsoft.mssql: 3.7.2rc1
  - `mssql` integration tests and relocate existing unit tests (#39831): @shahar1
Provider openai: 1.2.2rc1
Provider openlineage: 1.9.0rc1
Provider opensearch: 1.3.0rc1
Provider sftp: 4.10.2rc1
Provider snowflake: 5.5.2rc1
Provider telegram: 4.5.2rc1
Provider teradata: 2.3.0rc1
Provider ydb: 1.0.0rc1
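Incidentally, the amazon change #40137 above moves `botocore` version retrieval to the standard-library `importlib.metadata`. A generic sketch of that pattern (the helper name is hypothetical, not the provider's actual code):

```python
from importlib.metadata import PackageNotFoundError, version

def package_version(distribution_name):
    """Return the installed distribution's version string, or None if not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None

print(package_version("botocore"))                  # a version string if installed, else None
print(package_version("definitely-not-a-package"))  # None
```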
All users involved in the PRs: @rahul-madaan @eladkal @Taragolis @nyoungstudios @uzhastik @jalengg @mobuchowski @e-galan @VladaZakharova @riccardoforzan @josh-fell @pankajastro @kacpermuda @ephraimbuddy @satish-chinthanippu @boraberke