Closed renovate[bot] closed 4 weeks ago
Name | Link |
---|---|
Latest commit | 17d596b6c640dda5931ce1847762099ed410fee1 |
Latest deploy log | https://app.netlify.com/sites/peppy-sprite-186812/deploys/671ab4a4b8f1d200086d8648 |
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 81.16%. Comparing base (42d4081) to head (17d596b). Report is 1 commit behind head on main.
This PR contains the following updates:

Package | Change |
---|---|
io.openlineage:openlineage-java | `1.13.1` -> `1.23.0` |
Release Notes
OpenLineage/OpenLineage (io.openlineage:openlineage-java)
### [`v1.23.0`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1230---2024-10-04)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.22.0...1.23.0)

##### Added

- **Java: added CompositeTransport** [`#3039`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3039) [@JDarDagran](https://redirect.github.com/JDarDagran)\
  *This allows user to specify multiple targets to which OpenLineage events will be emitted.*
- **Spark extension interfaces: support table extended sources** [`#3062`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3062) [@Imbruced](https://redirect.github.com/Imbruced)\
  *Interfaces are now able to extract lineage from Table interface, not only RelationProvider.*
- **Java: added GCP Dataplex transport** [`#3043`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3043) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)\
  *Dataplex transport is now available as a separate Maven package for users that want to send OL events to GCP Dataplex*
- **Java: added Google Cloud Storage transport** [`#3077`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3077) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)\
  *GCS transport is now available as a separate Maven package for users that want to send OL events to Google Cloud Storage*
- **Java: added S3 transport** [`#3129`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3129) [@arturowczarek](https://redirect.github.com/arturowczarek)\
  *S3 transport is now available as a separate Maven package for users that want to send OL events to S3*
- **Java: add option to configure client via environment variables** [`#3094`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3094) [@JDarDagran](https://redirect.github.com/JDarDagran)\
  *Specified variables are now autotranslated to configuration values.*
- **Python: add option to configure client via environment variables** [`#3114`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3114) [@JDarDagran](https://redirect.github.com/JDarDagran)\
  *Specified variables are now autotranslated to configuration values.*
- **Python: add option to add custom headers in HTTP transport** [`#3116`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3116) [@JDarDagran](https://redirect.github.com/JDarDagran)\
  *Allows user to add custom headers, for example for auth purposes.*
- **Column level lineage: add full dataset dependencies** [`#3097`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3097) [`#3098`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3098) [@arturowczarek](https://redirect.github.com/arturowczarek)\
  *Now, if datasetLineageEnabled is enabled, and when column level lineage depends on the whole dataset, it does add dataset dependency instead of listing all the column fields in that dataset.*
- **Java: OpenLineageClient and Transports are now AutoCloseable** [`#3122`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3122) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)\
  *This prevents a number of issues that might be caused by not closing underlying transports*

##### Fixed

- **Python Facet generator does not validate optional arguments** [`#3054`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3054) [@JDarDagran](https://redirect.github.com/JDarDagran)\
  *This fixes issue where NominalTimeRunFacet Facet breaks when nominalEndTime is None*
- **SQL: report only actually used tables from CTEs, rather than all** [`#2962`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2962) [@Imbruced](https://redirect.github.com/Imbruced)\
  *With this change, if SQL specified CTE, but does not use it in final query, the lineage won't be falsely reported*
- **Fluentd: Enhancing plugin's capabilities** [`#3068`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3068) [@jonathanlbt1](https://redirect.github.com/jonathanlbt1)\
  *This change enhances performance and docs of fluentd proxy plugin.*
- **SQL: fix parser to point to origin table instead of CTEs** [`#3107`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3107) [@Imbruced](https://redirect.github.com/Imbruced)\
  *For some complex CTEs, parser emitted CTE as a target table instead of original table. This is now fixed.*
- **Spark: column lineage correctly produces for merge into command** [`#3095`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3095) [@Imbruced](https://redirect.github.com/Imbruced)\
  *Now OL produces CLL correctly for the potential view in the middle.*

### [`v1.22.0`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1220---2024-09-05)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.21.1...1.22.0)

##### Added

- **SQL: add support for `USE` statement with different syntaxes** [`#2944`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2944) [@kacpermuda](https://redirect.github.com/kacpermuda)\
  *Adjusts our Context so that it can use the new support for this statement in the parser and pass it to a number of queries.*
- **Spark: add script to build Spark dependencies** [`#3044`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3044) [@arturowczarek](https://redirect.github.com/arturowczarek)\
  *Adds a script to rebuild dependencies automatically following releases.*
- **Website: versionable docs** [`#3007`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3007) [`#3023`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3023) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)\
  *Adds a GitHub action that creates a new Docusaurus version on a tag push, verifiable using the openlineage-site repo. Implements a monorepo approach in a new `website` directory.*

##### Fixed

- **SQL: add support for `SingleQuotedString` in `Identifier()`** [`#3035`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3035) [@kacpermuda](https://redirect.github.com/kacpermuda)\
  *Single quoted strings were being treated differently than strings with no quotes, double quotes, or backticks.*
- **SQL: support `IDENTIFIER` function instead of treating it like table name** [`#2999`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2999) [@kacpermuda](https://redirect.github.com/kacpermuda)\
  *Adds support for this identifier in SELECT, MERGE, UPDATE, and DELETE statements. For now, only static identifiers are supported. When a variable is used, this table is removed from lineage to avoid emitting incorrect lineage.*
- **Spark: fix issue with only one table in inputs from SQL query while reading from JDBC** [`#2918`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2918) [@Imbruced](https://redirect.github.com/Imbruced)\
  *Events created did not contain the correct input table when the query contained multiple tables.*
- **Spark: fix AWS Glue jobs naming for RDD events** [`#3020`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3020) [@arturowczarek](https://redirect.github.com/arturowczarek)\
  *The naming for RDD jobs now uses the same code as SQL and Application events.*

### [`v1.21.1`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1211---2024-08-29)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.20.5...1.21.1)

##### Added

- **Spec: add GCP Dataproc facet** [`#2987`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2987) [@tnazarew](https://redirect.github.com/tnazarew)\
  *Registers the Google Cloud Platform Dataproc run facet.*

##### Fixed

- **Airflow: update SQL integration code to work with latest sqlparser-rs main** [`#2983`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2983) [@kacpermuda](https://redirect.github.com/kacpermuda)\
  *Adjusts the SQL integration after our sqlparser-rs fork has been updated to the latest main.*
- **Spark: fix AWS Glue jobs naming for SQL events** [`#3001`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3001) [@arturowczarek](https://redirect.github.com/arturowczarek)\
  *SQL events now properly use the names of the jobs retrieved from AWS Glue.*
- **Spark: fix issue with column lineage when using delta merge into command** [`#2986`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2986) [@Imbruced](https://redirect.github.com/Imbruced)\
  *A view instance of a node is now included when gathering data sources for input columns.*
- **Spark: minor Spark filters refactor** [`#2990`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2990) [@arturowczarek](https://redirect.github.com/arturowczarek)\
  *Fixes a number of minor issues.*
- **Spark: Iceberg tables in AWS Glue have slashes instead of dots in symlinks** [`#2984`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2984) [@arturowczarek](https://redirect.github.com/arturowczarek)\
  *They should use slashes and the prefix `table/`.*
- **Spark: lineage for Iceberg datasets that are present outside of Spark's catalog is now present** [`#2937`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2937) [@d-m-h](https://redirect.github.com/d-m-h)\
  *Previously, reading Iceberg datasets outside the configured Spark catalog prevented the datasets from being present in the `inputs` property of the `RunEvent`.*

### [`v1.20.5`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1205---2024-08-23)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.20.3...1.20.5)

##### Added

- **Python: add `CompositeTransport`** [`#2925`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2925) [@JDarDagran](https://redirect.github.com/JDarDagran)\
  *Adds a `CompositeTransport` that can accept other transport configs to instantiate transports and use them to emit events.*
- **Spark: compile & test Spark integration on Java 17** [`#2828`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2828) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)\
  *The Spark integration is always compiled with Java 17, while tests are running on both Java 8 and Java 17 according to the configuration.*
- **Spark: support preview release of Spark 4.0** [`#2854`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2854) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)\
  *Includes the Spark 4.0 preview release in the integration tests.*
- **Spark: add handling for `Window`** [`#2901`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2901) [@tnazarew](https://redirect.github.com/tnazarew)\
  *Adds handling for `Window`-type nodes of a logical plan.*
- **Spark: extract and send events with raw SQL from Spark** [`#2913`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2913) [@Imbruced](https://redirect.github.com/Imbruced)\
  *Adds a parser that traverses `QueryExecution` to get the SQL query used from the SQL field with a BFS algorithm.*
- **Spark: support Mongostream source** [`#2887`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2887) [@Imbruced](https://redirect.github.com/Imbruced)\
  *Adds a Mongo streaming visitor and tests.*
- **Spark: new mechanism for disabling facets** [`#2912`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2912) [@arturowczarek](https://redirect.github.com/arturowczarek)\
  *The mechanism makes `FacetConfig` accept the disabled flag for any facet instead of passing them as a list.*
- **Spark: support Kinesis source** [`#2906`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2906) [@Imbruced](https://redirect.github.com/Imbruced)\
  *Adds a Kinesis class handler in the streaming source builder.*
- **Spark: extract `DatasetIdentifier` from extension `LineageNode`** [`#2900`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2900) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)\
  *Adds support for cases in which `LogicalRelation` has a grandChild node that implements the `LineageRelation` interface.*
- **Spark: extract Dataset from underlying `BaseRelation`** [`#2893`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2893) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)\
  *`DatasetIdentifier` is now extracted from the underlying node of `LogicalRelation`.*
- **Spark: add descriptions and Marquez UI to Docker Compose file** [`#2889`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2889) [@jonathanlbt1](https://redirect.github.com/jonathanlbt1)\
  *Adds the `marquez-web` service to docker-compose.yml.*

##### Fixed

- **Proxy: bug fixed on error messages descriptions** [`#2880`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2880) [@jonathanlbt1](https://redirect.github.com/jonathanlbt1)\
  *Improves error logging.*
- **Proxy: update Docker image for Fluentd 1.17** [`#2877`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2877) [@jonathanlbt1](https://redirect.github.com/jonathanlbt1)\
  *Upgrades the Fluentd version.*
- **Spark: fix issue with Kafka source when saving with `for each` batch method** [`#2868`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2868) [@imbruced](https://redirect.github.com/Imbruced)\
  *Fixes an issue when Spark is in streaming mode and input for Kafka was not present in the event.*
- **Spark: properly set ARN in namespace for Iceberg Glue symlinks** [`#2943`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2943) [@arturowczarek](https://redirect.github.com/arturowczarek)\
  *Makes `IcebergHandler` support Glue catalog tables and create the symlink using the code from `PathUtils`.*
- **Spark: accept any provider for AWS Glue storage format** [`#2917`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2917) [@arturowczarek](https://redirect.github.com/arturowczarek)\
  *Makes the AWS Glue ARN generating method accept every format (including Parquet), not only Hive SerDe.*
- **Spark: return valid JSON for failed logical plan serialization** [`#2892`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2892) [@arturowczarek](https://redirect.github.com/arturowczarek)\
  *The `LogicalPlanSerializer` now returns `Configuration
📅 Schedule: Branch creation - "every 3 months on the first day of the month" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.