MarquezProject / marquez

Collect, aggregate, and visualize a data ecosystem's metadata
https://marquezproject.ai
Apache License 2.0

fix(deps): update dependency io.openlineage:openlineage-java to v1.23.0 #2907

**Closed** · renovate[bot] closed this 4 weeks ago

renovate[bot] commented 1 month ago

This PR contains the following updates:

| Package | Change |
|---|---|
| io.openlineage:openlineage-java | `1.13.1` -> `1.23.0` |
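For a consumer of this library, the bump amounts to a one-line version change. A minimal sketch of the Maven side (the coordinates and versions are from this PR; the surrounding `pom.xml` context is illustrative):

```xml
<!-- pom.xml: bump the OpenLineage Java client to the version this PR proposes -->
<dependency>
  <groupId>io.openlineage</groupId>
  <artifactId>openlineage-java</artifactId>
  <version>1.23.0</version> <!-- previously 1.13.1 -->
</dependency>
```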

Release Notes

OpenLineage/OpenLineage (io.openlineage:openlineage-java)

### [`v1.23.0`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1230---2024-10-04)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.22.0...1.23.0)

##### Added

- **Java: added `CompositeTransport`** [`#3039`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3039) [@JDarDagran](https://redirect.github.com/JDarDagran)
  *Allows users to specify multiple targets to which OpenLineage events will be emitted.*
- **Spark extension interfaces: support table extended sources** [`#3062`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3062) [@Imbruced](https://redirect.github.com/Imbruced)
  *Interfaces can now extract lineage from the `Table` interface, not only `RelationProvider`.*
- **Java: added GCP Dataplex transport** [`#3043`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3043) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)
  *The Dataplex transport is now available as a separate Maven package for users who want to send OL events to GCP Dataplex.*
- **Java: added Google Cloud Storage transport** [`#3077`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3077) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)
  *The GCS transport is now available as a separate Maven package for users who want to send OL events to Google Cloud Storage.*
- **Java: added S3 transport** [`#3129`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3129) [@arturowczarek](https://redirect.github.com/arturowczarek)
  *The S3 transport is now available as a separate Maven package for users who want to send OL events to S3.*
- **Java: add option to configure client via environment variables** [`#3094`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3094) [@JDarDagran](https://redirect.github.com/JDarDagran)
  *Specified variables are now autotranslated to configuration values.*
- **Python: add option to configure client via environment variables** [`#3114`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3114) [@JDarDagran](https://redirect.github.com/JDarDagran)
  *Specified variables are now autotranslated to configuration values.*
- **Python: add option to add custom headers in HTTP transport** [`#3116`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3116) [@JDarDagran](https://redirect.github.com/JDarDagran)
  *Allows users to add custom headers, for example for auth purposes.*
- **Column level lineage: add full dataset dependencies** [`#3097`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3097) [`#3098`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3098) [@arturowczarek](https://redirect.github.com/arturowczarek)
  *Now, if `datasetLineageEnabled` is enabled and column-level lineage depends on the whole dataset, a dataset dependency is added instead of listing all the column fields in that dataset.*
- **Java: `OpenLineageClient` and transports are now `AutoCloseable`** [`#3122`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3122) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)
  *This prevents a number of issues that might be caused by not closing underlying transports.*

##### Fixed

- **Python: Facet generator does not validate optional arguments** [`#3054`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3054) [@JDarDagran](https://redirect.github.com/JDarDagran)
  *Fixes an issue where `NominalTimeRunFacet` breaks when `nominalEndTime` is `None`.*
- **SQL: report only actually used tables from CTEs, rather than all** [`#2962`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2962) [@Imbruced](https://redirect.github.com/Imbruced)
  *If SQL specifies a CTE but does not use it in the final query, the lineage is no longer falsely reported.*
- **Fluentd: enhance the plugin's capabilities** [`#3068`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3068) [@jonathanlbt1](https://redirect.github.com/jonathanlbt1)
  *Enhances the performance and docs of the fluentd proxy plugin.*
- **SQL: fix parser to point to origin table instead of CTEs** [`#3107`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3107) [@Imbruced](https://redirect.github.com/Imbruced)
  *For some complex CTEs, the parser emitted the CTE as a target table instead of the original table. This is now fixed.*
- **Spark: column lineage is now produced correctly for the `merge into` command** [`#3095`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3095) [@Imbruced](https://redirect.github.com/Imbruced)
  *OL now produces column-level lineage correctly for a potential view in the middle.*

### [`v1.22.0`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1220---2024-09-05)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.21.1...1.22.0)

##### Added

- **SQL: add support for `USE` statement with different syntaxes** [`#2944`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2944) [@kacpermuda](https://redirect.github.com/kacpermuda)
  *Adjusts our Context so that it can use the new support for this statement in the parser and pass it to a number of queries.*
- **Spark: add script to build Spark dependencies** [`#3044`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3044) [@arturowczarek](https://redirect.github.com/arturowczarek)
  *Adds a script to rebuild dependencies automatically following releases.*
- **Website: versionable docs** [`#3007`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3007) [`#3023`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3023) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *Adds a GitHub action that creates a new Docusaurus version on a tag push, verifiable using the openlineage-site repo. Implements a monorepo approach in a new `website` directory.*

##### Fixed

- **SQL: add support for `SingleQuotedString` in `Identifier()`** [`#3035`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3035) [@kacpermuda](https://redirect.github.com/kacpermuda)
  *Single-quoted strings were being treated differently than strings with no quotes, double quotes, or backticks.*
- **SQL: support the `IDENTIFIER` function instead of treating it like a table name** [`#2999`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2999) [@kacpermuda](https://redirect.github.com/kacpermuda)
  *Adds support for this identifier in SELECT, MERGE, UPDATE, and DELETE statements. For now, only static identifiers are supported. When a variable is used, the table is removed from lineage to avoid emitting incorrect lineage.*
- **Spark: fix issue with only one table in inputs from SQL query while reading from JDBC** [`#2918`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2918) [@Imbruced](https://redirect.github.com/Imbruced)
  *Events did not contain the correct input tables when the query contained multiple tables.*
- **Spark: fix AWS Glue job naming for RDD events** [`#3020`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3020) [@arturowczarek](https://redirect.github.com/arturowczarek)
  *The naming for RDD jobs now uses the same code as SQL and Application events.*

### [`v1.21.1`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1211---2024-08-29)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.20.5...1.21.1)

##### Added

- **Spec: add GCP Dataproc facet** [`#2987`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2987) [@tnazarew](https://redirect.github.com/tnazarew)
  *Registers the Google Cloud Platform Dataproc run facet.*

##### Fixed

- **Airflow: update SQL integration code to work with latest sqlparser-rs main** [`#2983`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2983) [@kacpermuda](https://redirect.github.com/kacpermuda)
  *Adjusts the SQL integration after our sqlparser-rs fork was updated to the latest main.*
- **Spark: fix AWS Glue job naming for SQL events** [`#3001`](https://redirect.github.com/OpenLineage/OpenLineage/pull/3001) [@arturowczarek](https://redirect.github.com/arturowczarek)
  *SQL events now properly use the job names retrieved from AWS Glue.*
- **Spark: fix issue with column lineage when using the delta `merge into` command** [`#2986`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2986) [@Imbruced](https://redirect.github.com/Imbruced)
  *A view instance of a node is now included when gathering data sources for input columns.*
- **Spark: minor Spark filters refactor** [`#2990`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2990) [@arturowczarek](https://redirect.github.com/arturowczarek)
  *Fixes a number of minor issues.*
- **Spark: Iceberg tables in AWS Glue have slashes instead of dots in symlinks** [`#2984`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2984) [@arturowczarek](https://redirect.github.com/arturowczarek)
  *They should use slashes and the prefix `table/`.*
- **Spark: lineage for Iceberg datasets outside Spark's catalog is now present** [`#2937`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2937) [@d-m-h](https://redirect.github.com/d-m-h)
  *Previously, reading Iceberg datasets outside the configured Spark catalog prevented the datasets from appearing in the `inputs` property of the `RunEvent`.*

### [`v1.20.5`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1205---2024-08-23)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.20.3...1.20.5)

##### Added

- **Python: add `CompositeTransport`** [`#2925`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2925) [@JDarDagran](https://redirect.github.com/JDarDagran)
  *Adds a `CompositeTransport` that can accept other transport configs to instantiate transports and use them to emit events.*
- **Spark: compile & test the Spark integration on Java 17** [`#2828`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2828) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *The Spark integration is always compiled with Java 17, while tests run on both Java 8 and Java 17 according to the configuration.*
- **Spark: support the preview release of Spark 4.0** [`#2854`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2854) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *Includes the Spark 4.0 preview release in the integration tests.*
- **Spark: add handling for `Window`** [`#2901`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2901) [@tnazarew](https://redirect.github.com/tnazarew)
  *Adds handling for `Window`-type nodes of a logical plan.*
- **Spark: extract and send events with raw SQL from Spark** [`#2913`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2913) [@Imbruced](https://redirect.github.com/Imbruced)
  *Adds a parser that traverses `QueryExecution` with a BFS algorithm to get the SQL query used from the SQL field.*
- **Spark: support Mongostream source** [`#2887`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2887) [@Imbruced](https://redirect.github.com/Imbruced)
  *Adds a Mongo streaming visitor and tests.*
- **Spark: new mechanism for disabling facets** [`#2912`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2912) [@arturowczarek](https://redirect.github.com/arturowczarek)
  *`FacetConfig` now accepts a disabled flag for any facet instead of passing them as a list.*
- **Spark: support Kinesis source** [`#2906`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2906) [@Imbruced](https://redirect.github.com/Imbruced)
  *Adds a Kinesis class handler in the streaming source builder.*
- **Spark: extract `DatasetIdentifier` from extension `LineageNode`** [`#2900`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2900) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)
  *Adds support for cases in which `LogicalRelation` has a grandchild node that implements the `LineageRelation` interface.*
- **Spark: extract Dataset from the underlying `BaseRelation`** [`#2893`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2893) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)
  *`DatasetIdentifier` is now extracted from the underlying node of `LogicalRelation`.*
- **Spark: add descriptions and Marquez UI to the Docker Compose file** [`#2889`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2889) [@jonathanlbt1](https://redirect.github.com/jonathanlbt1)
  *Adds the `marquez-web` service to docker-compose.yml.*

##### Fixed

- **Proxy: fix bug in error message descriptions** [`#2880`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2880) [@jonathanlbt1](https://redirect.github.com/jonathanlbt1)
  *Improves error logging.*
- **Proxy: update Docker image for Fluentd 1.17** [`#2877`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2877) [@jonathanlbt1](https://redirect.github.com/jonathanlbt1)
  *Upgrades the Fluentd version.*
- **Spark: fix issue with Kafka source when saving with the `foreach` batch method** [`#2868`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2868) [@Imbruced](https://redirect.github.com/Imbruced)
  *Fixes an issue where, with Spark in streaming mode, the Kafka input was not present in the event.*
- **Spark: properly set ARN in namespace for Iceberg Glue symlinks** [`#2943`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2943) [@arturowczarek](https://redirect.github.com/arturowczarek)
  *Makes `IcebergHandler` support Glue catalog tables and create the symlink using the code from `PathUtils`.*
- **Spark: accept any provider for AWS Glue storage format** [`#2917`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2917) [@arturowczarek](https://redirect.github.com/arturowczarek)
  *Makes the AWS Glue ARN-generating method accept every format (including Parquet), not only Hive SerDe.*
- **Spark: return valid JSON for failed logical plan serialization** [`#2892`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2892) [@arturowczarek](https://redirect.github.com/arturowczarek)
  *The `LogicalPlanSerializer` now returns valid JSON for failed serialization instead of an empty string.*
- **Spark: extract legacy column lineage visitors loader** [`#2883`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2883) [@arturowczarek](https://redirect.github.com/arturowczarek)
  *Refactors `CustomCollectorsUtils` for improved readability.*
- **Spark: add Kafka input source when writing in `foreach` batch mode** [`#2868`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2868) [@Imbruced](https://redirect.github.com/Imbruced)
  *Fixes a bug keeping Kafka input sources from being produced.*
- **Spark: extract `DatasetIdentifier` from `SaveIntoDataSourceCommandVisitor` options** [`#2934`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2934) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)
  *Extracts `DatasetIdentifier` from the command's options instead of relying on `p.createRelation(sqlContext, command.options())`, which is a heavy operation for `JdbcRelationProvider`.*

### [`v1.20.3`](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.19.0...1.20.3)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.19.0...1.20.3)

### [`v1.19.0`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1190---2024-07-22)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.18.0...1.19.0)

##### Added

- **Airflow: add `log_url` to `AirflowRunFacet`** [`#2852`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2852) [@dolfinus](https://redirect.github.com/dolfinus)
  *Adds the task instance's `log_url` field to `AirflowRunFacet`.*
- **Spark: add handling for `Generate`** [`#2856`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2856) [@tnazarew](https://redirect.github.com/tnazarew)
  *Adds handling for `Generate`-type nodes of a logical plan (e.g., explode operations).*
- **Java: add `DerbyJdbcExtractor`** [`#2869`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2869) [@dolfinus](https://redirect.github.com/dolfinus)
  *Adds a `JdbcExtractor` implementation for the Derby database. As this is a file-based DBMS, its dataset namespace is `file` and its name is the absolute path to a database file.*
- **Spark: verify bytecode version of the built jar** [`#2859`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2859) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *Extends the `JarVerifier` plugin to ensure all compiled classes have a bytecode version of Java 8 or lower.*
- **Spark: add Kafka streaming source support** [`#2851`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2851) [@d-m-h](https://redirect.github.com/d-m-h) [@Imbruced](https://redirect.github.com/Imbruced)
  *Adds support for Kafka streaming sources to Kafka streaming sinks. Inputs and outputs are now included in lineage events.*

##### Fixed

- **Airflow: replace `datetime.now` with `airflow.utils.timezone.utcnow`** [`#2865`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2865) [@kacpermuda](https://redirect.github.com/kacpermuda)
  *Fixes missing timezone information in task FAIL events.*
- **Spark: remove shaded dependency in `ColumnLevelLineageBuilder`** [`#2850`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2850) [@tnazarew](https://redirect.github.com/tnazarew)
  *Removes the shaded `Streams` dependency in `ColumnLevelLineageBuilder` that was causing a `ClassNotFoundException`.*
- **Spark: make Delta dataset symlinks consistent with non-Delta tables** [`#2863`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2863) [@dolfinus](https://redirect.github.com/dolfinus)
  *Makes dataset symlinks for Delta and non-Delta tables consistent.*
- **Spark: use a table's properties during column-level lineage construction** [`#2855`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2855) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)
  *Fixes `PlanUtils3` so dataset identifier information based on a table's properties is also retrieved during the construction of column-level lineage.*
- **Spark: extract job name creation to providers** [`#2861`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2861) [@arturowczarek](https://redirect.github.com/arturowczarek)
  *The integration now detects whether `spark.app.name` was autogenerated by Glue and uses the Glue job name in such cases. Also, each job name provisioning strategy is now extracted to a separate provider.*

### [`v1.18.0`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1180---2024-07-11)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.17.1...1.18.0)

##### Added

- **Spark: configurable integration test** [`#2755`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2755) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *Provides a command-line tool capable of running Spark integration tests that can be created without Java.*
- **Spark: OpenLineage Spark extension interfaces without runtime dependency hell** [`#2809`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2809) [`#2837`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2837) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)
  *New Spark extension interfaces without runtime dependency hell. Includes a test to verify the integration works properly.*
- **Spark: support latest versions 3.4.3 and 3.5.1** [`#2743`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2743) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *Upgrades CI workflows to run tests against the latest Spark versions: 3.4.2 -> 3.4.3 and 3.5.0 -> 3.5.1.*
- **Spark: add extraction of the masking property in column-level lineage** [`#2789`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2789) [@tnazarew](https://redirect.github.com/tnazarew)
  *Adds extraction of the masking property during collection of dependencies for `ColumnLineageDatasetFacet` creation.*
- **Spark: collect table name from `InsertIntoHadoopFsRelationCommand`** [`#2794`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2794) [@dolfinus](https://redirect.github.com/dolfinus)
  *Collects a table name for the `INSERT INTO` command for tables created with `USING $fileFormat` syntax, like `USING orc`.*
- **Spark, Flink: add `PostgresJdbcExtractor`** [`#2806`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2806) [@dolfinus](https://redirect.github.com/dolfinus)
  *Adds the default `5432` port to Postgres namespaces.*
- **Spark, Flink: add `TeradataJdbcExtractor`** [`#2826`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2826) [@dolfinus](https://redirect.github.com/dolfinus)
  *Converts JDBC URLs like `jdbc:teradata/host/DBS_PORT=1024,DATABASE=somedb` to datasets with namespace `teradata://host:1024` and name `somedb.table`.*
- **Spark, Flink: add `MySqlJdbcExtractor`** [`#2825`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2825) [@dolfinus](https://redirect.github.com/dolfinus)
  *Handles different formats of the MySQL JDBC URL and produces datasets with consistent namespaces, like `mysql://host:port`.*
- **Spark, Flink: add `OracleJdbcExtractor`** [`#2824`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2824) [@dolfinus](https://redirect.github.com/dolfinus)
  *Handles simple Oracle JDBC URLs, like `oracle:thin:@//host:port/serviceName` and `oracle:thin@host:port:sid`, and converts each to a dataset with namespace `oracle://host:port` and name `sid.schema.table` or `serviceName.schema.table`.*
- **Spark: configurable test with Docker image provided** [`#2822`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2822) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *Extends the configurable integration test feature to allow providing the Docker image name.*
- **Spark: support Iceberg 1.4 on Spark 3.5.1** [`#2838`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2838) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *Includes Iceberg support for Spark 3.5. Fixes the column-level lineage facet for `UNION` queries.*
- **Spec: add example for change in `#2756`** [`#2801`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2801) [@Sheeri](https://redirect.github.com/Sheeri)
  *Updates the `customLineage` facet test for the new syntax created in `#2756`.*

##### Changed

- **Spark: fall back to `spark.sql.warehouse.dir` as table namespace** [`#2767`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2767) [@dolfinus](https://redirect.github.com/dolfinus)
  *In cases when a metastore is not used, falls back to `spark.sql.warehouse.dir` or `hive.metastore.warehouse.dir` as the table namespace, instead of duplicating the table's location.*

##### Fixed

- **Java: handle dashes in hostnames for `JdbcExtractors`** [`#2830`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2830) [@dolfinus](https://redirect.github.com/dolfinus)
  *Proper handling of dashes in JDBC URL hosts.*
- **Spark: fix Glue symlinks formatting bug** [`#2807`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2807) [@Akash2351](https://redirect.github.com/Akash2351)
  *Fixes Glue symlinks with config parsing for the Glue `catalogid`.*
- **Spark, Flink: fix DBFS namespace format** [`#2800`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2800) [@dolfinus](https://redirect.github.com/dolfinus)
  *Fixes the DBFS namespace format.*
- **Spark: fix Glue naming format** [`#2766`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2766) [@dolfinus](https://redirect.github.com/dolfinus)
  *Changes the AWS Glue namespace to match the Glue ARN documentation.*
- **Spark: fix Iceberg dataset location** [`#2797`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2797) [@dolfinus](https://redirect.github.com/dolfinus)
  *Fixes the Iceberg dataset namespace: uses `file:/some/path/database/table` instead of `file:/some/path/database.table`. For the dataset TABLE symlink, uses the warehouse location instead of the database location.*
- **Spark: fix NPE and incorrect comment** [`#2827`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2827) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *Fixes an error caused by a recent upgrade of Spark versions that did not break existing tests.*
- **Spark: convert scheme and authority to lowercase in `JdbcLocation`** [`#2831`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2831) [@dolfinus](https://redirect.github.com/dolfinus)
  *Converts a valid JDBC URL scheme and authority to lowercase, leaving the instance/database name intact, as different databases have different default-case and case-sensitivity rules.*

### [`v1.17.1`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1171---2024-06-21)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.16.0...1.17.1)

##### Added

- **Java: dataset namespace resolver feature** [`#2720`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2720) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *Adds a dataset namespace resolving mechanism that resolves dataset namespaces based on the configured resolvers. The core mechanism is implemented in openlineage-java and can be used within the Flink and Spark integrations.*
- **Spark: add transformation extraction** [`#2758`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2758) [@tnazarew](https://redirect.github.com/tnazarew)
  *Adds a transformation type extraction mechanism.*
- **Spark: add GCP run and job facets** [`#2643`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2643) [@codelixir](https://redirect.github.com/codelixir)
  *Adds `GCPRunFacetBuilder` and `GCPJobFacetBuilder` to report additional facets when running on Google Cloud Platform.*
- **Spark: improve namespace format for SQLServer** [`#2773`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2773) [@dolfinus](https://redirect.github.com/dolfinus)
  *Improves the namespace format for SQLServer.*
- **Spark: verify jar content after build** [`#2698`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2698) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *Adds a tool to verify `shadowJar` content and prevent reported issues, which are currently hard to prevent and require manual verification of manually unpacked jar content.*
- **Spec: add transformation type info** [`#2756`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2756) [@tnazarew](https://redirect.github.com/tnazarew)
  *Adds information about the transformation type in `ColumnLineageDatasetFacet`. `transformationType` and `transformationDescription` are marked as deprecated.*
- **Spec: implement facet registry (following [#2161](https://redirect.github.com/OpenLineage/OpenLineage/issues/2161))** [`#2729`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2729) [@harels](https://redirect.github.com/harels)
  *Introduces the foundations of the new facet registry into the repo.*
- **Spec: register GCP common job facet** [`#2740`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2740) [@ngorchakova](https://redirect.github.com/ngorchakova)
  *Registers the GCP job facet that contains common attributes that will improve the way lineage is parsed and displayed by the GCP platform. Based on the [proposal](https://redirect.github.com/OpenLineage/OpenLineage/pull/2228/files), GCP Lineage would like to define facets that are expected from integrations. The list of supported facets is not final and will be extended by further PRs.*

##### Removed

- **Java: remove deprecated `localServerId` option from Kafka config** [`#2738`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2738) [@dolfinus](https://redirect.github.com/dolfinus)
  *Removes `localServerId` from the Kafka config, deprecated since 1.13.0.*
- **Java: remove deprecated `Transport.emit(String)`** [`#2737`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2737) [@dolfinus](https://redirect.github.com/dolfinus)
  *Removes `Transport.emit(String)` support, deprecated since 1.13.0.*
- **Spark: remove `spark-interfaces-scala` module** [`#2781`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2781) [@ddebowczyk92](https://redirect.github.com/ddebowczyk92)
  *Replaces the existing `spark-interfaces-scala` interfaces with new ones decoupled from the Scala binary version. Allows for improved integration in environments where one cannot guarantee the same version of `openlineage-java`.*

##### Changed

- **Spark: add log info when emitting lineage from Spark (following [#2650](https://redirect.github.com/OpenLineage/OpenLineage/issues/2650))** [`#2769`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2769) [@algorithmy1](https://redirect.github.com/algorithmy1)
  *Enhances logging.*

##### Fixed

- **Flink: use `namespace.name` as Avro complex field type** [`#2763`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2763) [@dolfinus](https://redirect.github.com/dolfinus)
  *`namespace.name` is now used as the Avro `"type"` of complex fields (record, enum, fixed).*
- **Java: repair empty dataset name** [`#2776`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2776) [@kacpermuda](https://redirect.github.com/kacpermuda)
  *The dataset name should not be empty.*
- **Spark: fix events emitted for `drop table` on Spark 3.4 and above** [`#2745`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2745) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski) [@savannavalgi](https://redirect.github.com/savannavalgi)
  *Includes the dataset being dropped within the event, as was the case prior to Spark 3.4.*
- **Spark, Flink: fix S3 dataset names** [`#2782`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2782) [@dolfinus](https://redirect.github.com/dolfinus)
  *Drops the leading slash from the object storage dataset name. Converts `s3a://` and `s3n://` schemes to `s3://`.*
- **Spark: fix Hive metastore namespace** [`#2761`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2761) [@dolfinus](https://redirect.github.com/dolfinus)
  *Fixes the dataset namespace for cases when the Hive metastore URL is set using `$SPARK_CONF_DIR/hive-site.xml`.*
- **Spark: fix NPE in column-level lineage** [`#2749`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2749) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *The Spark agent now checks that `cur.getDependencies()` is not null before adding dependencies.*
- **Spark: refactor `OpenLineageRunEventBuilder`** [`#2754`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2754) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *Adds a separate class containing all the input arguments to call `OpenLineageRunEventBuilder::buildRun`.*
- **Spark: fix `historyUrl` format** [`#2741`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2741) [@dolfinus](https://redirect.github.com/dolfinus)
  *Fixes the `historyUrl` format in `spark_applicationDetails`.*
- **SQL: allow self-recursive aliases** [`#2753`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2753) [@mobuchowski](https://redirect.github.com/mobuchowski)
  *Expressions like `select * from test_orders as test_orders` are now parsed properly.*

### [`v1.16.0`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1160---2024-05-28)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.15.0...1.16.0)

##### Added

- **Spark: add `jobType` facet to Spark application events** [`#2719`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2719) [@dolfinus](https://redirect.github.com/dolfinus)
  *Adds the `jobType` facet to `runEvent`s emitted by `SparkListenerApplicationStart`.*
- **Spark & Flink: introduce dataset namespace resolver** [`#2720`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2720) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *Enables resolving dataset namespaces with predefined resolvers like `HostListNamespaceResolver`, `PatternNamespaceResolver`, and `PatternMatchingGroupNamespaceResolver`, or a custom implementation loaded with `ServiceLoader`. Useful for resolving hostnames into cluster identifiers.*

##### Fixed

- **dbt: fix swapped namespace and name in the dbt integration** [`#2735`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2735) [@JDarDagran](https://redirect.github.com/JDarDagran)
  *Fixes variable names.*
- **Python: override debug level** [`#2727`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2735) [@mobuchowski](https://redirect.github.com/mobuchowski)
  *Removes debug-level logging of HTTP requests.*

### [`v1.15.0`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1150---2024-05-23)

[Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.14.0...1.15.0)

##### Added

- **Flink: handle Iceberg tables with nested and complex field types** [`#2706`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2706) [@dolfinus](https://redirect.github.com/dolfinus)
  *Creates `SchemaDatasetFacet` with nested fields for Iceberg tables with list, map, and struct columns.*
- **Flink: handle Avro schemas with nested and complex field types** [`#2711`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2711) [@dolfinus](https://redirect.github.com/dolfinus)
  *Creates `SchemaDatasetFacet` with nested fields for Avro schemas with complex types (union, record, map, array, fixed).*
- **Spark: add facets to Spark application events** [`#2677`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2677) [@dolfinus](https://redirect.github.com/dolfinus)
  *Adds support for Spark application start and stop events in the `ExecutionContext` interface.*
- **Spark: add nested fields to `SchemaDatasetFieldsFacet`** [`#2689`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2689) [@dolfinus](https://redirect.github.com/dolfinus)
  *Adds nested Spark DataFrame field support to `SchemaDatasetFieldsFacet`. Also includes the field comment as `description`.*
- **Spark: add `SparkApplicationDetailsFacet`** [`#2688`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2688) [@dolfinus](https://redirect.github.com/dolfinus)
  *Adds `SparkApplicationDetailsFacet` to `runEvent`s emitted on Spark application start.*

##### Removed

- **Airflow: remove Airflow < 2.3.0 support** [`#2710`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2710) [@kacpermuda](https://redirect.github.com/kacpermuda)
  *Removes support for Airflow < 2.3.0.*
- **Integration: use v2 Python facets** [`#2693`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2693) [@JDarDagran](https://redirect.github.com/JDarDagran)
  *Migrates integrations from the removed v1 facets to v2 Python facets.*

##### Fixed

- **Spark: improve the job suffix assignment mechanism** [`#2665`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2665) [@pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)
  *For some catalog handlers, the mechanism created different dataset identifiers on START and COMPLETE depending on whether a dataset was created or not. The mechanism now assigns a deterministic job suffix based on the output dataset at the moment of the start event.*
**Note**: this may change job names in some scenarios.* - **Airflow: fix empty dataset name for `AthenaExtractor`** [`#2700`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2700) [@​kacpermuda](https://redirect.github.com/kacpermuda)\ *The dataset name should not be empty when passing only a bucket as S3 output in Athena.* - **Flink: fix `SchemaDatasetFacet` for Protobuf repeated primitive types** [`#2685`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2685) [@​dolfinus](https://redirect.github.com/dolfinus)\ *Fixes issues with the Protobuf schema converter.* - **Python: clean up Python client code, add logging.** [`#2653`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2653) [@​kacpermuda](https://redirect.github.com/kacpermuda)\ *Cleans up client code, refactors logging in all Python modules.* - **SQL: catch `TokenizerError`s, `PanicException`** [`#2703`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2703) [@​mobuchowski](https://redirect.github.com/mobuchowski)\ *The SQL parser now catches and handles these errors.* - **Python: suppress warning on importing v1 module in **init**.py.** [`#2713`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2713) [@​JDarDagran](https://redirect.github.com/JDarDagran)\ *Suppresses the deprecation warning when v1 facets are used.* - **Integration/Java/Python: use UUIDv7 instead of UUIDv4** [`#2686`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2686) [`#2687`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2687) [@​dolfinus](https://redirect.github.com/dolfinus)\ *Uses UUIDv7 instead of UUIDv4 for `runEvent`s. The new UUID version produces monotonically increasing values, which leads to more performant queries on the OL consumer side. 
**Note**: UUID version is an implementation detail and can be changed in the future.* ### [`v1.14.0`](https://redirect.github.com/OpenLineage/OpenLineage/blob/HEAD/CHANGELOG.md#1140---2024-05-09) [Compare Source](https://redirect.github.com/OpenLineage/OpenLineage/compare/1.13.1...1.14.0) ##### Added - **Common/dbt: add DREMIO to supported dbt profile types** [`#2674`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2674) [@​surisimran](https://redirect.github.com/surisimran)\ \*Adds support for dbt-dremio, resolving [`#2668`](https://redirect.github.com/OpenLineage/OpenLineage/issues/2668). - **Flink: support Protobuf format for sources and sinks** [`#2482`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2482) [@​pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)\ *Adds schema extraction from Protobuf classes. Includes support for nested object types, `array` type, `map` type, `oneOf` and `any`.* - **Java: add facet conversion test** [`#2663`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2663) [@​julienledem](https://redirect.github.com/julienledem)\ *Adds a simple test that shows how to deserialize a facet in the server model.* - **Spark: job type facet to distinguish RDD jobs from Spark SQL jobs** [`#2652`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2652) [@​pawel-big-lebowski](https://redirect.github.com/pawel-big-lebowski)\ *Sets the `jobType` property of `JobTypeJobFacet` to either `SQL_JOB` or `RDD_JOB`.* - **Spark: add Glue symlink if reading from Glue catalog table** [`#2646`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2646) [@​mobuchowski](https://redirect.github.com/mobuchowski)\ *The dataset symlink now points to the Glue catalog table name if the Glue catalog table is used.* - **Spark: add `spark_jobDetails` facet** [`#2662`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2662) [@​dolfinus](https://redirect.github.com/dolfinus)\ *Adds a `SparkJobDetailsFacet`, 
capturing information about Spark application jobs -- e.g. `jobId`, `jobDescription`, `jobGroup`, `jobCallSite`. This allows for tracking an OpenLineage `RunEvent` with a specific Spark job in SparkUI.* ##### Removed - **Airflow: drop old `ParentRunFacet` key** [`#2660`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2660) [@​dolfinus](https://redirect.github.com/dolfinus)\ *Changes the integration to use the `parent` key for `ParentFacet`, dropping the outdated `parentRun`.* - **Spark: drop `SparkVersionFacet`** [`#2659`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2659) [@​dolfinus](https://redirect.github.com/dolfinus)\ *Drops the `SparkVersion` facet, deprecated since 1.2.0 and planned for removal since 1.4.0.* - **Python: allow relative paths in URI formats for Python facets** [`#2679`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2679) [@​JDarDagran](https://redirect.github.com/JDarDagran)\ *Removes a URI validator that checked if scheme and netloc were present, allowing relative paths in URI formats for Python facets.* ##### Changed - **GreatExpectations: rename `ParentRunFacet` key** [`#2661`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2661) [@​dolfinus](https://redirect.github.com/dolfinus)\ *The OpenLineage spec defined the `ParentRunFacet` with the property name parent but the Great Expectations integration created a lineage event with `parentRun`. This renames `ParentRunFacet` key from `parentRun` to `parent`. 
For backwards compatibility, keep the old name.* ##### Fixed - **dbt: support a less ambiguous logic to generate job names** [`#2658`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2658) [@​blacklight](https://redirect.github.com/blacklight)\ *Includes profile and models in the dbt job name to make it more unique.* - **Spark: update to use `org.apache.commons.lang3` instead of `org.apache.commons.lang`** [`#2676`](https://redirect.github.com/OpenLineage/OpenLineage/pull/2676) [@​harels](https://redirect.github.com/harels)\ *Updates Apache Commons Lang to the latest version. We were mixing two versions, and the old one was not present in many places.*
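The S3 dataset-name fix (#2782) amounts to a small normalization step. A minimal sketch of that behavior; `normalize_s3_dataset` is a hypothetical helper for illustration, not the actual OpenLineage implementation:

```python
from urllib.parse import urlparse

def normalize_s3_dataset(uri: str) -> tuple[str, str]:
    """Normalize an object-storage URI into an OpenLineage
    (namespace, name) pair, following the behavior described
    in #2782: s3a:// and s3n:// collapse to s3://, and the
    dataset name loses its leading slash."""
    parsed = urlparse(uri)
    scheme = "s3" if parsed.scheme in ("s3a", "s3n") else parsed.scheme
    namespace = f"{scheme}://{parsed.netloc}"
    name = parsed.path.lstrip("/")  # drop the leading slash
    return namespace, name

# normalize_s3_dataset("s3a://bucket/path/to/table")
#   -> ("s3://bucket", "path/to/table")
```

Normalizing the scheme matters because consumers such as Marquez key datasets by (namespace, name); without it, the same object read via `s3a://` and `s3://` would show up as two distinct datasets.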
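The UUIDv7 switch for `runEvent` ids (#2686, #2687) works because UUIDv7 leads with a 48-bit Unix-epoch-milliseconds timestamp, so ids generated in different milliseconds sort by creation time. A minimal RFC 9562 sketch (not OpenLineage's actual generator, which lives in its Java and Python clients):

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Minimal RFC 9562 UUIDv7: a 48-bit millisecond timestamp,
    followed by the version and variant bits and 74 random bits.
    Ids created in later milliseconds compare higher, which is
    what makes consumer-side index queries cheaper than with
    the fully random UUIDv4."""
    ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")      # 80 random bits
    value = (ms & ((1 << 48) - 1)) << 80               # unix_ts_ms
    value |= 0x7 << 76                                 # version = 7
    value |= (rand >> 68) << 64                        # rand_a (12 bits)
    value |= 0x2 << 62                                 # variant = 10
    value |= rand & ((1 << 62) - 1)                    # rand_b (62 bits)
    return uuid.UUID(int=value)
```

Note that two ids generated within the same millisecond are not ordered by this sketch; the changelog itself warns that the UUID version is an implementation detail and may change.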
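The `ParentRunFacet` key changes (#2660, #2661) mean a consumer may encounter the parent run under either the spec-compliant `parent` key or the legacy `parentRun` key during the transition. A hypothetical consumer-side helper showing the tolerant lookup (the key names come from the changelog; the helper itself is illustrative, not part of any OpenLineage client):

```python
from typing import Optional

def parent_run_facet(run_facets: dict) -> Optional[dict]:
    """Return the parent-run facet from a runEvent's run.facets map.
    The OpenLineage spec names this facet 'parent'; older producers
    emitted 'parentRun', so check the spec-compliant key first."""
    return run_facets.get("parent") or run_facets.get("parentRun")

# Example facets map as an older integration might have emitted it;
# runId/namespace/name values are made up for illustration.
legacy_facets = {
    "parentRun": {
        "run": {"runId": "0192e7a3-5a9b-7c3d-8f00-000000000000"},
        "job": {"namespace": "example", "name": "daily_job"},
    }
}
```

`parent_run_facet(legacy_facets)` resolves the legacy key, while events from current integrations resolve through `parent`.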

Configuration

πŸ“… Schedule: Branch creation - "every 3 months on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

β™» Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

πŸ”• Ignore: Close this PR and you won't be reminded about this update again.



This PR was generated by Mend Renovate. View the repository job log.

netlify[bot] commented 1 month ago

Deploy Preview for peppy-sprite-186812 canceled.

| Name | Link |
|------|------|
| Latest commit | 17d596b6c640dda5931ce1847762099ed410fee1 |
| Latest deploy log | https://app.netlify.com/sites/peppy-sprite-186812/deploys/671ab4a4b8f1d200086d8648 |
codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 81.16%. Comparing base (42d4081) to head (17d596b). Report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##               main    #2907   +/-   ##
=========================================
  Coverage     81.16%   81.16%
  Complexity     1506     1506
=========================================
  Files           268      268
  Lines          7363     7363
  Branches        329      329
=========================================
  Hits           5976     5976
  Misses         1226     1226
  Partials        161      161
```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.