delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0
7.62k stars 1.71k forks

[HIVE 4.0] Delta Lake - Hive4 compatibility changes #3727

Open userhimanshuverma opened 1 month ago

userhimanshuverma commented 1 month ago

Which Delta project/connector is this regarding?

Description

This pull request makes Delta Lake compatible with Apache Hive 4.0, so that Hive 4.0 users can create Delta tables.

Key Changes:

Why We Need This Change:

Integrating Delta Lake with Hive 4.0 bridges the gap between Hive users and the powerful features of Delta Lake. This change simplifies data workflows, allowing users to leverage Delta Lake’s robust features without leaving the Hive environment.

Resolved Issues:

This PR resolves any outstanding issues related to the lack of Delta Lake compatibility in Hive 4.0.

How was this patch tested?

Does this PR introduce any user-facing changes?

Yes, this PR introduces user-facing changes:

These changes enhance the existing capabilities of Hive, making it more versatile in managing large datasets with Delta Lake's advanced features.

himanshuacceldata commented 1 month ago

Hi @olaky, could you please review this pull request and let me know if any changes are required?

userhimanshuverma commented 1 month ago

@marmbrus @grundprinzip @rtyler Could you please review this PR?

userhimanshuverma commented 1 month ago

Hi @olaky, thank you for your feedback on my PR adding Hive 4 support to Delta Lake. I understand your concern that the current changes appear to re-declare Hive 3 as Hive 4 rather than properly adding support for Hive 4. I'd like to clarify the intent and propose a solution.

Key Points:

  1. Current Changes:

    • The PR was intended to enable compatibility with Apache Hive 4.0, alongside Hadoop 3.3.6, allowing users to create Delta tables directly within Hive 4.0.

  2. Version Management:

    • I recognize the need to maintain support for Hive 3 while adding functionality for Hive 4. My initial approach modified existing structures, but I see that this could lead to losing support for Hive 3.

  3. Proposed Solution:

    • To address your concern, I propose creating a separate folder structure inside the connector specifically for Hive 4 support. This way, we can keep the existing Hive 3 integration intact while adding the necessary code to support Hive 4.

  4. Next Steps:

    • If this approach is acceptable, I will create a new PR that organizes the code into a dedicated Hive 4 folder within the connector. This will ensure clarity and maintainability of both versions.

Thank you for your guidance, and I'm looking forward to your thoughts on this proposed solution.

olaky commented 1 month ago

Hi, I did some more browsing, and at least according to the documentation, Spark does not support Hive version 4 yet. Did you validate that this really works?

userhimanshuverma commented 1 month ago

Hi @olaky,

Thank you for your follow-up. I want to clarify that this PR is focused on enabling the creation of Delta tables directly through the Hive shell, rather than through the Spark shell.

Regarding Spark, I have tested version 3.5 with Hive 4.0.0 and validated that CRUD operations (create, read, update, delete) work successfully when Delta jars are added to the Spark library. However, the goal here is to provide support for creating Delta tables in the Hive shell itself, independent of Spark.

Currently, in Hive 3, it is possible to create Delta tables by building the Delta Uber jar. However, I encountered issues with Hive 4, which required adjustments to enable table creation through the Hive shell. The changes in this PR are meant to address these issues and provide Hive 4 compatibility for Delta tables.

I hope this clarification helps, and I look forward to your feedback on the proposed changes.
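For readers unfamiliar with what "creating a Delta table through the Hive shell" looks like, here is a minimal sketch. It renders the HiveQL that the existing Delta Hive connector documentation describes: two `SET` statements that select the connector's input format, and a `CREATE EXTERNAL TABLE ... STORED BY 'io.delta.hive.DeltaStorageHandler'` statement pointing at a Delta table path. The helper names (`hive_session_settings`, `create_delta_table_ddl`), the table name, and the path are hypothetical and used only for illustration. Whether these statements run unchanged on Hive 4 is an assumption, and is the question this PR is trying to settle.

```python
# Hypothetical helpers: they only render HiveQL text; nothing here talks to Hive.
# The class names below come from the Delta Hive connector documentation.
DELTA_STORAGE_HANDLER = "io.delta.hive.DeltaStorageHandler"
DELTA_INPUT_FORMAT = "io.delta.hive.HiveInputFormat"


def hive_session_settings():
    """Return the SET statements that make Hive read through the Delta input format."""
    return [
        f"SET hive.input.format={DELTA_INPUT_FORMAT};",
        f"SET hive.tez.input.format={DELTA_INPUT_FORMAT};",
    ]


def create_delta_table_ddl(table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement backed by the Delta storage handler.

    columns is a list of (name, hive_type) pairs; the declared schema is
    expected to match the schema of the Delta table stored at `location`.
    """
    if not columns:
        raise ValueError("at least one column is required")
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"STORED BY '{DELTA_STORAGE_HANDLER}'\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    # Example table name and path are placeholders, not values from this PR.
    for statement in hive_session_settings():
        print(statement)
    print(create_delta_table_ddl(
        "delta_events",
        [("id", "INT"), ("name", "STRING")],
        "/delta/events",
    ))
```

The printed statements can be pasted into a Hive session once the connector's uber jar is on Hive's classpath.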

userhimanshuverma commented 1 month ago

Sure, I will make the changes so that existing Hive 3 support is not affected.