[SUPPORT] Installation process for Hudi

Sheth-G commented 2 weeks ago

Describe the problem you faced

Sharvil posted asking about the installation process for Hudi.

Environment Description

Hudi version :
Spark version :
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) :
Running on Docker? (yes/no) :

Additional context

Add any other context about the problem here.

Stacktrace

Add the stacktrace of the error.

torvalds-dev-testbot[bot] commented 2 weeks ago

Relevant Sources:

https://hudi.apache.org/docs/cli

https://apache-hudi.slack.com/archives/C4D716NPQ/p1713623081205589

https://apache-hudi.slack.com/archives/C4D716NPQ/p1712911987322839

https://apache-hudi.slack.com/archives/C4D716NPQ/p1712942649817529

https://apache-hudi.slack.com/archives/C4D716NPQ/p1712158401061929

torvalds-dev-testbot[bot] commented 2 weeks ago

Hello! 👋 Please wait while I search the knowledge base for an answer; this usually takes about 30 seconds.

torvalds-dev-testbot[bot] commented 2 weeks ago

Follow up questions:

Could you describe the version of Hudi and the rest of the tech stack in your environment? If you are facing an error, please provide the full stack trace and describe which operations led to it. Spark UI screenshots are also useful for triage.
Can you provide more details about the specific steps you have taken to install Hudi? This will help in understanding the context better.
Have you encountered any challenges or errors during the installation process? If so, please share the specific error messages or issues you faced.
Are you looking to install Hudi on a specific platform or in a particular cluster setup?
Do you require assistance with any particular configuration settings or dependencies for the Hudi installation?

Sources:

https://medium.com/@ritik20023/quickly-start-using-apache-hudi-on-aws-emr-de432c01e488
https://hudi.apache.org/docs/cli
https://medium.com/blue-orange-digital/introduction-to-apache-hudi-209521970112
https://www.linkedin.com/pulse/hudi-streamer-delta-hands-on-guide-local-ingestion-from-soumil-shah-jssse/?utm_source=share&utm_medium=member_ios&utm_campaign=share_via
https://hudi.apache.org/docs/quick-start-guide

torvalds-dev-testbot[bot] commented 2 weeks ago

To get started with Apache Hudi on AWS EMR or locally, follow these steps:

Use EMR release 5.30.1, which comes with Spark 2.4.5 and Hudi 0.5.2 pre-installed, and run a spark-submit command that passes the necessary jars and configurations. More details are available here.
Create a Hudi table from PySpark by writing to it for the first time; no separate create-table command is needed. Before writing, set the table name, the base path, and a data generator that produces sample records. More information is available here.
Set up the Hudi CLI locally with the hudi-cli-bundle for Spark 3 by following the requirements and script provided here.
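
As a rough illustration of the first two steps, the sketch below builds a spark-submit command line and the write options for a first Hudi write from PySpark. It is a minimal sketch under stated assumptions, not a verified recipe for this setup: the jar paths follow the usual EMR layout and should be checked on your cluster, the column names (`uuid`, `partitionpath`, `ts`) mirror the quick-start sample data, and the function names, the script name `hudi_job.py`, and the table name `trips` are hypothetical.

```python
import shlex

# Assumption: typical jar locations on an EMR node; verify them on your cluster.
EMR_HUDI_JARS = [
    "/usr/lib/hudi/hudi-spark-bundle.jar",
    "/usr/lib/spark/external/lib/spark-avro.jar",
]


def hudi_write_options(table_name, record_key="uuid",
                       partition_field="partitionpath", precombine_field="ts"):
    """Return Hudi datasource options for an upsert into `table_name`."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
    }


def spark_submit_command(script, jars=EMR_HUDI_JARS):
    """Build a spark-submit command line that passes the Hudi jars."""
    args = [
        "spark-submit",
        "--conf", "spark.serializer=org.apache.spark.serializer.KryoSerializer",
        "--jars", ",".join(jars),
        script,
    ]
    return " ".join(shlex.quote(a) for a in args)


def write_hudi_table(df, table_name, base_path):
    """Write a Spark DataFrame to base_path as a Hudi table.

    The first write creates the table, so no CREATE TABLE step is needed.
    "overwrite" replaces any table already at base_path; later writes would
    normally use "append". Older Hudi releases may need the format name
    "org.apache.hudi" instead of "hudi".
    """
    (df.write.format("hudi")
       .options(**hudi_write_options(table_name))
       .mode("overwrite")
       .save(base_path))


if __name__ == "__main__":
    # Hypothetical script name, shown only to illustrate the command shape.
    print(spark_submit_command("hudi_job.py"))
```

Inside the submitted script, `write_hudi_table(df, "trips", base_path)` would be called with a DataFrame whose columns match the key fields above; adjust the field names to your own schema.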

These steps will guide you through the installation process of Apache Hudi and setting up tables for future querying. Happy coding!

Sheth-G commented 2 weeks ago

works

torvalds-dev-testbot[bot] commented 2 weeks ago

works as well

Sheth-G / test-repo-app

[SUPPORT] Installation process for Hudi #41