Open Sheth-G opened 2 weeks ago
Relevant Sources:
https://hudi.apache.org/docs/cli
https://apache-hudi.slack.com/archives/C4D716NPQ/p1713623081205589
https://apache-hudi.slack.com/archives/C4D716NPQ/p1712911987322839
https://apache-hudi.slack.com/archives/C4D716NPQ/p1712942649817529
https://apache-hudi.slack.com/archives/C4D716NPQ/p1712158401061929
To install Apache Hudi on AWS EMR, you need to follow these steps:
Use EMR release 5.30.1, which comes with Spark 2.4.5 and Hudi 0.5.2 pre-installed, and run a spark-submit command that passes the necessary jars and configurations. Find more details here.
In PySpark, no separate create-table command is needed: the Hudi table is created the first time you write to it. Set up the table name, base path, and a data generator to produce records. More information is available here.
To set up the Hudi CLI locally, use the hudi-cli-bundle for Spark 3 and follow the requirements and script provided here.
These steps will guide you through the installation process of Apache Hudi and setting up tables for future querying. Happy coding!
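The "create the table by writing to it for the first time" step can be sketched roughly as follows. This is a minimal illustration, not the exact code from the linked sources: the table name `trips`, the base path `file:///tmp/hudi_trips`, the column names, and the helper functions `hudi_write_options`, `write_first_batch`, and `run_example` are all assumptions made for the example. The option keys follow the Hudi Spark datasource configuration; the hand-built rows stand in for the data generator mentioned above. It also assumes the Hudi Spark bundle is already on the classpath, as it would be on an EMR cluster with Hudi pre-installed.

```python
def hudi_write_options(table_name, record_key, precombine_field, partition_field):
    """Build the option map for a Hudi datasource write (hypothetical helper)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": "upsert",
    }


def write_first_batch(df, base_path, options):
    """Write a DataFrame to base_path; the first write creates the Hudi table."""
    df.write.format("hudi").options(**options).mode("overwrite").save(base_path)


def run_example():
    """Run inside a Spark session that has the Hudi bundle available."""
    from pyspark.sql import SparkSession  # imported lazily; needs PySpark installed

    spark = (
        SparkSession.builder.appName("hudi-first-write")
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .getOrCreate()
    )

    # Hand-built sample rows (illustrative data, standing in for a data generator).
    rows = [
        ("id-1", 1700000000, "us", 12.5),
        ("id-2", 1700000100, "eu", 7.0),
    ]
    df = spark.createDataFrame(rows, ["uuid", "ts", "region", "fare"])

    options = hudi_write_options("trips", "uuid", "ts", "region")
    write_first_batch(df, "file:///tmp/hudi_trips", options)

    # Read the table back to confirm it exists.
    spark.read.format("hudi").load("file:///tmp/hudi_trips").show()
```

Calling `run_example()` from a `pyspark` shell or a script launched with spark-submit should create the table under the base path. Depending on the Hudi version, the format name or the read path pattern may differ, so treat those details as something to check against the documentation for the release in use.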
works
works as well
Describe the problem you faced
Sharvil posted asking about the installation process for Hudi.
Environment Description
Hudi version :
Spark version :
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) :
Running on Docker? (yes/no) :
Additional context
Add any other context about the problem here.
Stacktrace
Add the stacktrace of the error.