-
deployment region - ap-northeast-2
~~1. In the template, the Service Role for EMR does not have a KMS policy, so the SecurityConfiguration causes the problem below.~~
~~The EMR Service Role must have the k…
ghost updated
3 years ago
-
S3 is now strongly consistent by default across all regions.
Amazon S3 Update – Strong Read-After-Write Consistency
https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-cons…
-
Add a synchronization feature for when the EMRFS metastore is enabled, e.g. when Parquet files created by EMR are deleted from S3 with AWS Lambda (or a local boto app).
-
Hudi MoR read performance degrades on tables with many (1000+) partitions stored in S3. When running a simple ```spark.sql("select * from table_ro").count``` command, we observe in the Spark UI that f…
-
We could add streaming support to `PrestoS3OutputStream` by changing it to manually perform a multi-part upload that runs for the duration of the write operation. Rather than having a single temporary…
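The idea of keeping one multi-part upload open for the duration of the write can be sketched as follows. This is a minimal illustration in Python, not the `PrestoS3OutputStream` implementation: `InMemoryMultipartStore` and `StreamingMultipartWriter` are hypothetical names, and the in-memory store only stands in for S3 so the example runs without credentials. The 5 MiB default reflects S3's minimum size for every part except the last.

```python
class InMemoryMultipartStore:
    """Hypothetical stand-in for an object store that supports multi-part uploads."""

    def __init__(self):
        self.objects = {}      # completed objects: key -> bytes
        self._uploads = {}     # in-flight uploads: upload_id -> (key, {part_number: bytes})
        self._next_id = 0

    def create_upload(self, key):
        self._next_id += 1
        upload_id = str(self._next_id)
        self._uploads[upload_id] = (key, {})
        return upload_id

    def upload_part(self, upload_id, part_number, data):
        self._uploads[upload_id][1][part_number] = bytes(data)

    def complete_upload(self, upload_id):
        key, parts = self._uploads.pop(upload_id)
        # Parts are assembled in part-number order.
        self.objects[key] = b"".join(parts[n] for n in sorted(parts))

    def abort_upload(self, upload_id):
        self._uploads.pop(upload_id, None)


class StreamingMultipartWriter:
    """Buffers writes and ships each full part as soon as it is available."""

    def __init__(self, store, key, part_size=5 * 1024 * 1024):
        self._store = store
        self._part_size = part_size
        self._buffer = bytearray()
        self.part_number = 0
        self._upload_id = store.create_upload(key)

    def write(self, data):
        self._buffer.extend(data)
        # Upload every complete part; keep only the remainder in memory.
        while len(self._buffer) >= self._part_size:
            self._flush_part(self._part_size)

    def _flush_part(self, size):
        chunk = bytes(self._buffer[:size])
        del self._buffer[:size]
        self.part_number += 1
        self._store.upload_part(self._upload_id, self.part_number, chunk)

    def close(self):
        # The last part may be smaller than part_size (or empty for an empty object).
        if self._buffer or self.part_number == 0:
            self._flush_part(len(self._buffer))
        self._store.complete_upload(self._upload_id)

    def abort(self):
        self._store.abort_upload(self._upload_id)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Complete on success, abort on failure so no partial object appears.
        if exc_type is None:
            self.close()
        else:
            self.abort()
        return False
```

With this shape, memory use is bounded by roughly one part rather than the whole object, and nothing needs to be staged in a temporary file before the upload starts.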
-
## Background
An interesting point has been raised by Tony regarding the S3 PoC file access, which is currently written against the AWS SDK for S3.
> Any reason for going through the pain of lower level A…
-
Hi,
I am using DataSourceWriter for HUDI compaction. I populated a 30 GB table with around 1 billion rows. It created around 6000 partitions, with files ranging in size from 2 MB to 100 MB. I tried upserting 1 GB …
-
Spark Structured Streaming writes to Hudi and syncs to Hive, but only the read-optimized table is created, not the real-time table. No errors are reported.
**Environment Description**
* …
wosow updated
3 years ago
-
@yruslan
As of now, it is easy to specify local file system paths or HDFS paths for the copybooks or the input EBCDIC files. As I am running it from EMR, I tried setting up the S3 filesystem in my …
-
Hi,
I'm having trouble using Apache Hudi with S3.
**Steps to reproduce the behavior:**
1. Produce messages to a Kafka topic. (2000 records per window on average)
2. Start streaming (sample c…