ORC-1251: Use Hadoop Vectored IO

apache / orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

https://orc.apache.org/

Apache License 2.0

ORC-1251: Use Hadoop Vectored IO #1708

Closed williamhyun closed 10 months ago

williamhyun commented 10 months ago

What changes were proposed in this pull request?

This PR aims to always use Hadoop Vectored IO in Apache ORC 2.0.0.

Why are the changes needed?

Apache ORC 2.0.0 is ready to use this new Hadoop feature.

1509
1554
Hadoop Vectored IO Presentation

Works great everywhere; radical benefit in object stores
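
For readers unfamiliar with the feature: Hadoop 3.3.5 and later expose vectored reads through `readVectored(List<? extends FileRange>, IntFunction<ByteBuffer>)` on the input stream, where the caller hands over all byte ranges up front and each `FileRange` carries a `CompletableFuture<ByteBuffer>` for its data. The sketch below is not the code in this patch and does not call Hadoop. It only imitates that request shape with the JDK so it can run standalone. The `VectoredReadSketch` class, its `Range` type, and the `demo` helper are hypothetical names made up for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical JDK-only imitation of the vectored read pattern; not Hadoop or ORC code.
final class VectoredReadSketch {

    // Hypothetical stand-in for an (offset, length) range request.
    static final class Range {
        final long offset;
        final int length;

        Range(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Submits every range read up front and returns one future per range,
    // in the same order as the requested ranges.
    static List<CompletableFuture<ByteBuffer>> readVectored(Path file, List<Range> ranges) {
        List<CompletableFuture<ByteBuffer>> results = new ArrayList<>();
        for (Range r : ranges) {
            results.add(CompletableFuture.supplyAsync(() -> readRange(file, r)));
        }
        return results;
    }

    // Positional read of exactly r.length bytes starting at r.offset.
    private static ByteBuffer readRange(Path file, Range r) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(r.length);
            long pos = r.offset;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);
                if (n < 0) {
                    throw new EOFException("range extends past end of file");
                }
                pos += n;
            }
            buf.flip();
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a small temp file, reads two disjoint ranges, and joins the results with '|'.
    static String demo() {
        Path file = null;
        try {
            file = Files.createTempFile("vectored-sketch", ".bin");
            Files.write(file, "0123456789abcdefghij".getBytes(StandardCharsets.US_ASCII));
            List<Range> ranges = Arrays.asList(new Range(2, 3), new Range(10, 4));
            StringBuilder out = new StringBuilder();
            for (CompletableFuture<ByteBuffer> f : readVectored(file, ranges)) {
                if (out.length() > 0) {
                    out.append('|');
                }
                // join() blocks until this particular range has been read.
                out.append(StandardCharsets.US_ASCII.decode(f.join()));
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (file != null) {
                try {
                    Files.deleteIfExists(file);
                } catch (IOException ignored) {
                    // Best-effort cleanup of the temp file.
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the caller states all ranges before any data is needed, an implementation is free to fetch them in parallel or to merge neighbouring ranges into one request, which is presumably where the object store benefit mentioned above comes from.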

How was this patch tested?

Pass the CIs.

williamhyun commented 10 months ago

cc: @wgtmac @dongjoon-hyun @HarshitGupta11 @mukund-thakur @steveloughran @jerqi

dongjoon-hyun commented 10 months ago

I added Milestone v2.0.0.

dongjoon-hyun commented 10 months ago

I fixed the checkstyle issue.

dongjoon-hyun commented 10 months ago

Given that the patch size is small, we can test more after merging. Feel free to merge, @williamhyun.

dongjoon-hyun commented 10 months ago

Let me merge this with the following authorship.

Lead-authored-by: William Hyun <william@apache.org>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: HarshitGupta11 <harshit.gupta@cloudera.com>

mukund-thakur commented 10 months ago

Thanks, everyone, for finishing this up.
