apache / orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
https://orc.apache.org/
Apache License 2.0

ORC-1635: Download orc-format from dlcdn.apache.org instead of archive.apache.org #1820

Closed progval closed 6 months ago

progval commented 6 months ago

What changes were proposed in this pull request?

Download orc-format from dlcdn.apache.org instead of archive.apache.org

Why are the changes needed?

https://archive.apache.org/ discourages heavy use, and its rate limits can cause CI systems building Apache ORC to be banned.

How was this patch tested?

It builds from a clean repo

Was this patch authored or co-authored using generative AI tooling?

no

deshanxiao commented 6 months ago

Thanks, @progval. Pending CI.

dongjoon-hyun commented 6 months ago

Here is an example. The request for Apache Spark 3.5.1 returns 200, but the same request for Apache Spark 3.5.0 returns a 404 error.

$ curl --head https://dlcdn.apache.org/spark/spark-3.5.1/spark-3.5.1.tgz
HTTP/2 200
server: Apache
last-modified: Thu, 15 Feb 2024 11:39:51 GMT
etag: "21ae2b9-6116a15e24d57"
access-control-allow-origin: *
content-type: application/x-gzip
via: 1.1 varnish, 1.1 varnish
accept-ranges: bytes
age: 21
date: Mon, 26 Feb 2024 17:48:04 GMT
x-served-by: cache-hel1410020-HEL, cache-sjc10040-SJC
x-cache: MISS, HIT
x-cache-hits: 0, 0
x-timer: S1708969685.562024,VS0,VE45
content-length: 35316409
$ curl --head https://dlcdn.apache.org/spark/spark-3.5.0/spark-3.5.0.tgz
HTTP/2 404
server: Apache
content-type: text/html; charset=iso-8859-1
via: 1.1 varnish, 1.1 varnish
accept-ranges: bytes
date: Mon, 26 Feb 2024 17:48:11 GMT
age: 0
x-served-by: cache-hel1410022-HEL, cache-sjc1000130-SJC
x-cache: MISS, MISS
x-cache-hits: 0, 0
x-timer: S1708969691.332510,VS0,VE511
content-length: 196
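
The same check can be scripted instead of read by eye. The following is a minimal sketch, not part of the ORC build: `http_status` and `is_available` are hypothetical helper names, and the sketch only looks at the first status line, so it does not follow redirects.

```shell
#!/bin/sh
# Print the status code from the first line of an HTTP response header
# read from standard input (for example, the output of `curl --head`).
http_status() {
    # Header lines end in CRLF, so drop carriage returns before parsing.
    tr -d '\r' | awk 'NR == 1 { print $2 }'
}

# Succeed only when the status code is 200, i.e. the file is being served.
is_available() {
    [ "$1" = "200" ]
}
```

For instance, `curl --silent --head https://dlcdn.apache.org/spark/spark-3.5.0/spark-3.5.0.tgz | http_status` would print the code shown in the second response above.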

dongjoon-hyun commented 6 months ago

Let me close this to prevent any accidental merging first.

Feel free to reopen this if you find another way, @progval.

douardda commented 6 months ago

Maybe we could put both URLs since the ExternalProject URL entry allows several entries.

dongjoon-hyun commented 6 months ago

> Maybe we could put both URLs since the ExternalProject URL entry allows several entries.

+1 for the suggestion, @douardda . You want to put dlcdn first as a cache, right?

douardda commented 6 months ago

> > Maybe we could put both URLs since the ExternalProject URL entry allows several entries.
>
> +1 for the suggestion, @douardda . You want to put dlcdn first as a cache, right?

yes
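
The ordering agreed on here (dlcdn first, archive as the fallback) can be illustrated outside CMake with a small shell sketch. This is only an illustration of the idea, not the change that was submitted: `fetch_first` and the `FETCH` override are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Try each URL in order and stop at the first one that downloads successfully.
# Usage: fetch_first OUTPUT_FILE URL [URL...]
# Prints the URL that worked; returns non-zero if every URL failed.
fetch_first() {
    out=$1
    shift
    for url in "$@"; do
        # --fail makes curl exit non-zero on HTTP errors such as 404.
        # FETCH (hypothetical) can replace the downloader, e.g. for testing;
        # the expansion is deliberately unquoted so the options are split.
        if ${FETCH:-curl --fail --silent --show-error --location --output} "$out" "$url"; then
            echo "$url"
            return 0
        fi
    done
    return 1
}
```

Calling it with the dlcdn URL first and the archive.apache.org URL second means the archive is contacted only when the CDN no longer carries the requested release, which keeps load on the archive low while still supporting older versions.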

progval commented 6 months ago

Submitted at https://github.com/apache/orc/pull/1830

douardda commented 6 months ago

> Submitted at #1830

thanks!