gbif / stackable

GBIF Stackable Infrastructure
Apache License 2.0

Setup NFS gateway for the hadoop cluster for exposing the downloads folder in hadoop #18

Closed zaultooz closed 6 months ago

zaultooz commented 1 year ago

The documentation at https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html suggests that an existing Hadoop HDFS cluster can be configured to expose a folder as an NFS share.

Changes would be required in https://github.com/gbif/stackable/blob/master/charts/gbif-hdfs-cluster/templates/hdfs-cluster.yaml, where we will probably have to add the properties under the configOverrides section of the file, release a new version of the chart, and apply a rolling restart to the HDFS cluster.
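For reference, here is a rough sketch of which properties such an override might carry. The property names are the ones listed in the Hadoop NFS gateway documentation linked above. The user name `nfsserver`, the export path `/downloads`, the read-only export, and the grouping by `core-site.xml` / `hdfs-site.xml` are assumptions for illustration, not what the chart actually contains. The same documentation also describes starting the `portmap` and `nfs3` services, which properties alone do not cover.

```python
import json

# Assumed values for illustration only; adjust to the real cluster.
NFS_USER = "nfsserver"        # hypothetical OS user that runs the NFS gateway
EXPORT_POINT = "/downloads"   # hypothetical HDFS path of the downloads folder


def nfs_gateway_overrides(nfs_user, export_point):
    """Collect the NFS gateway properties, grouped by Hadoop config file."""
    return {
        "core-site.xml": {
            # Let the gateway user proxy other users (wide open here; narrow in practice).
            f"hadoop.proxyuser.{nfs_user}.groups": "*",
            f"hadoop.proxyuser.{nfs_user}.hosts": "*",
        },
        "hdfs-site.xml": {
            # HDFS directory exposed over NFS.
            "nfs.export.point": export_point,
            # Which clients may mount the export, and with which privilege.
            "nfs.exports.allowed.hosts": "* ro",
            # Local scratch directory for out-of-order writes.
            "nfs.dump.dir": "/tmp/.hdfs-nfs",
        },
    }


overrides = nfs_gateway_overrides(NFS_USER, EXPORT_POINT)
# JSON is valid YAML, so the output can be pasted into a values file for review.
print(json.dumps({"configOverrides": overrides}, indent=2))
```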

MattBlissett commented 1 year ago

The NFS gateway is currently required since the WebHDFS gateway doesn't support HTTP Range requests.

I can't immediately tell whether it does, but if a newer version of WebHDFS supports the HTTP Range header, that could be used instead of NFS.
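To illustrate the gap: the WebHDFS `OPEN` operation takes `offset` and `length` query parameters instead of honouring a `Range` header, so a client or proxy would have to translate one into the other. Below is a minimal sketch of that translation. The function name, host, and file path are hypothetical, and it only handles a single `bytes=start-end` or `bytes=start-` range; whether this would suit the download service is an open question.

```python
import re
from urllib.parse import quote, urlencode


def range_to_webhdfs_url(base_url, hdfs_path, range_header):
    """Translate a single-range 'bytes=start-end' header into a WebHDFS OPEN URL."""
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if match is None:
        # Suffix ranges ("bytes=-500") and multi-range headers are not handled here.
        raise ValueError(f"unsupported Range header: {range_header!r}")

    start = int(match.group(1))
    params = {"op": "OPEN", "offset": start}

    if match.group(2):
        end = int(match.group(2))
        if end < start:
            raise ValueError("range end precedes range start")
        # Range end is inclusive, while WebHDFS expects a byte count.
        params["length"] = end - start + 1

    return f"{base_url.rstrip('/')}/webhdfs/v1{quote(hdfs_path)}?{urlencode(params)}"


# Hypothetical host and path, for illustration only.
url = range_to_webhdfs_url(
    "http://httpfs.example:14000", "/downloads/file.zip", "bytes=100-199"
)
print(url)
```

An open-ended range such as `bytes=100-` simply omits `length`, which makes WebHDFS read from the offset to the end of the file.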

zaultooz commented 6 months ago

WebHDFS still doesn't support HTTP Range requests in the way we currently use them. Therefore both an NFS gateway and WebHDFS have been configured.

The charts for WebHDFS can be found here: https://github.com/gbif/stackable/tree/master/charts/gbif-hadoop-httpfs

The changes that enable the NFS gateway should be in the HDFS charts.

It was done quite some time ago but I forgot to add the info here.