big-data-europe / docker-hadoop

Apache Hadoop docker image

`base` image can not be built. #148

Open gogagum opened 1 year ago

gogagum commented 1 year ago

When I try to build the base Hadoop image, I get errors from apt in Debian:

W: The repository 'http://security.debian.org/debian-security stretch/updates Release' does not have a Release file.
W: The repository 'http://deb.debian.org/debian stretch Release' does not have a Release file.
W: The repository 'http://deb.debian.org/debian stretch-updates Release' does not have a Release file.
E: Failed to fetch http://security.debian.org/debian-security/dists/stretch/updates/main/binary-amd64/Packages  404  Not Found [IP: 151.101.66.132 80]
E: Failed to fetch http://deb.debian.org/debian/dists/stretch/main/binary-amd64/Packages  404  Not Found
E: Failed to fetch http://deb.debian.org/debian/dists/stretch-updates/main/binary-amd64/Packages  404  Not Found
E: Some index files failed to download. They have been ignored, or old ones used instead.

The base image needs to be changed from Debian, or something should be done about the Debian repositories.
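One possible way to "do something" about the repositories, sketched here as an untested assumption rather than anything confirmed in this thread: stretch packages have been moved to archive.debian.org, so the apt sources could be rewritten to point there before `apt-get update` runs. The helper name `use_debian_archive` is hypothetical, and dropping the `stretch-updates` entry assumes that suite is not carried by the archive.

```shell
# Hypothetical helper: repoint a stretch sources.list at archive.debian.org.
# Takes the file to rewrite as its first argument (defaults to the system file).
use_debian_archive() {
    sources_file="${1:-/etc/apt/sources.list}"
    # Assumption: stretch-updates is not served by the archive, so that entry
    # is deleted; the main and security entries are rewritten in place.
    sed -i \
        -e '/stretch-updates/d' \
        -e 's|http://deb.debian.org/debian|http://archive.debian.org/debian|' \
        -e 's|http://security.debian.org/debian-security|http://archive.debian.org/debian-security|' \
        "$sources_file"
}
```

In a Dockerfile this would be run in the same `RUN` step as, and before, `apt-get update`. It keeps the image on an unmaintained Debian release, so it is a stopgap at best.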

DrLarck commented 6 months ago

Hey!

I have the same problem: the base image cannot be built using make build.

As a workaround, I made some changes in the Makefile and pulled the base image manually. Here's what I did:

In Makefile from master:

Then, run make build. It should now build images with the appropriate tag.

Finally, run the following docker command to pull a working base image:

$ docker pull bde2020/hadoop-base:2.0.0-hadoop3.2.1-java8

usersina commented 6 months ago

That's because Debian 9 (stretch) is no longer maintained. You can update the base image to debian:bullseye-slim instead.

You also need to use openjdk-11 instead, since openjdk-8 is no longer in the Debian package list.

Additionally, you need to use https://archive.apache.org for the Hadoop URL. (Versions available here)

All in all, here are the changes:

FROM debian:bullseye-slim

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    openjdk-11-jdk \
    net-tools \
    curl \
    netcat \
    gnupg \
    libsnappy-dev \
    && rm -rf /var/lib/apt/lists/*

ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/

...

ENV HADOOP_URL=https://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
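Before rebuilding, it may help to confirm that the archive URL resolves for the version you intend to use. The following is only a sketch: the default of 3.2.1 is taken from the image tag mentioned earlier in this thread, and `check_hadoop_url` is a hypothetical helper that is defined but not called, since it needs network access.

```shell
# Build the archive URL the same way the Dockerfile does.
# Assumption: 3.2.1 as the default, matching the hadoop3.2.1 image tag above.
HADOOP_VERSION="${HADOOP_VERSION:-3.2.1}"
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz"
echo "$HADOOP_URL"

# Hypothetical helper: send a HEAD request and fail on an HTTP error status.
# -f: fail on HTTP errors, -sS: quiet but show errors, -I: HEAD only, -L: follow redirects.
check_hadoop_url() {
    curl -fsSIL "$HADOOP_URL" > /dev/null
}
```

Running `check_hadoop_url && echo "tarball found"` on a machine with network access gives a quick signal before a long image build.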

Anyhow, I forked this repo since I need it for a small lab demo. The setup is more streamlined there and is as easy as running:

# Build the cluster
make build

# Run the cluster
make up

# Get a shell into the hadoop cluster
make shell