elastic / elasticsearch-hadoop

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop
https://www.elastic.co/products/hadoop
Apache License 2.0

Unable to build the newly cloned project due to invalid dependency paths #2197

Open pstysz opened 7 months ago

pstysz commented 7 months ago

What kind of issue is this?

Issue description

After cloning the repository (with none of the required libraries cached locally, so everything had to be downloaded), Gradle is unable to resolve all dependencies when I try to build the project. The problems I believe I have identified are:

  1. The BaseBuildPlugin class references a repository that is no longer publicly available: https://repo.spring.io/plugins-release-local. According to the Spring website, public access to this repository has been discontinued, so dependencies that should come from it are not downloaded during the build. Because the same file also lists other repositories, Gradle falls back to them, and that works for most dependencies, but...
  2. The HadoopFixturePlugin class, in its apply and configureApacheMirrorRepository methods, builds URLs for downloading the Hadoop and Spark libraries that are unlikely to resolve on any of the Maven repositories configured in BaseBuildPlugin. Examples of these paths appear in the stack trace below. Only one of the generated links is correctly constructed, the one pointing to https://apache.osuosl.org/, and it could work, but...
  3. The project requests versions that have since been removed from that repository, so even the only repository that could work fails to serve the dependencies. The versions in question are hadoop-common:3.3.2 and spark-bin-hadoop3:3.3.3. So where do these versions come from, given that the project's gradle.properties file clearly specifies different ones? This part took me the longest: they are hardcoded as defaultVersion in the SparkYarnServiceDescriptor and HadoopServiceDescriptor classes, and that value is never overridden, so the build always downloads the default version specified in those classes.
  4. The problem can be partially circumvented by installing the dependencies into the local Maven repository, but then another problem arises: HadoopFixturePlugin changes the default groupId, artifactId, and version, so when installing a dependency the corresponding properties in the pom file must be changed to match; otherwise the build fails with a mismatch error.
  4.* (I'm not sure about this one; unfortunately I don't have enough time at the moment to investigate further.) It seems that the extract task in hadoopFixture is always executed, regardless of whether the dependency actually needs unpacking. That is, even if the dependency could be found in a repository other than https://apache.osuosl.org/ (e.g., in the form of a pom), the script would still try to perform the extract.
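To illustrate point 2, here is a minimal sketch of how the standard Maven repository layout turns a coordinate into a POM URL. The function name `maven_pom_url` is hypothetical and not part of the project. The first coordinate is copied from the error message in the log (`hadoop.common:hadoop-3.3.2:hadoop-3.3.2`); the second is the conventional coordinate of the hadoop-common library, shown only for contrast.

```python
def maven_pom_url(repo_root: str, group: str, artifact: str, version: str) -> str:
    """Build a POM URL using the standard Maven repository layout.

    Dots in the group become path separators, followed by
    artifact/version/artifact-version.pom.
    """
    group_path = group.replace(".", "/")
    return (
        f"{repo_root.rstrip('/')}/{group_path}/{artifact}/{version}/"
        f"{artifact}-{version}.pom"
    )


CENTRAL = "https://repo.maven.apache.org/maven2"

# Coordinate as reported by Gradle in the failure message.
fixture_url = maven_pom_url(CENTRAL, "hadoop.common", "hadoop-3.3.2", "hadoop-3.3.2")

# Conventional coordinate of the hadoop-common library, for comparison.
library_url = maven_pom_url(CENTRAL, "org.apache.hadoop", "hadoop-common", "3.3.2")

print(fixture_url)
print(library_url)
```

The first URL has the same shape as the "Resource missing" entries in the log, which suggests the fixture coordinate is designed for the Apache mirror's directory structure rather than for a Maven repository.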

As a hotfix, I changed defaultVersion in SparkYarnServiceDescriptor (to 3.4.2) and in HadoopServiceDescriptor (to 3.3.6). These versions exist in the https://apache.osuosl.org/ repository, so they can be downloaded.
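Before changing a default version, it can help to check whether a given Hadoop release is still served by the mirror. The sketch below is an assumption-laden helper, not project code: the names `hadoop_tarball_url` and `is_available` are hypothetical, and the URL pattern is inferred from the single mirror URL that appears in the log. Spark is left out because its mirror path does not appear in the log.

```python
import urllib.request

# Mirror root taken from the log; the path pattern below is inferred from
# the one mirror URL Gradle reported, not from the plugin source.
MIRROR = "https://apache.osuosl.org"


def hadoop_tarball_url(version: str, mirror: str = MIRROR) -> str:
    """Return the mirror URL for a Hadoop release tarball."""
    return f"{mirror.rstrip('/')}/hadoop/common/hadoop-{version}/hadoop-{version}.tar.gz"


def is_available(url: str, timeout: float = 10.0) -> bool:
    """Send an HTTP HEAD request and report whether the server answers 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors (such as 404), connection failures and timeouts.
        return False


# Print the candidate URLs; no network access happens here.
for version in ("3.3.2", "3.3.6"):
    print(version, hadoop_tarball_url(version))
```

Calling `is_available(hadoop_tarball_url("3.3.6"))` performs the same kind of HEAD request that Gradle logs for the mirror, so it is a quick way to confirm a candidate version before editing the descriptor classes.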

Steps to reproduce

  1. Clone repository
  2. Build project with command:

Code:

gradle build -x test

Stack trace:

(...)
> Configure project :test:fixtures:minikdc
Evaluating project ':test:fixtures:minikdc' using build file '/Users/pstysz/git/elasticsearch-hadoop/test/fixtures/minikdc/build.gradle'.
All projects evaluated.
Task name matched 'clean'
Task name matched 'build'
Selected primary task 'clean' from project :
Selected primary task 'build' from project :
Resource missing. [HTTP GET: https://repo.maven.apache.org/maven2/hadoop/common/hadoop-3.3.2/hadoop-3.3.2/hadoop-3.3.2-hadoop-3.3.2.pom]
Resource missing. [HTTP GET: https://repo.clojars.org/hadoop/common/hadoop-3.3.2/hadoop-3.3.2/hadoop-3.3.2-hadoop-3.3.2.pom]
Resource missing. [HTTP GET: https://snapshots.elastic.co/maven/hadoop/common/hadoop-3.3.2/hadoop-3.3.2/hadoop-3.3.2-hadoop-3.3.2.pom]
Resource missing. [HTTP GET: https://oss.sonatype.org/content/repositories/snapshots/hadoop/common/hadoop-3.3.2/hadoop-3.3.2/hadoop-3.3.2-hadoop-3.3.2.pom]
Resource missing. [HTTP GET: https://artifacts.elastic.co/maven/hadoop/common/hadoop-3.3.2/hadoop-3.3.2/hadoop-3.3.2-hadoop-3.3.2.pom]
Resource missing. [HTTP GET: https://repo1.maven.org/maven2/hadoop/common/hadoop-3.3.2/hadoop-3.3.2/hadoop-3.3.2-hadoop-3.3.2.pom]
Resource missing. [HTTP GET: https://snapshots.elastic.co/downloads/elasticsearch/hadoop-3.3.2-hadoop-3.3.2.xml]
Resource missing. [HTTP GET: https://artifacts.elastic.co/downloads/elasticsearch/hadoop-3.3.2-hadoop-3.3.2.xml]
Resource missing. [HTTP HEAD: https://apache.osuosl.org/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz]

FAILURE: Build failed with an exception.

* What went wrong:
Could not determine the dependencies of task ':qa:kerberos:hadoopFixture#datanode.extract'.
> Could not resolve all files for configuration ':qa:kerberos:downloadHadoop#3.3.2'.
   > Could not find hadoop.common:hadoop-3.3.2:hadoop-3.3.2.
     Searched in the following locations:
       - https://repo.maven.apache.org/maven2/hadoop/common/hadoop-3.3.2/hadoop-3.3.2/hadoop-3.3.2-hadoop-3.3.2.pom
       - https://clojars.org/repo/hadoop/common/hadoop-3.3.2/hadoop-3.3.2/hadoop-3.3.2-hadoop-3.3.2.pom
       - https://snapshots.elastic.co/maven/hadoop/common/hadoop-3.3.2/hadoop-3.3.2/hadoop-3.3.2-hadoop-3.3.2.pom
       - https://oss.sonatype.org/content/repositories/snapshots/hadoop/common/hadoop-3.3.2/hadoop-3.3.2/hadoop-3.3.2-hadoop-3.3.2.pom
       - https://artifacts.elastic.co/maven/hadoop/common/hadoop-3.3.2/hadoop-3.3.2/hadoop-3.3.2-hadoop-3.3.2.pom
       - https://oss.sonatype.org/content/groups/public/hadoop/common/hadoop-3.3.2/hadoop-3.3.2/hadoop-3.3.2-hadoop-3.3.2.pom
       - https://snapshots.elastic.co/downloads/elasticsearch/hadoop-3.3.2-hadoop-3.3.2.xml
       - https://artifacts.elastic.co/downloads/elasticsearch/hadoop-3.3.2-hadoop-3.3.2.xml
       - https://apache.osuosl.org/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz
     Required by:
         project :qa:kerberos

Version Info

OS: macOS Sonoma 14.2.1 (23C71) aarch64
JVM: 17.0.10 (Azul Systems, Inc. 17.0.10+7-LTS)
Hadoop/Spark: 3.4.2
ES-Hadoop: 8.12
ES: -
Gradle: 8.5
Groovy: 3.0.17