USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

SPARKLER-236 Exclude net.jpountz.lz4 lz4 from kafka-clients dependency in sparkler-app/pom.xml #237

Closed lewismc closed 2 years ago

lewismc commented 3 years ago

What changes were proposed in this pull request?

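This PR excludes the net.jpountz.lz4:lz4 artifact from the kafka-clients dependency in sparkler-app/pom.xml, which addresses the issue reported in https://github.com/USCDataScience/sparkler/issues/236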
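
For readers unfamiliar with Maven exclusions, the change named in the PR title would look roughly like the fragment below. This is a sketch, not the actual diff: the group and artifact IDs come from the PR title, while the `${kafka.version}` property and the position of the block within sparkler-app/pom.xml are assumptions made for illustration.

```xml
<!-- Sketch only: the version property and the placement of this block are assumptions. -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <!-- Hypothetical property name; the real pom may pin the version differently. -->
  <version>${kafka.version}</version>
  <exclusions>
    <!-- Keep the lz4 artifact that kafka-clients pulls in transitively off the classpath. -->
    <exclusion>
      <groupId>net.jpountz.lz4</groupId>
      <artifactId>lz4</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

One way to check the effect is to run `mvn dependency:tree -Dincludes=net.jpountz.lz4` from the module directory and confirm the artifact no longer appears under kafka-clients.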

Is this related to an already existing issue on sparkler?
https://github.com/USCDataScience/sparkler/issues/236

Will it close an existing issue?
Closes https://github.com/USCDataScience/sparkler/issues/236

How was this patch tested?

% cd sparkler-core && mvn clean install -DskipTests
% docker build -t sparkler-local -f sparkler-deployment/docker/Dockerfile .
% docker run -p 8983:8983 -p 4041:4040 -it --user sparkler -d 662c89b5cfb8
% docker exec -it --user sparkler 83ba4d5fc174418bcd8eeee78618c6fa45de2a79398000e5954f0600a45d14a1 /bin/bash
% /data/solr/bin/solr start -force
% /data/sparkler/bin/sparkler.sh inject -id 1 -su "https://www.lemonde.fr/"
% /data/sparkler/bin/sparkler.sh crawl -id 1 -tn 10 -i 2

With this change, the crawl runs to completion without any errors.