USCDataScience/sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0
411 stars, 143 forks
issues
Make plugins as dynamically loadable services (no compile time dependency)
#19
thammegowda
closed
8 years ago
1
Add Regex URLFilter Plugin
#18
thammegowda
closed
8 years ago
0
Create a plugin framework for Sparkler
#17
smadha
closed
8 years ago
1
Debugging crawl in Sparkler
#16
karanjeets
closed
6 years ago
3
Build failing
#15
smadha
closed
8 years ago
1
base package "usc.edu.ir" instead of "usc.edu.irds"
#14
smadha
closed
8 years ago
5
Add XML-based Sparkler Configuration
#13
karanjeets
closed
8 years ago
2
Setup URL normalizers
#12
thammegowda
closed
3 years ago
0
Support JavaScript execution engine for web pages
#11
thammegowda
closed
7 years ago
4
Integrate Tika Parser to Parse Function for extracting Text and metadata
#10
thammegowda
closed
7 years ago
3
Integrate Nutch Fetcher Queue system for Fetch Function
#9
thammegowda
closed
6 years ago
0
Setup build system for plugins
#8
thammegowda
closed
8 years ago
0
Discuss plugin architecture
#7
thammegowda
closed
8 years ago
1
Add Kafka Connector Data Sink
#6
thammegowda
closed
8 years ago
2
Escape metachars in solr queries
#5
thammegowda
closed
8 years ago
1
Maven packaging problem
#4
karanjeets
closed
8 years ago
4
Create Wiki
#3
thammegowda
closed
8 years ago
1
Organize the code into proper modules
#2
thammegowda
closed
8 years ago
1
Add ALV2 license headers on code
#1
karanjeets
closed
8 years ago
1