USCDataScience/sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0
410 stars
143 forks
issues
Push to docker repos on each travis build
#169
buggtb
closed
3 years ago
1
Create Helm Chart for Sparkler
#168
buggtb
closed
3 years ago
0
Create Kubernetes deployment files for Sparkler and Solr
#167
buggtb
closed
4 years ago
0
Standalone Docker image
#166
buggtb
closed
3 years ago
2
Crawler success but data is not populated into dashboard and output file
#165
kavitasharma21
closed
3 years ago
1
Parser Extension Points and Interfaces
#164
micheladennis
closed
3 years ago
1
Removed chmod command from Dockerfile and added echo method in Readme
#163
prenastro
closed
6 years ago
1
Updated Dockerfile to include Vim and Nano editors inside Sparkler Container
#162
prenastro
closed
6 years ago
5
Added instructions to run Sparkler crawl with a seed url file
#161
prenastro
closed
6 years ago
1
Unable to run jBrowser plugin
#160
micheladennis
closed
6 years ago
4
Removing sparkler-app-0.2.0-SNAPSHOT.jar
#159
misterpilou
closed
6 years ago
8
RemoteSolrException: Error from server at unknown field 'segment'
#158
micheladennis
closed
6 years ago
1
Failed to construct kafka producer
#157
misterpilou
closed
6 years ago
5
Sparkler/PF4J plugins on a Spark cluster
#156
baddlan
closed
6 years ago
2
Sparkler wiki - links and downloads
#155
srinidhinandakumar
closed
6 years ago
3
Fix for #116 - Store segment path of url content in SOLR
#154
sujen1412
closed
6 years ago
1
Added a tool to dump sequence file records as raw files
#153
thammegowda
closed
6 years ago
1
Support for infinite crawl or until the end of all new URLs
#152
thammegowda
closed
6 years ago
0
Get pages Source code
#151
remibacha
closed
6 years ago
0
adds response_time to solr
#150
Rohithyeravothula
closed
6 years ago
1
Read large files using Tika
#149
voltek62
closed
6 years ago
1
Sparkler 147 : Support for Chaining multiple similar plugins, eg: URLFilter
#148
thammegowda
closed
6 years ago
1
Feature : Chain multiple plugins belonging to the same extension point
#147
thammegowda
closed
6 years ago
0
Displaying term panel in dashboard
#146
misterpilou
closed
6 years ago
1
Term's panel in dashboard is not displayed
#145
misterpilou
closed
6 years ago
0
Time taken by HTTP servers to provide a web page
#144
voltek62
closed
6 years ago
0
maven-assembly-plugin:2.5.3:single failed: group id '<>' is too big E…
#143
supermonk
closed
6 years ago
2
Sparkler 127 : Simple plugin framework
#142
thammegowda
closed
6 years ago
0
banana dashboard does not load the injected jobs
#141
YehualashetGit
closed
6 years ago
12
CLI argument to focus the crawl to a specific domain
#140
voltek62
closed
3 years ago
3
Support for flexible focus language crawling framework
#139
thammegowda
opened
6 years ago
3
Performed code refactoring
#138
giuseppetotaro
closed
6 years ago
3
Revert "SPARKLER-105: Define a score plugin interface (#105)"
#137
thammegowda
closed
6 years ago
0
Upgrade to solr 7.1.0 and migrate schema
#136
thammegowda
closed
6 years ago
2
Upgrade to Solr 7.1.0 -- Critical Security Fix in solr
#135
thammegowda
closed
6 years ago
0
Build Failure on JDK 9
#134
thammegowda
closed
3 years ago
1
SPARKLER-105: Define a score plugin interface (#105)
#133
giuseppetotaro
closed
6 years ago
5
disable redundant distribution types
#132
thammegowda
closed
6 years ago
0
Fix #130: Use Maven overlay for banana instead of git submodule
#131
sujen1412
closed
6 years ago
1
Use Maven overlay for banana instead of git submodule
#130
sujen1412
closed
6 years ago
0
Fix for #128: Make SOLR query for generator configurable through yaml
#129
sujen1412
closed
6 years ago
1
[Nutch][Memex] Make SOLR query for generator configurable through yaml
#128
sujen1412
closed
6 years ago
0
Replace OSGI and Felix with pf4j
#127
thammegowda
closed
6 years ago
2
fix docker - permission issues and smaller images
#126
thammegowda
closed
6 years ago
3
Dashboard setup in non-docker environment
#125
C0mmander198
closed
6 years ago
3
Error when injecting urls
#124
User12300
closed
6 years ago
15
Errors and Exceptions when crawling - html.unit Time out exception
#123
amensiko
closed
6 years ago
3
upgraded solr to version 6.6.0 in docker deployment
#122
ldaume
closed
7 years ago
1
Maven build: overlapping classes found by shade plugin
#121
thammegowda
closed
3 years ago
0
Create a sparkler "distribution"
#120
buggtb
closed
7 years ago
2