USCDataScience/sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0
411 stars, 143 forks
issues
Sparkler Solr Schema Update
#69
karanjeets
closed
7 years ago
2
Unittests : With embedded solr and embedded webserver for crawling
#68
thammegowda
closed
7 years ago
3
Update Solr Schema
#67
karanjeets
closed
7 years ago
0
Solr Cloud - solrj.SolrServerException: No live SolrServers available to handle this request
#66
thammegowda
closed
7 years ago
1
e.u.i.s.model.Resource.<init>(Resource.java:46) java.net.MalformedURLException: Stream handler unavailable due to: For input string: "0x6"
#65
thammegowda
closed
7 years ago
0
Error Message "Could not launch browser" when starting crawler of quickstart guide
#64
rgeissen
closed
7 years ago
5
crawl hanging..
#63
MyraBaba
closed
7 years ago
1
Develop
#62
arelaxend
closed
7 years ago
6
Fetcher implementations are made fail safe by catching exceptions
#61
thammegowda
closed
7 years ago
2
Addjar
#60
buggtb
closed
7 years ago
9
Solr config
#59
buggtb
closed
7 years ago
1
pass crawldb uri on command line
#58
buggtb
closed
7 years ago
0
Java null pointer error in fetch()
#57
arelaxend
closed
7 years ago
9
Working with remote spark.
#56
buggtb
closed
7 years ago
11
URL filter regex
#55
MyraBaba
closed
7 years ago
3
Exclude org.json:json library since its license is incompatible with Apache License 2.0
#54
buggtb
closed
6 years ago
2
Is the default config overrideable without updating the jar?
#53
buggtb
closed
7 years ago
2
Added Docker File for quick testing in localmode
#52
thammegowda
closed
7 years ago
3
No fetched content is written
#51
karanjeets
closed
7 years ago
0
Finish Juju charm
#50
buggtb
opened
7 years ago
3
[MEMEX] Change Enum config to have unfetched in place of new
#49
sujen1412
closed
7 years ago
3
[MEMEX] Add extractor plugin interface
#48
thammegowda
opened
7 years ago
0
[NUTCH][MEMEX] Create Generator Plugin Interface
#47
thammegowda
opened
7 years ago
0
[MEMEX] Feature to add cookies from configured files to the request header sent by fetcher
#46
thammegowda
closed
3 years ago
4
[NUTCH][MEMEX] Port robots.txt rules from Nutch
#45
thammegowda
opened
7 years ago
0
[NUTCH][MEMEX] Add Rotating User Agents feature to fetcher
#44
thammegowda
closed
7 years ago
0
[NUTCH][MEMEX] port the Generator Module (aka scoring plugin )
#43
thammegowda
opened
7 years ago
1
Sparkler Build Failing
#42
karanjeets
closed
7 years ago
1
Default Dashboard config for Crawldb Schema
#41
thammegowda
closed
7 years ago
1
Dockerize Sparkler
#40
karanjeets
closed
7 years ago
3
Started Sparkler-UI & added Banana as a Git submodule
#39
karanjeets
closed
7 years ago
3
Banana Dashboard
#38
manishdwibedy
closed
7 years ago
9
Plugin for fetching pages using a headless browser
#37
smadha
closed
7 years ago
9
Update README.md
#36
karanjeets
closed
7 years ago
0
[SPARKLER 33] Disable kafka by default with kafka.enabled parameter
#35
rahulpalamuttam
closed
7 years ago
0
Minor Issues
#34
karanjeets
closed
7 years ago
1
Job hangs for a minute when Kafka is not configured
#33
thammegowda
closed
7 years ago
6
Setup unit tests and integration tests for Sparkler
#32
thammegowda
closed
3 years ago
0
log4j
#31
KisungPark
closed
7 years ago
1
[Sparkler 10] Extract text/title and push to crawldb
#30
karanjeets
closed
7 years ago
7
[SPARKLER 16] Output status code to log.debug
#29
rahulpalamuttam
closed
8 years ago
1
[SPARKLER 6] Kafka Connector Data Sink
#28
rahulpalamuttam
closed
8 years ago
3
[SPARKLER] use case statements to handle option types for the outLink filter
#27
rahulpalamuttam
closed
8 years ago
1
Content/text/url is not indexing in solr
#26
MuhammadTalhaAfzal
closed
7 years ago
5
Guide for sparkler and hdfs
#25
MuhammadTalhaAfzal
closed
7 years ago
5
Added YAML-based Sparkler Configuration
#24
karanjeets
closed
8 years ago
6
Visual Analytics : Create admin dashboard to crawldb to visualize the stats in realtime
#23
thammegowda
closed
7 years ago
9
Sparkler 19
#22
karanjeets
closed
8 years ago
1
new additions : plugin interfaces, plugin service, urlfilter, regex url filter, config files....
#21
thammegowda
closed
8 years ago
3
Define interfaces for the plugins
#20
thammegowda
closed
3 years ago
4