bejean/crawl-anywhere
Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.
www.crawl-anywhere.com
Apache License 2.0
96
stars
38
forks
issues
Files not found
#93
kemoc
opened
6 years ago
1
Access forbidden, required password
#92
kemoc
closed
6 years ago
0
88: Add per-host config for bypassing robots file
#91
grimsa
opened
7 years ago
0
86: Add cookie support to HttpLoader
#90
grimsa
opened
7 years ago
0
Parse not correct for French and Chinese.
#89
quinnsoft88
opened
8 years ago
0
Add ability to bypass robots.txt on a per-host basis
#88
grimsa
opened
9 years ago
0
issue with require_once_all
#87
jagdeep786
opened
9 years ago
1
HttpLoader does not fully support cookies
#86
grimsa
opened
9 years ago
1
Solr is not updated via indexer
#85
pixel-paul
closed
8 years ago
8
Unable to add Source
#84
FireLizard
closed
9 years ago
1
Title not parsed correctly for some international sites.
#83
aravinuthala
opened
9 years ago
0
Source Export / Import
#82
bejean
opened
9 years ago
0
Crawl-anywhere on mac
#81
jayasreemca
closed
8 years ago
1
Unable to add source
#80
sawan12
closed
9 years ago
4
tools_test_scripts.sh never get to see any output of found links
#79
bejean
opened
9 years ago
6
Review IP geolocalisation
#78
bejean
opened
9 years ago
1
item_contentsize for PDF
#77
bejean
closed
9 years ago
2
Proxy address exclusion list
#76
bejean
closed
9 years ago
1
Search by tag or collection in search interface doesn't work
#75
bejean
closed
9 years ago
0
facet.mode_union parameter in search interface is ignored
#74
bejean
closed
9 years ago
0
If several accounts exist, the default one is ignored
#73
bejean
closed
9 years ago
0
Missing dependency
#72
bejean
closed
9 years ago
1
Check for deletion period
#71
bejean
opened
10 years ago
0
Fixes for "Get page only" mode in filtering rules and backslash escaping
#70
OkkeKlein
closed
9 years ago
1
Wrong character encoding
#69
torhar
opened
10 years ago
1
Recrawl period binaries feature
#68
bejean
closed
9 years ago
1
Allow to write a web connector js for all web site
#67
bejean
opened
10 years ago
0
Integrate tools like spiderling
#66
bejean
opened
10 years ago
0
Bad crawl failure handling when crawler stops due to redirection to another domain on first page
#65
bejean
closed
9 years ago
1
Post processing API
#64
giorgio79
opened
10 years ago
0
Crawl speed limit and obeying crawl-delay?
#63
giorgio79
opened
10 years ago
0
Only index in Solr pages that have keyword?
#62
giorgio79
closed
10 years ago
1
Crawler does not pause when queue size limit is reached
#61
bejean
closed
10 years ago
0
Check "Less no of pages are crawled"
#60
bejean
opened
10 years ago
0
Check "Sitemap.xml"
#59
bejean
opened
10 years ago
0
Check "Wild card mappings for metadata behave strange/wrong"
#58
bejean
opened
10 years ago
0
Test with MongoDB 2.6.x
#57
bejean
closed
10 years ago
1
Test url rules test button in UI admin
#56
bejean
opened
10 years ago
0
Crawling status reporting in admin UI
#55
bejean
opened
10 years ago
0
Collection and Tag administration
#54
bejean
closed
9 years ago
1
Search by collections and tags
#53
bejean
closed
9 years ago
0
How are script engines evaluated?
#52
edrush
closed
10 years ago
1
Wild card mappings for metadata not working
#51
edrush
closed
10 years ago
1
Support for multiple solr cores
#50
edrush
closed
10 years ago
1
Sitemap XML
#49
pixel-paul
closed
10 years ago
2
'Crawl Now' doesn't seem to work
#48
pixel-paul
closed
10 years ago
8
Crawler status not correct in Web Admin UI
#47
bejean
closed
10 years ago
0
Title not parsed correctly for some international sites.
#46
aravinuthala
closed
9 years ago
3
Last installation issues
#45
bejean
closed
9 years ago
0
proxy params are ignored
#44
torhar
closed
9 years ago
1