elastic/crawler
Other
125 stars
10 forks
issues
Scheduled purge of "hasn't been crawled in N days"
#168
seanstory
opened
3 days ago
0
join_as option that only grabs the first match
#167
seanstory
opened
5 days ago
0
Bump webrick, move to test group
#166
seanstory
closed
6 days ago
0
Bump nokogiri, tika, remove explicit bouncycastle
#165
seanstory
closed
6 days ago
1
Update elasticsearch.yml.example
#164
navarone-feekery
closed
2 weeks ago
1
Update README.md
#163
navarone-feekery
closed
2 weeks ago
0
Support custom HTTP headers
#162
vidok
opened
3 weeks ago
0
Support custom HTTP headers
#161
vidok
opened
3 weeks ago
4
Test and document extra legacy config options
#160
navarone-feekery
opened
3 weeks ago
0
Elasticsearch config file takes precedence over crawler config
#159
navarone-feekery
opened
3 weeks ago
0
[Docker] file permission error when _bulk request fails
#158
seanstory
opened
3 weeks ago
1
Add support for crawling dynamic content
#157
navarone-feekery
opened
3 weeks ago
0
Upgrade rexml to 3.3.9
#156
lhearachel
closed
2 weeks ago
0
Replace nokogiri with jsoup
#155
navarone-feekery
opened
4 weeks ago
0
Crawler fails when providing ES config in flat format.
#154
vidok
opened
4 weeks ago
0
document volume mounting and docker-compose options
#153
seanstory
opened
1 month ago
1
Crawler attributes like `data-elastic-include` are ignored
#152
sarahg
closed
1 month ago
3
Expand extraction rules to deny content based on selectors
#151
navarone-feekery
opened
1 month ago
0
Pin rexml version to 3.3.8
#150
navarone-feekery
closed
1 month ago
0
Add a configuration option to disable SSL for ES connections
#149
navarone-feekery
opened
1 month ago
4
[0.2] Use crawl for the first step vs schedule (#147)
#148
github-actions[bot]
closed
1 month ago
0
Use crawl for the first step vs schedule
#147
dadoonet
closed
1 month ago
2
Rename the ingestion-team
#146
tutelaris
closed
2 months ago
0
Revert "Revert "Update ent-search-eng team to be a search-eng team""
#145
seanstory
closed
2 months ago
0
HTML Content Extraction
#144
DasUberLeo
opened
2 months ago
4
Revert "Update ent-search-eng team to be a search-eng team"
#143
seanstory
closed
2 months ago
1
Update ent-search-eng team to be a search-eng team
#142
tutelaris
closed
2 months ago
2
Bump product version to `0.2.1`
#141
navarone-feekery
closed
2 months ago
0
[0.2] Fix usage of in-built `File` lib (#139)
#140
navarone-feekery
closed
2 months ago
0
Fix usage of in-built `File` lib
#139
navarone-feekery
closed
2 months ago
1
Output sink type `file` is broken
#138
navarone-feekery
closed
2 months ago
1
[0.2] Add RELEASING.md (#133)
#137
github-actions[bot]
closed
2 months ago
0
[0.2] Fix crawl result logs (#134)
#136
github-actions[bot]
closed
2 months ago
0
[0.2] Add docs for running official docker image (#132)
#135
github-actions[bot]
closed
2 months ago
0
Fix crawl result logs
#134
navarone-feekery
closed
2 months ago
1
Add RELEASING.md
#133
navarone-feekery
closed
2 months ago
1
Add docs for running official docker image
#132
navarone-feekery
closed
2 months ago
1
Crawl ID remains the same across scheduled crawls
#131
navarone-feekery
opened
2 months ago
1
Extraction rule fields applied to unrelated docs
#130
navarone-feekery
closed
2 months ago
1
Content that is larger than `elasticsearch.bulk_api.max_size_bytes` is not ingested
#129
navarone-feekery
closed
2 months ago
4
Crawl result erroneously logs a failure if there were no docs to purge
#128
navarone-feekery
closed
2 months ago
0
[0.2] Add feature comparison table (#117)
#127
github-actions[bot]
closed
2 months ago
0
[0.2] Add CRAWLER_DIRECTIVES.md and purge crawls documentation (#115)
#126
github-actions[bot]
closed
2 months ago
0
[0.2] Add CHANGELOG.md and upgrade to beta (#121)
#125
navarone-feekery
closed
2 months ago
0
Flaky spec for bulk queue thread-locking
#124
navarone-feekery
opened
2 months ago
0
Update `.backportrc.json`
#123
navarone-feekery
closed
2 months ago
0
Bump version to 0.3.0
#122
navarone-feekery
closed
2 months ago
0
Add CHANGELOG.md and upgrade to beta
#121
navarone-feekery
closed
2 months ago
2
[0.1] Add docker publishing scripts and pipeline (#103)
#120
github-actions[bot]
closed
2 months ago
0
[0.1] Misc fixes to the Wolfi-based Dockerfile (#114)
#119
github-actions[bot]
closed
2 months ago
0