- As part of NUTCH-2805, the plugin urlfilter-domainblacklist has been renamed to urlfilter-domaindenylist, and the plugin's properties urlfilter.domainblacklist.rules and urlfilter.domainblacklist.file have been replaced with urlfilter.domaindenylist.rules and urlfilter.domaindenylist.file, respectively. See NUTCH-2802 for more details.
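The rename above means that an existing site configuration still using the old names would need updating after the bump. The following is a minimal sketch of such a migration, not an official Nutch tool: the helper `migrate_config`, the sample configuration, and the file name `my-domains.txt` are hypothetical, and it assumes the usual Hadoop-style `<configuration>/<property>` layout with the plugin enabled through `plugin.includes`.

```python
import xml.etree.ElementTree as ET

# Old -> new property names, taken from the release note above.
RENAMED_PROPERTIES = {
    "urlfilter.domainblacklist.rules": "urlfilter.domaindenylist.rules",
    "urlfilter.domainblacklist.file": "urlfilter.domaindenylist.file",
}
OLD_PLUGIN = "urlfilter-domainblacklist"
NEW_PLUGIN = "urlfilter-domaindenylist"


def migrate_config(xml_text: str) -> str:
    """Return the configuration XML with renamed properties and plugin id.

    Hypothetical helper: assumes a <configuration> root holding
    <property><name>..</name><value>..</value></property> entries.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = prop.find("name")
        value = prop.find("value")
        if name is None or name.text is None:
            continue
        key = name.text.strip()
        if key in RENAMED_PROPERTIES:
            # Rename the property, keep its value untouched.
            name.text = RENAMED_PROPERTIES[key]
        elif key == "plugin.includes" and value is not None and value.text:
            # Swap the plugin id inside the plugin.includes expression.
            value.text = value.text.replace(OLD_PLUGIN, NEW_PLUGIN)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample configuration used only for illustration.
SAMPLE = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-domainblacklist|parse-html</value>
  </property>
  <property>
    <name>urlfilter.domainblacklist.file</name>
    <value>my-domains.txt</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(migrate_config(SAMPLE))
```

Note that `xml.etree.ElementTree` drops XML comments and the XML declaration when parsing with default settings, so for a heavily commented configuration file a manual edit may be the safer choice.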
Sub-task
[NUTCH-2671] - Upgrade ant ivy library
[NUTCH-2672] - Ant build erroneously installs *-test.jar instead of *.jar for target "nightly"
[NUTCH-2805] - Rename plugin urlfilter-domainblacklist
[NUTCH-2809] - Upgrade any23 plugin dependency to 2.4
[NUTCH-2816] - Add Spotbugs target to ant build
[NUTCH-2817] - Avoid check for equality of URL path and file part using ==/!=
[NUTCH-2829] - Fix ant target "clean-cache"
Bug
[NUTCH-2669] - Reliable solution for javax.ws packaging.type
[NUTCH-2697] - Upgrade Ivy to fix the issue of an unset packaging.type property
[NUTCH-2801] - RobotsRulesParser command-line checker to use http.robots.agents as fall-back
[NUTCH-2810] - FreeGenerator to actually apply configured number of fetch lists
[NUTCH-2813] - MoreIndexingFilter - can't parse erroneous date - 2019-07-03T10:28:14
[NUTCH-2814] - HttpDateFormat's internal time zone may change after parsing a date
[NUTCH-2818] - Ant build: upgrade Apache Rat report task
[NUTCH-2823] - IllegalStateException in IndexWriters.describe() when validating url param for SolrIndexer
[NUTCH-2824] - urlnormalizer-basic to unescape percent-encoded host names
Improvement
[NUTCH-1190] - MoreIndexingFilter refactor: move data formats used to parse "lastModified" to a config file.
[NUTCH-2582] - Set pool size of XML SAX parsers used for MIME detection in Tika 1.19
[NUTCH-2730] - SitemapProcessor to treat sitemap URLs as Set instead of List
[NUTCH-2782] - protocol-http / lib-http: support TLSv1.3
[NUTCH-2796] - Upgrade to crawler-commons 1.1
[NUTCH-2799] - Add .asf.yaml file
[NUTCH-2833] - Upgrade to Tika 1.25
[NUTCH-2835] - Upgrade commons-jexl from 2 --> 3
[NUTCH-2836] - Upgrade various commons dependencies
[NUTCH-2837] - Update multiple dependencies
[NUTCH-2841] - Upgrade xercesImpl dependency
Wish
[NUTCH-2834] - Deduplication mode via command line in crawl script
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/dkd/nutch-typo3-cms/network/alerts).
Bumps nutch from 1.8 to 1.18.
Changelog
Sourced from nutch's changelog.
... (truncated)
Commits
- `43f3550` Prepare for Nutch 1.18 release
- `e9f125c` Prepare for Nutch 1.18 release
- `59c63c7` NUTCH-2841 Upgrade xercesImpl dependency (#563)
- `7f0fdb1` NUTCH-2837 Update multiple dependencies (#560)
- `fbd53ba` NUTCH-2836 Upgrade various commons dependencies (#559)
- `88a17f2` Add possibility to setup deduplication group mode in crawl script (#557)
- `8d8e08b` NUTCH-2835 Upgrade commons-jexl from 2 --> 3 (#558)
- `4c7d422` Merge pull request #556 from sebastian-nagel/tika-1.25
- `40218a0` NUTCH-2833 Upgrade to Tika 1.25
- `c1cf6bb` Merge pull request #554 from sebastian-nagel/NUTCH-2582-set-mime-types-reader...