Bump org.jsoup:jsoup from 1.17.2 to 1.18.1

Bumps org.jsoup:jsoup from 1.17.2 to 1.18.1.

Release notes

Sourced from org.jsoup:jsoup's releases.

jsoup-1.18.1

https://jsoup.org/news/release-1.18.1

Improvements

Stream Parser: A StreamParser provides a progressive parse of its input. As each Element is completed, it is emitted via a Stream or Iterator interface. Elements returned will be complete with all their children, and an (empty) next sibling, if applicable. Elements (or their children) may be removed from the DOM during the parse, e.g. to conserve memory. This provides a mechanism to parse an input document that would otherwise be too large to fit into memory, while still providing a DOM interface to the document and its elements. Additionally, the parser provides selectFirst(String query) and selectNext(String query) methods, which run the parser until a hit is found, at which point the parse is suspended. It can be resumed via another select() call, or via the stream() or iterator() methods. 2096
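As a rough illustration of the progressive parse described above, the following sketch streams elements as they complete and removes each one once it has been read. It assumes jsoup 1.18.1 is on the classpath; the class `StreamParserSketch` and the method `itemTexts` are hypothetical names used only for this example and are not part of jsoup or of this PR.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

// Hypothetical helper for illustration only; not part of jsoup or this PR.
final class StreamParserSketch {

    // Collects the text of every <li>, removing each one from the DOM after it is read.
    static List<String> itemTexts(String html) {
        List<String> texts = new ArrayList<>();
        // The empty string is the base URI; the parser is closed when the block exits.
        try (StreamParser streamer = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            streamer.stream()
                .filter(el -> el.normalName().equals("li"))
                .forEach(el -> {
                    texts.add(el.text());
                    el.remove(); // drop the completed element, e.g. to conserve memory
                });
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(itemTexts("<ul><li>One</li><li>Two</li><li>Three</li></ul>"));
    }
}
```

The same loop could presumably be driven by selectNext("li") instead of stream(), which would suspend the parse after each hit rather than visiting every element.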

Download Progress: added a Response Progress event interface, which reports progress as URLs are downloaded (and parsed). Supported at both the session level and the single-connection level. 2164, 656

Added Path-accepting parse methods: Jsoup.parse(Path), Jsoup.parse(path, charsetName, baseUri, parser), etc. 2055

Updated the button tag configuration to include a space between multiple button elements in the Element.text() method. 2105

Added support for the ns|* all elements in namespace Selector. 1811

When normalising attribute names during serialization, invalid characters are now replaced with _, vs being stripped. This should make the process clearer, and generally prevent an invalid attribute name being coerced unexpectedly. 2143
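The difference between stripping and replacing can be sketched with plain Java. This is not jsoup's implementation: the class `AttributeNameSketch`, its method names, and the character set treated as valid are all assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; the set of valid characters below is an assumption, not jsoup's rule.
final class AttributeNameSketch {

    // Any run of characters outside this set is treated as invalid.
    private static final Pattern INVALID = Pattern.compile("[^-a-zA-Z0-9_:.]+");

    // Old-style behaviour: invalid characters simply disappear.
    static String strip(String name) {
        return INVALID.matcher(name).replaceAll("");
    }

    // New-style behaviour: each run of invalid characters becomes a single underscore.
    static String replace(String name) {
        return INVALID.matcher(name).replaceAll("_");
    }

    public static void main(String[] args) {
        String name = "data name";
        System.out.println(strip(name));
        System.out.println(replace(name));
    }
}
```

With stripping, "data name" and "dataname" would collapse to the same attribute name; replacing keeps a visible marker where the invalid character was.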

Changes

Removed previously deprecated internal classes and methods. 2094

Build change: the built jar's OSGi manifest no longer imports itself. 2158

Bug Fixes

When tracking source positions, if the first node was a TextNode, its position was incorrectly set to -1. 2106

When connecting (or redirecting) to URLs with characters such as {, } in the path, a Malformed URL exception would be thrown (if in development), or the URL might otherwise not be escaped correctly (if in production). The URL encoding process has been improved to handle these characters correctly. 2142

When using W3CDom with a custom output Document, a Null Pointer Exception would be thrown. 2114

The :has() selector did not match correctly when using sibling combinators (e.g. h1:has(+h2)). 2137

The :empty selector incorrectly matched elements that started with a blank text node and were followed by non-empty nodes, due to an incorrect short-circuit. 2130

Element.cssSelector() would fail with "Did not find balanced marker" when building a selector for elements that had a ( or [ in their class names. Selectors with those characters escaped also did not match as expected. 2146

Updated Entities.escape(string) to make the escaped text suitable for both text nodes and attributes (previously was only for text nodes). This does not impact the output of Element.html() which correctly applies a minimal escape depending on if the use will be for text data or in a quoted

... (truncated)

Changelog

Sourced from org.jsoup:jsoup's changelog.

1.18.1 (Pending)

Improvements

Stream Parser: A StreamParser provides a progressive parse of its input. As each Element is completed, it is emitted via a Stream or Iterator interface. Elements returned will be complete with all their children, and an (empty) next sibling, if applicable. Elements (or their children) may be removed from the DOM during the parse, e.g. to conserve memory. This provides a mechanism to parse an input document that would otherwise be too large to fit into memory, while still providing a DOM interface to the document and its elements. Additionally, the parser provides selectFirst(String query) and selectNext(String query) methods, which run the parser until a hit is found, at which point the parse is suspended. It can be resumed via another select() call, or via the stream() or iterator() methods. 2096

Download Progress: added a Response Progress event interface, which reports progress as URLs are downloaded (and parsed). Supported at both the session level and the single-connection level. 2164, 656

Added Path-accepting parse methods: Jsoup.parse(Path), Jsoup.parse(path, charsetName, baseUri, parser), etc. 2055

Updated the button tag configuration to include a space between multiple button elements in the Element.text() method. 2105

Added support for the ns|* all elements in namespace Selector. 1811

When normalising attribute names during serialization, invalid characters are now replaced with _, vs being stripped. This should make the process clearer, and generally prevent an invalid attribute name being coerced unexpectedly. 2143

Changes

Removed previously deprecated internal classes and methods. 2094

Build change: the built jar's OSGi manifest no longer imports itself. 2158

Bug Fixes

When tracking source positions, if the first node was a TextNode, its position was incorrectly set to -1. 2106

When connecting (or redirecting) to URLs with characters such as {, } in the path, a Malformed URL exception would be thrown (if in development), or the URL might otherwise not be escaped correctly (if in production). The URL encoding process has been improved to handle these characters correctly. 2142

When using W3CDom with a custom output Document, a Null Pointer Exception would be thrown. 2114

The :has() selector did not match correctly when using sibling combinators (e.g. h1:has(+h2)). 2137

The :empty selector incorrectly matched elements that started with a blank text node and were followed by non-empty nodes, due to an incorrect short-circuit. 2130

Element.cssSelector() would fail with "Did not find balanced marker" when building a selector for elements that had a ( or [ in their class names. Selectors with those characters escaped also did not match as expected. 2146

Updated Entities.escape(string) to make the escaped text suitable for both text nodes and attributes (previously was only for text nodes). This does not impact the output of Element.html() which correctly applies a minimal escape depending on if the use will be for text data or in a quoted attribute. 1278

... (truncated)
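To illustrate why an escape intended only for text nodes is not enough for attributes, here is a minimal sketch in plain Java. It is not jsoup's Entities.escape: the class `EscapeSketch`, its method, and the choice of exactly four escaped characters are assumptions for illustration. The key point is that a double quote is harmless in text data but must be escaped inside a quoted attribute value.

```java
// Hypothetical sketch; jsoup's real Entities.escape handles more cases than this.
final class EscapeSketch {

    // Escapes the characters that matter in either text data or a double-quoted attribute.
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break; // only needed for attribute values
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "a < b & \"c\"";
        // The same escaped string is safe in both positions.
        System.out.println("<p title=\"" + escape(raw) + "\">" + escape(raw) + "</p>");
    }
}
```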

Commits

19e8539 [maven-release-plugin] prepare release jsoup-1.18.1
c8b6f2e Progress javadoc tweaks
6cbe7e4 Replace attribute invalid characters with _, vs stripping
68f6f9c Bump jetty.version from 9.4.54.v20240208 to 9.4.55.v20240627 (#2168)
6423e65 Relaxed the multi-thread w/o newRequest test
6c55f01 Bump org.codehaus.mojo:animal-sniffer-maven-plugin from 1.23 to 1.24 (#2167)
e1bfee9 Shh
b4b3fd1 Added test of partial fetch in Stream Parser
9ba6dc7 Make Entities.escape(string) suitable for both text and attributes
a0537c7 Handle escaped characters in consumeSubQuery
Additional commits viewable in compare view

You can trigger a rebase of this PR by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Note: Automatic rebases have been disabled on this pull request as it has been open for over 30 days.

TeamNewPipe / NewPipeExtractor

Bump org.jsoup:jsoup from 1.17.2 to 1.18.1 #1189
