benibela / xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
http://www.videlibri.de/xidel.html
GNU General Public License v3.0

Xidel Does Not Honor HTML <base> Tag when Following or Resolving Links #17

Closed thedward closed 7 years ago

thedward commented 7 years ago
$ xidel --version
Xidel 0.9.6
(20161120.5245.ead1b6fb3d7b)

If I try something like

xidel 'http://example.com/example01.html' \
  -e '/html/head/title' \
  -f '//a[@id eq "next_page"]'

and the document contains a <base href="..."> tag, then xidel appears to ignore the provided base URL and so fails to resolve the correct link. Instead, it resolves the link using the current URL as the base.

It would be neat if this could be fixed, but in the meantime there is a simple workaround: resolve the link against the <base> href explicitly with fn:resolve-uri.

xidel 'http://example.com/example01.html' \
  -e '/html/head/title' \
  -f 'fn:resolve-uri(//a[@id eq "next_page"]/@href,/html/head/base/@href)'
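
For anyone unsure what the expected behaviour is, here is a minimal sketch of the resolution order in Python, using only the standard library's urllib.parse.urljoin. It is an illustration of how a <base> href changes link resolution, not xidel's actual implementation. The base href "http://example.com/archive/", the link "page02.html", and the helper name resolve_link are all made up for the example.

```python
from urllib.parse import urljoin

# Hypothetical values for illustration; only page_url appears in the report above.
page_url = "http://example.com/example01.html"
base_href = "http://example.com/archive/"   # assumed content of <base href="...">
link_href = "page02.html"                   # assumed href of the "next_page" link


def resolve_link(page_url, base_href, link_href):
    """Resolve link_href the way a <base>-aware client is expected to.

    The <base> href may itself be relative, so it is first resolved
    against the page URL; the link is then resolved against the result.
    Without a <base> href, the page URL is the base.
    """
    effective_base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(effective_base, link_href)


if __name__ == "__main__":
    # Honoring <base>: the link lands under /archive/
    print(resolve_link(page_url, base_href, link_href))
    # Ignoring <base> (the behaviour reported here): the link lands next to the page
    print(resolve_link(page_url, None, link_href))
```

The two printed URLs differ only in which base was used, which is the difference between the expected result and what I am seeing.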
benibela commented 7 years ago

Fixed in https://github.com/benibela/internettools/commit/beb66b2a97d3ab59b5442b7b8d3645c9bde87d41