ArchiveTeam / wpull

Wget-compatible web downloader and crawler.
GNU General Public License v3.0

LinkInfo/LinkContext's linked and inline fields for HTML-extracted URLs are not always bools #458

Open JustAnotherArchivist opened 3 years ago

JustAnotherArchivist commented 3 years ago

Per the documentation of both wpull.scraper.html.LinkInfo and wpull.scraper.base.LinkContext, the linked and inline fields are supposed to be bools. However, that is not always the case: the ElementWalker's is_link_inline and is_html_link routines may return 1 and 2, respectively, instead of True due to the bit flag shenanigans in those methods. Fortunately, this has no effect in wpull because non-zero integers are truthy. But it's still a bug in the API.
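To illustrate the behaviour described above, here is a minimal standalone sketch, not wpull's actual code. The flag constants, the `TAG_ATTRIBUTES` table, and the `*_raw` helper names are assumptions made up for this example; only the returned values 1 and 2 come from the report. It shows how masking a flag field with a bitwise AND yields an int rather than a bool, and how wrapping the result in `bool()` would give the documented type.

```python
# Hypothetical flag values, chosen to match the 1 and 2 reported above.
ATTR_INLINE = 1
ATTR_HTML = 2

# Hypothetical tag/attribute table for illustration only.
TAG_ATTRIBUTES = {
    'img': {'src': ATTR_INLINE},
    'a': {'href': ATTR_HTML},
    'iframe': {'src': ATTR_INLINE | ATTR_HTML},
}


def _flags(tag, attribute):
    """Look up the flag field for a tag/attribute pair (0 if unknown)."""
    return TAG_ATTRIBUTES.get(tag, {}).get(attribute, 0)


def is_link_inline_raw(tag, attribute):
    # Bitwise AND returns an int (0 or 1), not a bool.
    return _flags(tag, attribute) & ATTR_INLINE


def is_html_link_raw(tag, attribute):
    # Bitwise AND returns an int (0 or 2), not a bool.
    return _flags(tag, attribute) & ATTR_HTML


def is_link_inline(tag, attribute):
    # Coercing with bool() yields the documented type.
    return bool(_flags(tag, attribute) & ATTR_INLINE)


def is_html_link(tag, attribute):
    return bool(_flags(tag, attribute) & ATTR_HTML)


if __name__ == '__main__':
    print(repr(is_link_inline_raw('img', 'src')))  # int, truthy
    print(repr(is_html_link_raw('a', 'href')))     # int, truthy
    print(repr(is_link_inline('img', 'src')))      # bool
    print(repr(is_html_link('a', 'href')))         # bool
```

Truthiness checks such as `if link_info.inline:` behave the same either way, which is why nothing breaks inside wpull. Identity or equality comparisons like `inline is True` or `linked == True` would go wrong with the raw ints, though.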