the URL will be interpreted as `http://localhost:8000/%E2%80%BEtildepath/`, which then results in a 404.
Curiously, if I start wpull at `http://localhost:8000/~tildepath/`, that page and subsequent URLs are fetched correctly, as long as those URLs do not themselves contain tildes.
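For context, `%E2%80%BE` is the percent-encoded UTF-8 form of U+203E OVERLINE, so the tilde apparently gets replaced by an overline character somewhere before the request is made. The standard-library check below is only a diagnostic sketch (it is not wpull code); the path is the one from this report.

```python
import unicodedata
from urllib.parse import quote, unquote

# Path segment that wpull actually requested, as seen in the 404.
requested = "/%E2%80%BEtildepath/"

# Percent-decode as UTF-8 (urllib's default) to see which character replaced "~".
decoded = unquote(requested)
first_char = decoded[1]
print(f"U+{ord(first_char):04X} {unicodedata.name(first_char)}")  # U+203E OVERLINE

# "~" is an unreserved character, so quoting the intended path leaves it untouched;
# the expected request is simply /~tildepath/.
print(quote("/~tildepath/"))  # /~tildepath/
```

Re-quoting `decoded` with `quote()` reproduces the bad path exactly, which suggests the substitution happens before URL encoding rather than during it.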
This turned up in ArchiveBot job 62thvsbqv0tn0af8fhhjklya3 and, judging from past logs, also in jobs 5bi1u8ffbrtwnf2jb6d3prqwj, 6gjq81kbvhhcjvf6v5z4ysv4i and 2bkvkya714zxqkity2cmw1w10.
This definitely happens when `--html-parser` is set to `libxml2-lxml` in 2.0.1, but not in 1.2.3. I also found a similar issue reported against wget, discussed here and here.