kashortiexda closed this 10 months ago
I suspect you are not quoting the URL correctly, and it's your shell which 'splits' there (because &
has a special meaning to shells). wpull should not have any problems with it. Try wrapping the URL in quotes.
(Also, wpull 3.x is not this repo but https://github.com/ArchiveTeam/ludios_wpull.)
@JustAnotherArchivist Thanks very much. If I wrap the first URL, do the subsequent URLs that get crawled also get wrapped? (I doubt it.) Unfortunately, all subsequent URLs also contain the &. I just checked my terminal and it is set to Unicode UTF-8; however, I saw this in the terminal output after running wpull: ..... 404 Not Found. Length: 283 [text/html; charset=iso-8859-1].
Wrap all subsequent urls in quotes, like you would any other program:
wpull "https://example.com?foo=bar&baz=whatevercomesnext" "https://example.com?baz=bar&foo=d"
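To see what quoting changes, here is a minimal sketch in POSIX shell. `count_args` is a hypothetical helper written only for this illustration (it is not part of wpull); it prints how many arguments it received. The URL is the one from the report. Single quotes are used for the assignment because they also protect characters such as `$` and backticks, which double quotes do not.

```shell
# Hypothetical helper for illustration only: prints the number of arguments it receives.
count_args() { echo "$#"; }

# Single quotes keep every character literal, including '&'.
url='https://www.krugerpark.co.za/Kruger_National_Park_Lodging_&_Camping_Guide-Travel/Kruger_National_Park_Lodging_&_Camping_Guide.html'

# Quoted: the whole URL reaches the program as one argument.
quoted_count=$(count_args "$url")
echo "arguments received when quoted: $quoted_count"

# Unquoted (left as a comment on purpose, do not run):
#   wpull https://xyz/abc_&def/ghi_&jkl.html
# The shell reads this as three separate commands:
#   wpull https://xyz/abc_ &    (started in the background)
#   def/ghi_ &                  (started in the background)
#   jkl.html
# so wpull only ever sees the part of the URL before the first '&'.
```

This only concerns URLs typed on the command line, since those are the only ones the shell parses.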
I am trying to crawl a site with URLs like https://xyz/abc_&def/ghi_&jkl.html
I realise it is a badly formed site, but I can't change that. The URL gets 'split' into xyz/abc and then a second portion, def/ghi jkl...
https://www.krugerpark.co.za/Kruger_National_Park_Lodging_&_Camping_Guide-Travel/Kruger_National_Park_Lodging_&_Camping_Guide.html
Output in terminal: _Camping_Guide-Travel/Kruger_National_ParkLodging _Camping_Guide.html
OS: Linux Fedora 38. Python version: 3.7.16. Wpull version: 3.09.