SpiritQuaddicted / sourceforge-file-download

Downloads all of a SourceForge project's files into a directory named after the project, created inside the current directory. Pass the project's name as the first argument, e.g. `./sourceforge-file-download.sh inkscape` to download everything under http://sourceforge.net/projects/inkscape/files/

All even lines in urllist are filled with junk, causing wget to abort. #5

Closed wertercatt closed 2 years ago

wertercatt commented 2 years ago
FINISHED --2022-07-06 16:10:35--
Total wall clock time: 4m 0s
Downloaded: 133 files, 22M in 1m 18s (288 KB/s)
--2022-07-06 16:10:35--  https://downloads.sourceforge.net/project/gameextractor/Game%20Extractor%203.0x/3.13/extract_313.exe
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving downloads.sourceforge.net (downloads.sourceforge.net)... 204.68.111.105
Connecting to downloads.sourceforge.net (downloads.sourceforge.net)|204.68.111.105|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://versaweb.dl.sourceforge.net/project/gameextractor/Game%20Extractor%203.0x/3.13/extract_313.exe [following]
--2022-07-06 16:10:35--  https://versaweb.dl.sourceforge.net/project/gameextractor/Game%20Extractor%203.0x/3.13/extract_313.exe
Resolving versaweb.dl.sourceforge.net (versaweb.dl.sourceforge.net)... 162.251.232.173
Connecting to versaweb.dl.sourceforge.net (versaweb.dl.sourceforge.net)|162.251.232.173|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9547666 (9.1M) [application/octet-stream]
Saving to: ‘gameextractor/Game Extractor 3.0x/3.13/extract_313.exe’

gameextractor/Game Extractor  100%[=================================================>]   9.10M   216KB/s    in 43s

2022-07-06 16:11:19 (216 KB/s) - ‘gameextractor/Game Extractor 3.0x/3.13/extract_313.exe’ saved [9547666/9547666]

--2022-07-06 16:11:19--  http://[https://sourceforge.net/projects/gameextractor/files/%22%3E]/
Resolving https://sourceforge.net/projects/gameextractor/files/"> (https://sourceforge.net/projects/gameextractor/files/">)... failed: Name or service not known.
wget: unable to resolve host address ‘https://sourceforge.net/projects/gameextractor/files/">’

urllist can be seen here to verify the buggy output: https://gist.github.com/wertercatt/29e524fd16a1fca39cf8aa1dfd2e1431
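
To check the pattern named in the title locally, one option is to print only the even-numbered lines of the list. This is a minimal sketch: the file name `urllist` comes from the report, while the helper name `print_even_lines` and the sample contents are made up for illustration and are not taken from the script or the gist.

```shell
#!/usr/bin/env bash
# print_even_lines: hypothetical helper that prints every second line
# (lines 2, 4, 6, ...) of the file given as its first argument.
print_even_lines() {
    awk 'NR % 2 == 0' "$1"
}

# Made-up sample standing in for a real urllist.
sample=$(mktemp)
printf '%s\n' \
    'https://example.org/files/a.zip' \
    'junk-1' \
    'https://example.org/files/b.zip' \
    'junk-2' > "$sample"

print_even_lines "$sample"
rm -f "$sample"
```

Against a real run, `print_even_lines gameextractor/urllist` (path assumed) would show whether every second entry is malformed.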

The command used was just `./sourceforge-file-downloader.sh gameextractor`

Version information follows:

[wertercatt@wertserv ~]$ grep --version
grep (GNU grep) 3.7
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others; see
<https://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
[wertercatt@wertserv ~]$ wget --version
GNU Wget 1.21.3 built on linux-gnu.

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls
+ntlm +opie +psl +ssl/gnutls

Wgetrc:
    /etc/wgetrc (system)
Locale:
    /usr/share/locale
Compile:
    gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
    -DLOCALEDIR="/usr/share/locale" -I. -I../lib -I../lib
    -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS -DNDEBUG -march=x86-64
    -mtune=generic -O2 -pipe -fno-plt -fexceptions
    -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security
    -fstack-clash-protection -fcf-protection -flto=auto
Link:
    gcc -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS -DNDEBUG
    -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions
    -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security
    -fstack-clash-protection -fcf-protection -flto=auto
    -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto
    -lpcre2-8 -luuid -lidn2 -lnettle -lgnutls -lz -lpsl ../lib/libgnu.a
    /usr/lib/libunistring.so

Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Originally written by Hrvoje Niksic <hniksic@xemacs.org>.
Please send bug reports and questions to <bug-wget@gnu.org>.
[wertercatt@wertserv ~]$ bash --version
GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

SpiritQuaddicted commented 2 years ago

Thanks! It looks like they changed something in their HTML. I believe the encoded URLs are redundant (the same as the plain ones) and can be excluded by the grep. This should be fixed in https://github.com/SpiritQuaddicted/sourceforge-file-download/commit/5da4dd783ec1af639924cc633886ebb64fa2f3e8
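
For readers who want to clean an already-generated list by hand, the idea of excluding the junk entries with grep could look roughly like the sketch below. It is not the change made in the linked commit. The function name `filter_urllist`, the regular expressions, and the sample URLs are all assumptions chosen for illustration: the filter keeps only lines that are a bare http(s) URL and drops any that contain a quote or angle bracket, either literally or percent-encoded.

```shell
#!/usr/bin/env bash
# filter_urllist: hypothetical helper, not part of the original script.
# Stage 1 keeps lines that consist solely of an http(s) URL without
# quotes, angle brackets, or whitespace.
# Stage 2 drops lines carrying percent-encoded quotes or angle brackets.
filter_urllist() {
    grep -E '^https?://[^"<>[:space:]]+$' "$1" | grep -v -E '%22|%3C|%3E'
}

# Made-up sample resembling the broken entry seen in the wget log.
sample=$(mktemp)
printf '%s\n' \
    'https://sourceforge.net/projects/example/files/a.zip/download' \
    'https://sourceforge.net/projects/example/files/">' \
    'https://sourceforge.net/projects/example/files/b.zip/download' \
    'https://sourceforge.net/projects/example/files/%22%3E' > "$sample"

filter_urllist "$sample"
rm -f "$sample"
```

The filtered output can then be saved to a new file and passed to `wget -i`. Note that stage 2 would also drop a legitimate file whose name happens to contain one of those encoded characters, so it is a trade-off rather than a general rule.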

Thank you for reporting this! I hope the script works and helps you achieve great archivals.