Open ryandesign opened 3 months ago
There's a plan to fix this in curl, although saving it with a different filename than what wget picks: https://github.com/curl/curl/pull/13988
This works for me but please test:
https://salsa.debian.org/debian/wcurl/-/merge_requests/4
Note there is a new dependency on trurl.
I'll try to keep the discussion about this issue on salsa, but if anyone would like to reply and doesn't have an account, feel free to do it here.
Works for me but saves file Debian, not Debian.html.
With just a host name, e.g. curl.se, saves curl_response, not even curl-response.html, or better curl[-.]se.html or curl[-.]se[-.]index.html, which would be better than wget/2 anonymous index.html! Added similar comment to @Curl #13988
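To make the naming suggestion above concrete, here is a minimal sketch in plain POSIX shell. The function name fallback_name and its default ".html" suffix are assumptions for illustration only; this is not what wcurl or the merge request does (the merge request relies on trurl for URL parsing).

```shell
# Hypothetical helper, not part of wcurl: pick an output name for a URL.
# $1 = URL, $2 = suffix used when the URL has no file name (default ".html").
fallback_name() {
    url=${1%%[?#]*}               # strip query string and fragment
    rest=${url#*://}              # strip the scheme, if present
    host=${rest%%/*}              # text before the first slash
    case $rest in
        */*) path=${rest#*/} ;;   # text after the first slash
        *)   path= ;;             # bare host name, no path at all
    esac
    name=${path##*/}              # last path segment; empty for "dir/" or no path
    if [ -n "$name" ]; then
        printf '%s\n' "$name"
    else
        # No file name in the URL: fall back to host (without :port) plus suffix.
        printf '%s%s\n' "${host%%:*}" "${2:-.html}"
    fi
}
```

With this sketch, fallback_name curl.se prints curl.se.html, while a URL ending in /Debian still yields Debian, since choosing a suffix for an extensionless name needs the response content type. Bracketed IPv6 hosts and user:password@ prefixes are not handled.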
Just packaged wcurl as part of Cygwin distribution standard main package curl 8.10 so trying to get ahead of users trying it out!
I describe wcurl and mention your home page in the announcement, so they could come here ;^>
No other Cygwin packagers had any comments on whether I should include it in curl, make it a subpackage of curl source package, or package wcurl source and "binary" separately, so thought I would help out most users by giving out a free wcurl script and docs with every curl command line package. ;^>
Could translate back from response content-type: header media-type/mime-type, for example:
$ curl -I curl.se
...
HTTP/2 200
server: nginx/1.21.1
content-type: text/html
...
to file type suffix extension using shared-mime-info data in /usr/share/mime/packages/freedesktop.org.xml, which gives a list of glob patterns for each mime-type, for example:
$ awk '/<mime-type\stype="text\/html">/,/<\/mime-type>/' /usr/share/mime/packages/freedesktop.org.xml
<mime-type type="text/html">
<comment>HTML document</comment>
<comment xml:lang="zh_TW">HTML 文件</comment>
<comment xml:lang="zh_CN">HTML 文档</comment>
...
<comment xml:lang="en_GB">HTML document</comment>
...
<acronym>HTML</acronym>
<expanded-acronym>HyperText Markup Language</expanded-acronym>
<sub-class-of type="text/plain"/>
<magic>
<match type="string" value="<!DOCTYPE HTML" offset="0:256"/>
...
</magic>
<magic priority="40">
<match type="string" value="<!--" offset="0"/>
<match type="string" value="<TITLE" offset="0:256"/>
<match type="string" value="<title" offset="0:256"/>
</magic>
<glob pattern="*.html" weight="80"/>
<glob pattern="*.htm" weight="80"/>
</mime-type>
The code could be something equivalent to this awk command:
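$ awk '/<mime-type\s+type="[^"]+"[^>]*>/,/<\/mime-type>/ {
# index() compares literally, so "+" or "." in mime_type is not read as regex syntax
if (!found) found = index( $0, "<mime-type type=\"" mime_type "\"");
if (found && /<glob\s+pattern="/) {
sub( /^\s*<glob\s+pattern="\*/, "");
sub( /".*$/, "");
print;
exit; # exit on first match
}
if (found && /<\/mime-type>/) exit; # type has no glob: stop before the next entry
}' mime_type="text/html" /usr/share/mime/packages/freedesktop.org.xml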
.html
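Putting the two halves together, here is a rough sketch of going from response headers to a suffix. The function names media_type_from_headers and ext_for_media_type and the MIME_XML variable are assumptions for illustration, not anything wcurl provides. It uses [ \t] instead of \s so it does not depend on gawk.

```shell
# Sketch only: function names and the MIME_XML override are assumptions.

# Path to the shared-mime-info database; can be overridden, e.g. for testing.
MIME_XML="${MIME_XML:-/usr/share/mime/packages/freedesktop.org.xml}"

# Read response headers on stdin and print the bare media type,
# lower-cased, with parameters such as "; charset=utf-8" removed.
media_type_from_headers() {
    tr -d '\r' | awk '
        tolower($0) ~ /^content-type:/ {
            sub(/^[^:]*:[ \t]*/, "")   # drop the header name
            sub(/[ \t]*;.*$/, "")      # drop parameters
            sub(/[ \t]+$/, "")         # drop trailing blanks
            type = tolower($0)         # keep the last one seen (e.g. after redirects)
        }
        END { if (type != "") print type }'
}

# Print the first "*"-prefixed glob suffix listed for media type $1, e.g. ".html".
# Prints nothing if the type is unknown or has no such glob.
ext_for_media_type() {
    awk -v want="<mime-type type=\"$1\"" '
        index($0, "<mime-type ") { found = (index($0, want) > 0) }
        found && /<glob[ \t]+pattern="\*/ {
            sub(/^.*<glob[ \t]+pattern="\*/, "")
            sub(/".*$/, "")
            print
            exit
        }' "$MIME_XML"
}
```

A possible use would be ext_for_media_type "$(curl -sI curl.se | media_type_from_headers)", which for the text/html response shown earlier should give the first glob listed for that type. Because found is recomputed on every opening mime-type line, a type without any glob prints nothing rather than picking up the next entry's pattern.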
With wcurl 2024-07-02:
However with wget 1.24.5: