curl / wcurl

a simple wrapper around curl to easily download files
https://curl.se/wcurl

URL without filename fails #4

Open ryandesign opened 3 months ago

ryandesign commented 3 months ago

With wcurl 2024-07-02:

% wcurl https://github.com/Debian/     
curl: Remote file name has no length
curl: (23) Failed writing received data to disk/application

However with wget 1.24.5:

% wget https://github.com/Debian/
--2024-07-04 09:53:47--  https://github.com/Debian/
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

index.html    [  <=>  ] 245.03K  1.17MB/s    in 0.2s

2024-07-04 09:53:48 (1.17 MB/s) - ‘index.html’ saved [250913]

bagder commented 3 months ago

There's a plan to fix this in curl, although saving it with a different filename than what wget picks: https://github.com/curl/curl/pull/13988

ryandesign commented 3 months ago

This works for me but please test:

https://salsa.debian.org/debian/wcurl/-/merge_requests/4

Note there is a new dependency on trurl.
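
For anyone following along without trurl installed, here is a minimal sketch of the underlying idea: take the last path segment of the URL and fall back to a default name when it is empty. This is not the code from the merge request. The function name `url_to_filename` is hypothetical, it uses plain POSIX parameter expansion instead of trurl, and the `index.html` fallback is only an assumption that mirrors what wget did in the report above (the curl change mentioned earlier plans to use a different name).

```sh
#!/bin/sh
# Hypothetical helper, not the code from the merge request.
# Prints the file name a download of URL $1 could be saved under.
url_to_filename() {
    url=$1
    # Drop the fragment and the query string; they are not part of the path.
    url=${url%%#*}
    url=${url%%\?*}
    # Drop the scheme ("https://") if present.
    rest=${url#*://}
    case $rest in
        */*) path=${rest#*/} ;;   # everything after the host name
        *)   path= ;;             # bare host name, no path at all
    esac
    # Last path segment; empty when the path is empty or ends in "/".
    name=${path##*/}
    # Assumed fallback name, mirroring what wget chose in the report.
    printf '%s\n' "${name:-index.html}"
}

url_to_filename "https://github.com/Debian/"
url_to_filename "https://curl.se"
url_to_filename "https://example.com/files/archive.tar.gz?download=1"
```

The first two calls hit the fallback because the URL has no final path segment; the third keeps `archive.tar.gz`. A real implementation would also need to handle percent-encoding, which is presumably part of why a URL parser like trurl is attractive here.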

samueloph commented 3 months ago

I'll try to keep the discussion about this issue on salsa, but if anyone would like to reply and doesn't have an account, feel free to do it here.

BrianInglis commented 1 month ago

Works for me, but it saves the file as Debian, not Debian.html. With just a host name, e.g. curl.se, it saves curl_response, not even curl-response.html, or better curl[-.]se.html or curl[-.]se[-.]index.html, any of which would be better than the anonymous index.html from wget/2! I added a similar comment to curl/curl#13988.

I just packaged wcurl as part of the Cygwin distribution's standard main package curl 8.10, so I am trying to get ahead of users trying it out! I describe wcurl and mention your home page in the announcement, so they could come here ;^>

No other Cygwin packagers had any comments on whether I should include it in curl, make it a subpackage of the curl source package, or package wcurl source and "binary" separately, so I thought I would help out most users by giving out a free wcurl script and docs with every curl command line package. ;^>

BrianInglis commented 1 month ago

Could translate back from the media type (MIME type) in the response's content-type: header, for example:

$ curl -I curl.se
...
HTTP/2 200
server: nginx/1.21.1
content-type: text/html
...

to a file suffix (extension), using the shared-mime-info data in /usr/share/mime/packages/freedesktop.org.xml, which gives a list of glob patterns for each MIME type, for example:

$ awk '/<mime-type\stype="text\/html">/,/<\/mime-type>/' /usr/share/mime/packages/freedesktop.org.xml
  <mime-type type="text/html">
    <comment>HTML document</comment>
    <comment xml:lang="zh_TW">HTML 文件</comment>
    <comment xml:lang="zh_CN">HTML 文档</comment>
...
    <comment xml:lang="en_GB">HTML document</comment>
...
    <acronym>HTML</acronym>
    <expanded-acronym>HyperText Markup Language</expanded-acronym>
    <sub-class-of type="text/plain"/>
    <magic>
      <match type="string" value="&lt;!DOCTYPE HTML" offset="0:256"/>
...
    </magic>
    <magic priority="40">
      <match type="string" value="&lt;!--" offset="0"/>
      <match type="string" value="&lt;TITLE" offset="0:256"/>
      <match type="string" value="&lt;title" offset="0:256"/>
    </magic>
    <glob pattern="*.html" weight="80"/>
    <glob pattern="*.htm" weight="80"/>
  </mime-type>

The code could be something equivalent to this awk command:

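$ awk '/<mime-type[[:space:]]+type="[^"]+"[^>]*>/,/<\/mime-type>/ {
  # literal (non-regex) comparison, so "+" or "." in a type name is safe
  if (index($0, "<mime-type type=\"" mime_type "\"")) found = 1;
  if (found && /<glob[[:space:]]+pattern="/) {
    sub(/^[[:space:]]*<glob[[:space:]]+pattern="\*/, "");
    sub(/".*$/, "");
    print;
    exit; # exit on first match
  }
  if (/<\/mime-type>/) found = 0; # do not leak into the next entry
}' mime_type="text/html" /usr/share/mime/packages/freedesktop.org.xml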
.html
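
To make the idea a bit more concrete, the lookup could be wrapped in a small shell function that first reduces a raw content-type: header value to its bare media type. This is only a sketch under assumptions: the function name `mime_to_suffix` and its optional second argument (an alternative XML file) are hypothetical, and it assumes each `<mime-type>` and `<glob>` tag sits on its own line, as in the excerpt above.

```sh
#!/bin/sh
# Hypothetical helper: print the first glob suffix listed for a media type.
# $1 = content-type header value, $2 = shared-mime-info XML file (optional).
mime_to_suffix() {
    # Keep only the media type: drop "; charset=..." and stray blanks or CRs,
    # then lower-case it, since media type names are case-insensitive.
    type=$(printf '%s' "$1" | sed 's/;.*//' | tr -d ' \t\r' |
           tr '[:upper:]' '[:lower:]')
    db=${2:-/usr/share/mime/packages/freedesktop.org.xml}
    awk -v want="$type" '
        # index() compares literally, so "+" or "." in a type is not a regex.
        index($0, "<mime-type type=\"" want "\"") { found = 1 }
        found && /<glob[ \t]+pattern="\*/ {
            sub(/^.*<glob[ \t]+pattern="\*/, "")   # strip up to the "*"
            sub(/".*$/, "")                        # strip the rest of the tag
            print
            exit                                   # first glob wins
        }
        /<\/mime-type>/ { found = 0 }   # stop at the end of this entry
    ' "$db"
}

# Example call (needs shared-mime-info installed):
#   mime_to_suffix "text/html; charset=utf-8"
```

With the data shown above, the example call should print `.html`. A type that has no glob entry prints nothing, so a caller would still need its own default suffix.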