Closed: MasterInQuestion closed this issue 3 months ago
curl -I -X GET https://example.com
Didn't manage to locate documentation for "-X"... Note "GET" is case-sensitive: "-X" just passes the string verbatim.
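To illustrate the verbatim pass-through, a minimal local sketch (assuming python3 and curl are installed; the port is arbitrary): Python's http.server dispatches on the exact method name, so a lowercase "get" should draw a 501 while "GET" gets a 200.

```shell
#!/bin/sh
# Sketch: "-X" sends the method string verbatim, so the case matters
# to the server. Port 8036 is an assumption; pick any free port.
python3 -m http.server 8036 --bind 127.0.0.1 >/dev/null 2>&1 &
srv=$!
sleep 1
code_upper=$(curl -s -o /dev/null -w '%{http_code}' -X GET http://127.0.0.1:8036/)
code_lower=$(curl -s -o /dev/null -w '%{http_code}' -X get http://127.0.0.1:8036/)
echo "GET -> $code_upper, get -> $code_lower"
kill "$srv"
```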
Why does it, for this example, still seem to download the whole file..? https://drive.usercontent.google.com/download?confirm=t&export=download&id=1WxOrSi-GNB45nLUUiR4PT7c4H2VurtKk (~ 18.34 MiB)
"--max-filesize 1" variant worked intended.
See also: https://trac.ffmpeg.org/ticket/11056#comment:16 https://trac.ffmpeg.org/ticket/11159#comment:3 ("confirm=t" needed to bypass some "virus" confirmation)
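The "--max-filesize 1" abort can be reproduced locally without the Drive link. A minimal sketch, assuming python3 and curl are on PATH; the port and file name are made up: when the server advertises a Content-Length above the cap, curl should abort before transferring the body, with exit code 63.

```shell
#!/bin/sh
# Sketch: serve a ~512 KiB file locally and cap the download at 1 byte.
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/big.bin" bs=1024 count=512 2>/dev/null
python3 -m http.server 8037 --bind 127.0.0.1 --directory "$tmp" >/dev/null 2>&1 &
srv=$!
sleep 1
# Content-Length (524288) exceeds the 1-byte cap, so curl refuses the body
curl -s --max-filesize 1 -o "$tmp/out.bin" http://127.0.0.1:8037/big.bin
rc=$?
echo "exit code: $rc"   # 63 = CURLE_FILESIZE_EXCEEDED
kill "$srv"
```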
----
More suitable to test: -A "Mozilla/5.0 (Linux; rv:999) Gecko/20100101 Firefox/999" "https://premium.britannica.com/wp-content/uploads/2023/05/memorialday-2620x1080-1.png" (~ 1.3 MiB)
@bagder, probably worth your attention.
What is? I don't understand what you're talking about.
Pardon.
Straightforward but less accurate:
curl -I -X GET -A "Mozilla/5.0 (Linux; rv:999) Gecko/20100101 Firefox/999" "https://premium.britannica.com/wp-content/uploads/2023/05/memorialday-2620x1080-1.png"
That's a curl command line. What about it?
You ask for -I (HEAD) and yet you insist on -X GET, which is highly confusing. What do you want it to do?
The question is: compared with the "--max-filesize 1" variant, this one causes an unwanted full download. (instead of merely getting the headers)
----
Rationale explained in 1st post: "Certain servers may refuse to serve HEAD (example reported HTTP 403 Forbidden), meanwhile the file may be large."
OK, so what is the exact question?
How to effectively do HEAD, but with GET, without the full download?
That is exactly what you get with:
curl -I -X GET $URL
That is exactly what you get me with...
[ Quote bagder @ CE 2024-08-07 15:05:06 UTC:
https://github.com/curl/curl/issues/14440#issuecomment-2273695109
3. You can do a GET without reading the body with `curl -I -X GET "https://example.com"`. ]
The problem is: It seems to cause the unwanted full-download.
Did it work (without full-download) for you?
I also don't understand what you are asking. You want curl to behave as if it's receiving a HEAD response and close? What do you mean it causes an unwanted download? For example, this download of 200 MB should terminate immediately (after receiving the headers) if you tell curl it's a HEAD request but then change the custom method to GET:
curl -v -I -X GET http://cachefly.cachefly.net/200mb.test -o NUL
The server sees GET and replies with the content but curl will terminate the connection after the headers.
It sounds to me like you want to simulate a HEAD reply for a server that does not support those requests, but if you send a GET request to the server then it may send data before curl can close the connection. That's what you are asking the server to do: with GET you want to get the resource. Correct me if I'm wrong @bagder, but I'm pretty sure it's discarded as excess in such a case (i.e. not written to the -o outfile), though I don't know if that's guaranteed.
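A local sketch of that behavior (the port and the 2 MiB file are arbitrary stand-ins for the 200 MB example): curl sends a real GET, prints the response headers, then hangs up, and whatever body bytes the server managed to push out are discarded.

```shell
#!/bin/sh
# Sketch: "-I -X GET" against a local server; only headers are printed.
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/200mb.test" bs=1024 count=2048 2>/dev/null
python3 -m http.server 8038 --bind 127.0.0.1 --directory "$tmp" >/dev/null 2>&1 &
srv=$!
sleep 1
headers=$(curl -s -I -X GET http://127.0.0.1:8038/200mb.test)
printf '%s\n' "$headers" | head -n 2   # status line + first header only
kill "$srv"
```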
Compare:
curl -I -X GET "https://cachefly.cachefly.net/200mb.test"
curl -I -X GET "https://drive.usercontent.google.com/download?confirm=t&export=download&id=1WxOrSi-GNB45nLUUiR4PT7c4H2VurtKk"
#1 also worked for me. (no notable download)
As I have explained, the server may send data before curl can close the connection. I took a look at your latter example in Wireshark, and Google takes approximately 3 seconds to reply with HTTP/2 HEADERS; I don't know why so long, but it has nothing to do with curl. Then the server follows with DATA frames, and during that time (which is less than 1 second, like 100-200 ms) curl replies with RST_STREAM on the stream and then GOAWAY on the connection. You cannot expect no data will be sent, because you are requesting the data be sent, and curl needs to hang up after receiving the headers.
So for this case, the validity of "--max-filesize 0" seems to hold.
Meanwhile I noted that using "--max-filesize 1" with "-L" caused curl to croak amid the redirection with "(63) Maximum file size exceeded": the redirection message itself exceeded the length limit.
A workaround would be raising the limit to a somewhat higher, more tolerable value, e.g. "2K" (2,048 B). [ I find "1500" works more pleasantly, though it is a bit more bother to type. ] Non-plain-text output will regardless not be written to the terminal unless explicitly requested via "-o -" or alike. When dealing with some extraordinarily small files, "/dev/null" or alike may have to be bothered with.
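A minimal local sketch of the raised limit (the /A to /B redirect server, port, and bodies are all hypothetical): the 2K cap lets the small redirect message through and the final body is fetched, whereas a 1-byte cap would, on the reported curl version, croak with 63 amid the redirect.

```shell
#!/bin/sh
# Sketch: a tiny local server that 302-redirects /A to /B, then a
# redirected fetch under "--max-filesize 2048".
python3 - >/dev/null 2>&1 <<'EOF' &
from http.server import BaseHTTPRequestHandler, HTTPServer

class H(BaseHTTPRequestHandler):
    def _send(self, code, body, extra=()):
        self.send_response(code)
        for k, v in extra:
            self.send_header(k, v)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def do_GET(self):
        if self.path == '/A':
            self._send(302, b'moved', [('Location', '/B')])
        else:
            self._send(200, b'hello')
    def log_message(self, *args):
        pass

HTTPServer(('127.0.0.1', 8039), H).serve_forever()
EOF
srv=$!
sleep 1
body=$(curl -s -L --max-filesize 2048 http://127.0.0.1:8039/A)
rc=$?
echo "$body ($rc)"   # "hello (0)": redirect passed, final body fetched
kill "$srv"
```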
I see, you are saying that --max-filesize applies to servers that redirect. Users of --max-filesize may want to limit the overall bytes downloaded, even if it's specifically documented as file size downloaded, so I'm not sure that's a bug. What happens on redirect is that curl discards the bytes: if the redirect is from localhost/foo to localhost/bar then it ignores the foo download ("* Ignoring the response-body") and downloads bar, but it has to read the bytes of foo (which location redirects may have).
Anyone else have an opinion on whether this is appropriate behavior?
Perhaps a separation: "--max-dsize"? (parallel of "fsize")
The "foobar" non-sense is extraordinarily befuddling... Normalized and I still couldn't quite understand.
Anyone else have an opinion on whether this is appropriate behavior?
The ignored response-body should not be counted as "file download" data. That should be a bug if it is. The max filesize should be for the data actually delivered/saved, not just transferred I think.
you are saying that --max-filesize applies to servers that redirect.
Please take further discussion of that issue to #14899.
The "foobar" non-sense is extraordinarily befuddling..
They're placeholder names
I know, but anything involving "foobar" would be similarly befuddling...
Not just your writing.
Seemingly a no-op, or it nullifies the limit. Somewhat against: https://curl.se/docs/manpage.html#--max-filesize
Use case: curl -v --max-filesize 1 -L "https://github.com/mozilla-mobile/firefox-android/assets/38040960/fd50937d-5442-494e-b4aa-0baf75569a57"
Effectively doing HEAD, but with GET. [ ^ Alike what browsers do: https://bugzilla.mozilla.org/show_bug.cgi?id=1872503#c3 ]
Certain servers may refuse to serve HEAD (example reported HTTP 403 Forbidden), meanwhile the file may be large.
Related: https://github.com/curl/curl/issues/11810
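For the header-only use case above, a local sketch (file name, size, and port are assumptions, standing in for the large GitHub asset): pull only the Content-Length out of a GET whose connection is cut after the headers, for servers that answer HEAD with 403.

```shell
#!/bin/sh
# Sketch: read the advertised size of a ~300 KiB file via "-I -X GET",
# without fetching the body.
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/asset.bin" bs=1024 count=300 2>/dev/null
python3 -m http.server 8040 --bind 127.0.0.1 --directory "$tmp" >/dev/null 2>&1 &
srv=$!
sleep 1
clen=$(curl -s -I -X GET http://127.0.0.1:8040/asset.bin \
       | tr -d '\r' | awk 'tolower($1) == "content-length:" { print $2 }')
echo "Content-Length: $clen"   # 307200
kill "$srv"
```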