agourlay / dlm

Minimal HTTP download manager
Apache License 2.0
66 stars 10 forks source link

Add option to set Accept header #282

Closed ghost closed 4 months ago

ghost commented 4 months ago

This PR adds an option to set the Accept header explicitly in the client. This is useful in the case that the server can hand back a better format e.g. WebP instead of JPEG, and the client simply has to let the server know.

agourlay commented 4 months ago

Thanks for the contribution!

How did you test this?

I can see two things to pay attention to with this change.

First, the extension of the file is based on the input URL or an extra request to determine it. By passing the accept header only to the download request, we would not use the right extension.

The same logic also apply for the content-length which is determine by a HEAD request before hand. Downloading a different format would result in incorrect progress bars and download resume.

WDYT?

ghost commented 4 months ago

Good insights!

For testing, I simply hard-coded image/webp like so: let request = client.get(...).header("Accept", "image/webp") and verified that I had downloaded webp instead of jpeg. The extension is incorrect, but most software I use for images will sniff the data rather than trust the extension. This can occasionally break things though.

To answer the questions:

1) I think the way to fix this and have the correct extension is to verify the Content-Type the server responds with, and to base the extension on the mime type received. A properly conforming server should inform the client when returning the data.

2) You are correct that will likely affect the Content-Length, and I'm not sure how common it is for servers to correctly report the length when receiving this header during a HEAD request. If you set Accept during a head request, it might work properly? I'm not sure.

Background:

I originally conceived of this modification when I noticed a discrepancy between browser responses and what dlm was downloading. On checking the console, I noted that some sites return WebP images despite requesting a .jpg in the URL, which led to learning about the Accept header. The quality difference was notable between formats, and I was surprised that the server would respond with a different type than I had requested.

For my own use-case, I leave dlm in the background and only check after a few hours to see if it finished, usually it's pulling a large number of files concurrently.

agourlay commented 4 months ago

Thanks for the detailed answer :+1:

  1. Sounds good to me
  2. Normally the Content-length is just used for sizing the progress bars, it should be fine if the server does not reply properly on the HEAD request without taking into account the Accept header.

I propose you try to send the extra header for all requests. Maybe the code needs some refactoring to have less parameters passed around, some kind of request builder abstraction?

Glad to hear dlm is useful to you!

ghost commented 4 months ago

Added a small change to send Accept with the HEAD. So far working on my build. Let me know what you think.

After mulling this over, I think you're right that some request builder abstraction would be a good idea. Some set of headers could be configured by the user up front, and that information passed around the codebase. It might be more work than it's worth for your use-case or mine, but that does seem like a useful idea.

agourlay commented 4 months ago

Thanks for the contribution :+1:

I will take of the refactoring and extra testing :robot: