ComputerGhost / FaviconFetcher

Scan a webpage for favicons, or just easily download the one you want.
MIT License
5 stars 3 forks source link

Fetcher getting wrong content type from an example URL #7

Closed ComputerGhost closed 1 year ago

ComputerGhost commented 2 years ago

For this URL: https://ice-sentralbord.ice.no/

The scanner correctly pulls the possible favicon URIs. However the fetcher does not pull any of them down.

Upon further investigation, I found that the fetcher is receiving "text/html" as the content type for the icon. When testing in Chrome to see if this problem is on the website's side, I see the content type as "image/x-icon". Therefore, the problem is probably in the FaviconFetcher code somewhere.

kiddailey commented 1 year ago

This is because the server is redirecting, but not quite completely, and FaviconFetcher doesn't know what to do. Unfortunately, many, many sites do this sort of thing. For example, the request for:

https://ice-sentralbord.ice.no

Is dropped by the server and it instead responds with a text/html redirect to a new domain:

https://sentralbord.ice.no

FaviconFetcher attempts to decode this text/html redirect response into an image, which obviously fails.

I've created a fix for this that seems to work for most cases. Basically, when a redirect and a text/html response is received (proper redirects to the new file wouldn't be affected), I take the redirected domain and append the original path to it and attempt a new request. The second request would then look like the following, which works:

https://sentralbord.ice.no/favicon.ico

I'm working on a Pull Request and will comment here when done.

kiddailey commented 1 year ago

Pull request #9 Also addresses this issue. I meant to submit it as a separate pull request but forgot. If you want me to split them out, let me know.

I should also mention that there are some sites that redirect a surprising amount of times and my fix doesn't completely work for more than a single redirect, so I will probably make it recursive up to a specific amount of times.

ComputerGhost commented 1 year ago

Thank you for these tests and fixes. I'm merging the changes today and making some other minor edits.