ComputerGhost / FaviconFetcher

Scan a webpage for favicons, or just easily download the one you want.
MIT License

Scanner cannot locate links located outside the HTML head element #18

Closed kiddailey closed 5 months ago

kiddailey commented 5 months ago

I've discovered that a surprising number of sites put their <link> icon elements in unexpected places, so the scanner finds none of them and only the default favicon.ico request is added to the queue. In many of these cases, that default request also fails because the sites in question don't provide a favicon at the default location.

One extreme example is ChowNow.com. Its icon <link> elements are currently placed (incorrectly) BETWEEN the <head> and <body> elements. The links themselves are valid and provide the standard 16x16 icon (with a custom filename) plus 120x120 and 120x120-precomposed icons. Unfortunately, the DefaultScanner ends its search at </head> and does not find them.

A partial fix would be to parse through to the end of the body, but that still won't catch icon links that come after the body at the end of the document.
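To make the difference concrete, here is a minimal sketch in Python using the standard library's html.parser. It is not FaviconFetcher's DefaultScanner. The class name IconLinkScanner, the stop_at parameter, the find_icons helper, and the sample page are all made up for illustration, and the markup is not ChowNow's actual HTML. It only shows how the point at which scanning stops changes which icon links are found.

```python
from html.parser import HTMLParser


class IconLinkScanner(HTMLParser):
    """Collects href values of <link> tags whose rel mentions an icon.

    stop_at is a hypothetical knob for this sketch: the closing tag name
    at which scanning stops ("head" or "body"), or None to scan everything.
    """

    def __init__(self, stop_at=None):
        super().__init__()
        self.stop_at = stop_at
        self.stopped = False
        self.icons = []

    def handle_starttag(self, tag, attrs):
        # Ignore everything once the stop tag has been seen.
        if self.stopped or tag != "link":
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        # Matches "icon", "shortcut icon", "apple-touch-icon", and so on.
        if href and any("icon" in token for token in rel_tokens):
            self.icons.append(href)

    def handle_endtag(self, tag):
        # The parser reports tag names in lowercase.
        if tag == self.stop_at:
            self.stopped = True


def find_icons(html, stop_at=None):
    scanner = IconLinkScanner(stop_at)
    scanner.feed(html)
    scanner.close()
    return scanner.icons


# Invented sample page: icon links between </head> and <body>, and after </body>.
PAGE = """<html>
<head><title>Example</title></head>
<link rel="icon" href="/img/custom-16.png" sizes="16x16">
<link rel="apple-touch-icon" href="/img/touch-120.png" sizes="120x120">
<body><p>Hello</p></body>
<link rel="apple-touch-icon-precomposed" href="/img/touch-120-pre.png">
</html>"""

head_only = find_icons(PAGE, stop_at="head")      # stops at </head>
through_body = find_icons(PAGE, stop_at="body")   # the partial fix
whole_document = find_icons(PAGE)                 # no early stop
```

On this sample, stopping at </head> finds nothing, stopping at </body> finds the two links placed before the body, and only a scan of the whole document also picks up the link after </body>.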

kiddailey commented 5 months ago

I've implemented the change to parse through to the end of the body, and it doesn't appear to impact performance noticeably in my tests. I'll submit it as a pull request as soon as I am able.

kiddailey commented 5 months ago

PR #22 resolves this issue.