gaffling / PHP-Grab-Favicon

🖼 Saves the favicon of the given URL and returns the image path.
http://suchmaschine.biz
MIT License

Issue With Website Security Checks #7

Open LeeThompson opened 1 year ago

LeeThompson commented 1 year ago

There is a problem if one of the "need to review the security of your connection" checks comes up when get-fav is attempting to find icons.

Unfortunately, I don't think this is fixable using cURL, other than by trying again or hoping the API catches things.

Some of this may be user agent related; I will try to see whether some sites are happy enough with the default cURL user agent.

Will look into some possible solutions.

gaffling commented 1 year ago

Maybe it's better to use:

curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36');

than

curl_setopt($ch, CURLOPT_USERAGENT, getGlobal('curl_useragent'));

The getGlobal('curl_useragent') function is likely to return a generic user agent that may be out of date or not contain all the required information.
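
To make the comparison concrete, here is a minimal sketch of fetching a URL with an explicit user agent and checking whether the response looks like a block. The helper names `fetchWithUserAgent` and `looksBlocked` are hypothetical and not part of get-fav.php, and treating 403 and 503 as "probably blocked" is an assumption rather than a documented rule.

```php
<?php
// Hypothetical helper (not from get-fav.php): fetch a URL with a given
// user agent and return the HTTP status alongside the body.
function fetchWithUserAgent(string $url, string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects to the final icon URL
        CURLOPT_TIMEOUT        => 10,
        CURLOPT_USERAGENT      => $userAgent,
    ]);
    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    unset($ch); // release the handle

    return ['status' => $status, 'body' => $body === false ? null : $body];
}

// Assumed heuristic: security checks commonly answer with 403 or 503,
// so those statuses are treated as "probably blocked".
function looksBlocked(int $status): bool
{
    return in_array($status, [403, 503], true);
}
```

With something like this, a caller could retry with a browser-like user agent only when `looksBlocked()` returns true for the first attempt.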

LeeThompson commented 1 year ago

curl_useragent is set earlier via a command line switch, so it can be whatever it needs to be. (I'm actually renaming it to http_useragent internally, since I'm making the non-cURL route able to change the string as well.)

It defaults to FaviconBot/1.0 or FaviconBot/1.0 (+http://'.$_SERVER['SERVER_NAME'].'/

Anyway, you can have it use the one you specified by passing the --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36" switch when invoking get-fav.php.

LeeThompson commented 1 year ago

Hmm, this seems to be an issue with exif_imagetype as well; fortunately, the user agent can be set there too.
Now, if the user agent is set via the command line switch, PHP's user agent will be temporarily set as well.

Example: https://pcpartpicker.com/favicon-32x32.png will cause exif_imagetype to return false if PHP's user agent is not set, because the request gets a 403 Forbidden.
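
A rough sketch of what "temporarily set" could look like, assuming the override goes through PHP's `user_agent` ini setting (which the HTTP stream wrapper uses). The wrapper name `withStreamUserAgent` is hypothetical; the actual branch may do this differently.

```php
<?php
// Hypothetical wrapper: run a callback with PHP's stream user agent
// overridden, then restore the previous value even if the callback throws.
function withStreamUserAgent(string $userAgent, callable $callback)
{
    $previous = ini_get('user_agent');
    ini_set('user_agent', $userAgent);
    try {
        return $callback();
    } finally {
        // ini_get() returns false if the setting is unknown; fall back to ''.
        ini_set('user_agent', $previous === false ? '' : $previous);
    }
}

// Usage (performs a network request, so it is left commented out here):
// $type = withStreamUserAgent('Mozilla/5.0 ...', function () {
//     return @exif_imagetype('https://pcpartpicker.com/favicon-32x32.png');
// });
```

Restoring in `finally` keeps the override from leaking into later requests made by the same script.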

LeeThompson commented 1 year ago

I guess the big question is, should we set the user agent to something like "Mozilla/5.0..." by default?

LeeThompson commented 1 year ago

In my branch, the default user agent is now defined near the beginning so someone could change it there. Soon it will be able to be set in an ini file.

define('DEFAULT_USER_AGENT', "FaviconBot/1.0/");
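
As a sketch of how the three sources could fit together once the ini file exists, assuming a precedence of command line switch, then ini file, then the built-in default. The function name `resolveUserAgent` and the ini key `user_agent` are hypothetical; only the `--user-agent` switch and `DEFAULT_USER_AGENT` come from the thread.

```php
<?php
// Guarded so the sketch can sit next to the existing define().
defined('DEFAULT_USER_AGENT') || define('DEFAULT_USER_AGENT', "FaviconBot/1.0/");

// Hypothetical resolver: command line option wins, then the ini text,
// then the built-in default.
function resolveUserAgent(array $cliOptions, string $iniText = ''): string
{
    if (isset($cliOptions['user-agent']) && is_string($cliOptions['user-agent'])
        && $cliOptions['user-agent'] !== '') {
        return $cliOptions['user-agent'];
    }

    // parse_ini_string() returns false on a syntax error.
    $ini = $iniText !== '' ? parse_ini_string($iniText) : [];
    if (is_array($ini) && !empty($ini['user_agent'])) {
        return (string) $ini['user_agent'];
    }

    return DEFAULT_USER_AGENT;
}
```

On the command line this could be called as `resolveUserAgent(getopt('', ['user-agent:']) ?: [], $iniText)`. Browser-style strings contain parentheses and semicolons, so the ini value would need to be quoted.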