gaffling / PHP-Grab-Favicon

🖼 Saves the favicon of the given URL and returns the image path.
http://suchmaschine.biz
MIT License
26 stars 6 forks source link

exif_imagetype Issues #13

Closed LeeThompson closed 1 year ago

LeeThompson commented 1 year ago

We are using exif_imagetype to determine image type, it takes a filename (or url) as a parameter. Unfortunately, when it's a URL there is no way to set a timeout.

The solution may be to download the image to disk temporarily for testing but that has it's own issues.

Figured out a way to do a timeout, implemented in 202305241420. It's still not ideal, the HTTP functions imagetype uses aren't very good. (Some sites won't talk to it, even with user agent being set.)

I also notice in some cases if we find an icon, it's technically downloaded twice. With icons this isn't a huge deal since they are tiny but maybe there's a more efficient way of doing this. I have something in mind and we'll see how it goes.

gaffling commented 1 year ago

We can try something like this:

function checkImageTypeWithTimeout($filename, $timeoutInSeconds) {
    $result = null;
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("Fork Error");
    } elseif ($pid) {
        pcntl_signal(SIGALRM, function () use (&$result) {
            $result = false;
        });
        pcntl_alarm($timeoutInSeconds);
        pcntl_wait($status);
        pcntl_alarm(0);
        return $result;
    } else {
        $result = exif_imagetype($filename);
        exit();
    }
}
LeeThompson commented 1 year ago

I got the timeout working for the most part, it's the HTTP protocol stuff. Some sites are really picky. Requiring https and so forth. My thought is to download it into memory with curl and determine the image type looking at the header (I've written code for that before, although not in PHP).

LeeThompson commented 1 year ago

My test is going very well so far, I have more cases to try but we should end up being more efficient have better results and basically no longer using exif_imagetype.

I still need to apply this for the non-cURL path so it will be awhile before I do a commit on my fork.

Update: This is going quite well although I'm still working out bugs. Now exif will be used if it's enabled in the local PHP installation and if everything else fails.

Update 2: exif_imagetype should be used very rarely now. While not perfect, I consider this closed.