crwlrsoft / crawler

Library for Rapid (Web) Crawler and Scraper Development
https://www.crwlr.software/packages/crawler
MIT License

Question regarding "Failed to load % cURL error 60: SSL: no alternative certificate subject name matches target host name" #143

Closed · severfire closed 2 months ago

severfire commented 2 months ago

Hi,

On some websites I am getting this error, yet when I open them in a browser they work fine.

How can I fix this issue, or is there a workaround?

Thanks for the great toolset! a.

otsch commented 2 months ago

Hi,

first of all: nice to hear you like the library! 😊

Looks like something's wrong with the SSL certificate on the pages you're trying to load. Are you sure your browser doesn't show any problems with the certificate on those website(s)? If you could share the pages you're trying to load, I can have a look. If you don't want to share them here, you can DM me on Twitter or use the contact form on crwlr.software.
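
By the way, cURL error 60 with "no alternative certificate subject name matches target host name" means the host name you're requesting isn't listed in the certificate's subject alternative names (SANs). If you want to see what certificate the server actually sends, here's a quick sketch in plain PHP; example.com is just a placeholder, and verification is turned off only so the handshake succeeds for inspection:

// Fetch the server's certificate and print its subject CN and SANs.
$host = 'example.com';

$context = stream_context_create([
    'ssl' => [
        'capture_peer_cert' => true, // keep the peer certificate so we can inspect it
        'verify_peer' => false,      // don't fail the handshake on the bad cert
        'verify_peer_name' => false,
    ],
]);

$client = stream_socket_client(
    "ssl://{$host}:443",
    $errorCode,
    $errorMessage,
    30,
    STREAM_CLIENT_CONNECT,
    $context
);

if ($client === false) {
    exit("Connection failed: {$errorMessage}" . PHP_EOL);
}

$params = stream_context_get_params($client);

$cert = openssl_x509_parse($params['options']['ssl']['peer_certificate']);

echo 'Subject CN: ' . ($cert['subject']['CN'] ?? '-') . PHP_EOL;
echo 'SANs: ' . ($cert['extensions']['subjectAltName'] ?? '-') . PHP_EOL;

If the host you're crawling doesn't show up in the SANs, the certificate really is misconfigured, and the browser may just be getting a different certificate (e.g. via SNI or a proxy).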

One thing you can try, which is definitely not recommended (because you should always verify SSL certificates), is to provide your crawler with a custom Guzzle client instance that is configured not to verify SSL certificates, like:

use Crwlr\Crawler\HttpCrawler;
use Crwlr\Crawler\Loader\Http\HttpLoader;
use Crwlr\Crawler\Loader\LoaderInterface;
use Crwlr\Crawler\UserAgents\BotUserAgent;
use Crwlr\Crawler\UserAgents\UserAgentInterface;
use GuzzleHttp\Client;
use Psr\Log\LoggerInterface;

class MyCrawler extends HttpCrawler
{
    protected function loader(UserAgentInterface $userAgent, LoggerInterface $logger): LoaderInterface
    {
        // Guzzle's 'verify' => false turns off SSL certificate verification entirely.
        $httpClient = new Client(['verify' => false]);

        return new HttpLoader($userAgent, $httpClient, $logger);
    }

    protected function userAgent(): UserAgentInterface
    {
        return new BotUserAgent('MyCrawler');
    }
}
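
In case it helps, here's a minimal usage sketch, assuming the usual step API from the crwlr.software docs (the URL is just a placeholder):

use Crwlr\Crawler\Steps\Loading\Http;

$crawler = new MyCrawler();

$crawler->input('https://www.example.com');

$crawler->addStep(Http::get());

$crawler->runAndTraverse();

Also, instead of false, Guzzle's verify option accepts a path to a CA bundle file, which is the safer fix if the real problem is a missing or outdated CA bundle on your machine rather than a broken certificate on the server.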

As I'm pretty sure it's not a problem with the library, I'll close the issue. But as mentioned, you can contact me via the channels above.