FriendsOfPHP / Goutte

Goutte, a simple PHP Web Scraper

The URL of the element is relative error #368

Closed by joveice 1 year ago

joveice commented 5 years ago

html:

<article>
    <a href="/2">
        <h2>Text</h2>
    </a>
</article>

code:

foreach ($crawler->filter('article') as $dom_element) {
    $node = new Crawler($dom_element);
    $link = $node->filter('a')->first()->link(); # this fails
    if (Article::where('url', $link->getUri())->exists()) {
        continue;
    }
...

[2019-01-09 20:47:53] local.ERROR: The URL of the element is relative, so you must define its base URI passing an absolute URL to the constructor of the Symfony\Component\DomCrawler\AbstractUriElement class ("" was passed). {"userId":1,"email":"admin@yourcompany.com","exception":"[object] (InvalidArgumentException(code: 0): The URL of the element is relative, so you must define its base URI passing an absolute URL to the constructor of the Symfony\\Component\\DomCrawler\\AbstractUriElement class (\"\" was passed). at /home/vagrant/code/moscowdb/vendor/symfony/dom-crawler/AbstractUriElement.php:52)

joveice commented 5 years ago

Oh, it's because I instantiate a new Crawler and it has no clue what the base URL is, right?

bavial commented 1 year ago

You must pass a base URI to the Crawler, because the URL of the element is relative. Example: $crawler = new Crawler('', 'http://www.example.com');
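
Applied to the loop from the original post, this could look roughly like the sketch below. It assumes symfony/dom-crawler and symfony/css-selector are installed via Composer, and it uses http://www.example.com/ as a stand-in base URL; with Goutte, the crawler returned by the client should already carry the page URI, so only the sub-crawlers need it passed along. The variable names $uris and $urisFromEach are made up for illustration.

```php
<?php
// Assumption: dependencies are installed via Composer in ./vendor.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<article>
    <a href="/2">
        <h2>Text</h2>
    </a>
</article>
HTML;

// Stand-in base URL (assumption); a crawler obtained from a real
// request would normally have the page URI set already.
$crawler = new Crawler($html, 'http://www.example.com/');

// Option 1: keep the foreach, but hand the parent's URI to each new Crawler
// so that link() can resolve relative hrefs.
$uris = [];
foreach ($crawler->filter('article') as $domElement) {
    $node = new Crawler($domElement, $crawler->getUri());
    $uris[] = $node->filter('a')->first()->link()->getUri();
}

// Option 2: use each(), which passes sub-crawlers that inherit the URI.
$urisFromEach = $crawler->filter('article')->each(function (Crawler $node) {
    return $node->filter('a')->first()->link()->getUri();
});

print_r($uris);
print_r($urisFromEach);
```

Either option should resolve the relative href "/2" against the base URL instead of throwing the InvalidArgumentException; each() is probably the less error-prone choice, since there is no second constructor call to forget the URI on.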