FriendsOfPHP / Goutte

Goutte, a simple PHP Web Scraper
MIT License

Get url and title with same filter #322

Open Takuaraa opened 6 years ago

Takuaraa commented 6 years ago

Hello,

Is it possible to nest one "filter" call inside another? For example, I have this code to get the URLs from a website:

...
$links = $crawler->filter('a')->each(function ($node) use ($url) {
    $l = $node->attr('href');

    ..code..

    return $l;
});

This returns something like: abc.com, abc.com/defg, ... Now I would also like to get the titles of the links found above. The code would be this:

$titles = $crawler->filter('title')->each(function ($node) {
    // Return the string directly; $content was never initialized inside the closure.
    return "Title: ".$node->text()."<br>";
});

Is it possible to put the second part of the code (title) inside the first part (link), so that whenever it gets a link it also gets the title of that link?

I hope I was understandable. Thank you.

dpde commented 6 years ago

$links = $crawler->filter('a')->each(function ($node) {
    $href  = $node->attr('href');
    $title = $node->attr('title');
    $text  = $node->text();

    return compact('href', 'title', 'text');
});

Takuaraa commented 6 years ago

Thank you so much. I only have one more question. Is there also a way to get the description of a link?

dpde commented 6 years ago

What do you mean by description? Can you post an example of that link?

Takuaraa commented 6 years ago

youtube.com

<meta name="description" content="Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.">

dpde commented 6 years ago

That is not a link tag, that's a meta tag.

If that tag is on the linked page, you have to request that page and extract that information.

Takuaraa commented 6 years ago

$crawler->filterXPath('//meta[@name="description"]')->attr('content');

This line of code should give me the description but how do I use it with the code from before?

kowap commented 6 years ago

$node->text()

$node->text() is always empty =(

RachidBourougaa commented 5 years ago

$description = $crawler->filterXPath("//meta[@name='description']")->extract(['content']);

$description will contain the meta tag description, as an array with one entry per matching tag.
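
To tie this back to the link loop from earlier in the thread: the title and description live on the linked page, so each link has to be requested separately. Below is a rough sketch of one way to do that, not a definitive implementation. The function names `extractPageInfo` and `describeLinks`, the `$limit` parameter, and the `https://example.com` start URL are all made up for illustration. It assumes Goutte is installed through Composer.

```php
<?php
require 'vendor/autoload.php';

use Goutte\Client;
use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper: read <title> and <meta name="description">
 * from a page that has already been loaded into a Crawler.
 * Missing elements give null instead of throwing an exception.
 */
function extractPageInfo(Crawler $page): array
{
    $title = $page->filterXPath('//title');
    $meta  = $page->filterXPath('//meta[@name="description"]');

    return [
        'title'       => $title->count() ? trim($title->text()) : null,
        'description' => $meta->count() ? $meta->attr('content') : null,
    ];
}

/**
 * Hypothetical helper: collect the links on $startUrl, then request
 * each linked page and read its title and description.
 * $limit is an assumption, added to keep the number of requests small.
 */
function describeLinks(Client $client, string $startUrl, int $limit = 5): array
{
    $crawler = $client->request('GET', $startUrl);
    $results = [];

    // links() returns Link objects; getUri() resolves relative hrefs
    // against the URL of the page they were found on.
    foreach (array_slice($crawler->filter('a')->links(), 0, $limit) as $link) {
        $uri  = $link->getUri();
        $page = $client->request('GET', $uri);

        $results[] = ['href' => $uri] + extractPageInfo($page);
    }

    return $results;
}

// Usage (performs real HTTP requests, so it is left commented out):
// print_r(describeLinks(new Client(), 'https://example.com'));
```

Checking `count()` before calling `text()` or `attr()` matters here, because calling them on an empty node list throws an exception, and many pages have no description meta tag at all.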