FriendsOfPHP / Goutte

Goutte, a simple PHP Web Scraper
MIT License

Get images source from specific div. #310

Closed sam-deepweb closed 7 years ago

sam-deepweb commented 7 years ago

Hi, I'm trying to get the source of the images inside a div with id="demo". Here's what I did:

$crawler->filter('#demo img')->each(function ($node) {
    $src = $node->attr('src');
    echo $src . '<br>';
});

It doesn't work. If you try getting any other attribute, such as the image height or class, it works fine. It only fails when you pass src to attr().

klor commented 5 years ago

This should work:

<?php
require 'vendor/autoload.php';

use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'http://www.example.com/');

echo $crawler->filter('#demo img')->eq(0)->attr('src');
// Output: Image source
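
To check whether the selector or the markup is at fault, the same query can be run against a saved copy of the page without any network call. The sketch below is not Goutte: it swaps in PHP's bundled DOM extension, uses an XPath query as a rough equivalent of the `#demo img` selector, and runs on made-up sample markup. The function name `imageSources` is hypothetical, and the container id is assumed not to contain quote characters.

```php
<?php
// Dependency-free sketch: PHP's bundled DOM extension, not Goutte.
// The HTML below is made-up sample markup, not taken from a real page.
$html = <<<'HTML'
<div id="demo">
  <img src="/images/a.jpg" class="thumb">
  <img src="/images/b.jpg">
</div>
<img src="/images/outside.jpg">
HTML;

// Hypothetical helper: collect the src of every img inside the element with the given id.
function imageSources(string $html, string $containerId): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Rough XPath equivalent of the CSS selector "#<id> img".
    $nodes = $xpath->query(sprintf('//*[@id="%s"]//img', $containerId));

    $sources = [];
    foreach ($nodes as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

print_r(imageSources($html, 'demo'));
```

If this returns the expected paths for the saved HTML but the live request does not, the difference is more likely in what the server sends than in the selector.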

amirgee007 commented 4 years ago

How do I get this image? It is inside an a tag.

petrk94 commented 4 years ago

@amirgee007 I just had the same problem in my application. The code above gives you the link to the image. To save the image, I think you can use wget/curl or file_get_contents. Here is the code that worked for me:

Instead of echoing the output (echo $crawler->filter('#demo img')->eq(0)->attr('src');), assign it to a variable: $linktoimage = $crawler->filter('#demo img')->eq(0)->attr('src'); Then pass the variable to file_get_contents and proceed with the code below:

$image_stream = file_get_contents($linktoimage);
$file = "my_image.jpg";
file_put_contents($file, $image_stream);

asghar-mansourian commented 3 years ago

Hello, you can take the file URL and fetch the file with PHP's copy function, like this: copy($url, $localAddress);
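
A minimal sketch of the copy() approach with a basic result check. The helper name `downloadImage` is hypothetical. It assumes the URL is already absolute (a relative src would first have to be resolved against the page URL, which is not handled here) and that allow_url_fopen is enabled for remote URLs.

```php
<?php
// Sketch only: downloadImage() is a hypothetical helper name.
function downloadImage(string $url, string $destination): bool
{
    // An inline data: URI is not a remote file, so there is nothing to copy.
    if (strncmp($url, 'data:', 5) === 0) {
        return false;
    }
    // copy() works with local paths, and with http(s) URLs when
    // allow_url_fopen is enabled. The @ silences the warning on failure
    // so the caller can rely on the boolean result instead.
    return @copy($url, $destination);
}

// Offline usage example: a local temp file stands in for the remote image.
$source = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($source, 'fake image bytes');
$target = $source . '.copy';

if (downloadImage($source, $target)) {
    echo "Saved to $target\n";
} else {
    echo "Download failed\n";
}
```

Returning a boolean keeps the caller in control of what happens on failure, for example skipping the image or logging the URL.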

katesaikishore commented 3 years ago

$client = new Client();
$crawler = $client->request('GET', 'https://factly.in/category/english/');
echo $crawler->filter('.image img')->eq(0)->attr('src');

[Screenshot, 2021-02-08]

I am unable to get the image. Please help me.

amirgee007 commented 3 years ago

Hi, it looks like there is no separate img tag here; the image tag sits inside a link (a tag).

So you can try it like this.

$path = 'https://factly.in/category/english/';

$crawler = $client->request('GET', $path);

$crawler->filter('.highlights')->each(function ($node) {
    try {
        // Inner HTML of the first link inside this block.
        $text = $node->filter('a')->html();

        // Capture the src value of the first img tag in the named group "url".
        preg_match('/<img.*?src=["\'](?<url>.*?)["\'].*?>/', $text, $output);

        dd($output['url']);
    } catch (\Exception $e) {
        // dd($e);
    }
});

amirgee007 commented 3 years ago

$path = 'https://factly.in/category/english/';

$crawler = $client->request('GET', $path);

$crawler->filter('.image-link')->each(function ($node) {
    $link = $node->filter('img')->attr('src');
    dd($link);
});

katesaikishore commented 3 years ago

Awesome thanks @amirgee007

yossefEl commented 3 years ago

$path = 'https://factly.in/category/english/';

$crawler = $client->request('GET', $path);

$crawler->filter('.image-link')->each(function ($node) {
    $link = $node->filter('img')->attr('src');
    dd($link);
});

Thank you, but sometimes this script doesn't work. For example, in my case, instead of returning the URI of the image, it returns the following data:

"data:image/svg+xml;nitro-empty-id=NTU4OjIwOQ==-1;base64,PHN2ZyB2aWV3Qm94PSIwIDAgMTYxOSAxMDgwIiB3aWR0aD0iMTYxOSIgaGVpZ2h0PSIxMDgwIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==",

If anyone has fixed this issue, please post the solution here. Thanks in advance!
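
A data: URI in src often means the page lazy-loads its images: src holds an inline placeholder and the real URL sits in another attribute. One possible workaround, sketched below, is to fall back to such an attribute when src is a placeholder. The helper name `resolveImageSource` is hypothetical, and the attribute names in `LAZY_ATTRIBUTES` are guesses at common conventions, not something confirmed for this site; the actual markup has to be inspected to find the right one.

```php
<?php
// Sketch only. These attribute names are assumptions about common
// lazy-loading conventions; check the real markup before relying on them.
const LAZY_ATTRIBUTES = ['data-src', 'data-lazy-src', 'nitro-lazy-src', 'data-original'];

/**
 * Hypothetical helper: pick the first usable image URL for one img tag.
 *
 * @param array<string, string|null> $attributes attribute name => value
 */
function resolveImageSource(array $attributes): ?string
{
    $candidates = array_merge(['src'], LAZY_ATTRIBUTES);
    foreach ($candidates as $name) {
        $value = $attributes[$name] ?? null;
        // Skip missing values and inline data: placeholders.
        if ($value !== null && $value !== '' && strncmp($value, 'data:', 5) !== 0) {
            return $value;
        }
    }
    return null;
}

// Usage with made-up attribute values:
var_dump(resolveImageSource([
    'src'      => 'data:image/svg+xml;base64,PHN2Zz48L3N2Zz4=',
    'data-src' => 'https://example.com/real-image.jpg',
]));
```

With Goutte, the input array could be built inside the each() callback by calling $node->attr($name) for src and for each candidate name. If none of the attributes holds a real URL, the images may be injected by JavaScript, which a plain HTTP scraper does not execute.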