FriendsOfPHP / Goutte

Goutte, a simple PHP Web Scraper
MIT License

Get link from a td while iterating through table columns #367

Closed: cavewebs closed this issue 5 years ago.

cavewebs commented 5 years ago

I have an HTML table and I want to build an array from it:

$html = '<table>
<tr>
    <td>user1</td>
    <td>address1</td>
    <td>dob1</td>
    <td>status1</td>
    <td><a href="link1"><img src="profile.jpg" /></a></td>
</tr>
<tr>
    <td>user2</td>
    <td>address2</td>
    <td>dob2</td>
    <td>status2</td>
    <td><a href="link2"><img src="profile.jpg" /></a></td>
</tr>
<tr>
    <td>user3</td>
    <td>address3</td>
    <td>dob3</td>
    <td>status3</td>
    <td><a href="link3"><img src="profile.jpg" /></a></td>
</tr>
</table>';

As you can see, the last column contains a link while the others contain text. I want to extract both the text and the link so that my array looks like this:

array(
    array(
        "user1",
        "address1",
        "dob1",
        "status1",
        "<a href='link1'><img src='profile.jpg' /></a>",
    ),
    array(
        "user2",
        "address2",
        "dob2",
        "status2",
        "<a href='link2'><img src='profile.jpg' /></a>",
    ),
    array(
        "user3",
        "address3",
        "dob3",
        "status3",
        "<a href='link3'><img src='profile.jpg' /></a>",
    ),
)

I can already get the text from the table with this code:

$table = $crawler->filter('table')->filter('tr')->each(function ($tr, $i) {
    return $tr->filter('td')->each(function ($td, $i) {
        return trim($td->text());
    });
});
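For readers without Goutte installed, the nested loop above can be approximated with PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`). This is a swapped-in technique, not Goutte's API. The sketch assumes, as the desired output suggests, that the last cell wraps an `<img>` in an `<a>`, and it shows why a text-only loop loses the link: that cell has no text content, so it comes back as an empty string.

```php
<?php
// Plain-PHP approximation of the Goutte/DomCrawler loop using ext-dom.
// Assumption: the last <td> wraps an <img> in an <a>, as the desired output suggests.
libxml_use_internal_errors(true); // keep libxml parser warnings out of the output

$html = '<table>
<tr><td>user1</td><td>address1</td><td>dob1</td><td>status1</td>
<td><a href="link1"><img src="profile.jpg"></a></td></tr>
<tr><td>user2</td><td>address2</td><td>dob2</td><td>status2</td>
<td><a href="link2"><img src="profile.jpg"></a></td></tr>
</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$table = [];
// Outer loop: one entry per <tr>, like filter('tr')->each(...).
foreach ($xpath->query('//table//tr') as $tr) {
    $row = [];
    // Inner loop: one entry per <td>, like filter('td')->each(...).
    foreach ($xpath->query('td', $tr) as $td) {
        $row[] = trim($td->textContent);
    }
    $table[] = $row;
}

print_r($table[0]); // the fifth value is "" because the link cell has no text
```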

But the last column is a link. How do I capture it using link()?

joveice commented 5 years ago

foreach ($table as $k => $v) {
    // Replace the last column with an anchor wrapping the profile image.
    $table[$k][4] = '<a href="' . $v[4] . '"><img src="profile.jpg"></a>';
}

?
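A slightly more defensive version of this string-building idea, as a minimal sketch: `toProfileLink` is a hypothetical helper name, `profile.jpg` is taken from the desired output, and the rows assume column 4 already holds the extracted href. Escaping the href with `htmlspecialchars()` keeps a quote or `&` inside the link from breaking the attribute.

```php
<?php
// Hypothetical helper: wrap an extracted href in the markup from the desired output.
function toProfileLink(string $href): string
{
    // Escape quotes and ampersands so the href cannot break out of the attribute.
    $safe = htmlspecialchars($href, ENT_QUOTES);
    return '<a href="' . $safe . '"><img src="profile.jpg"></a>';
}

// Assumed input: rows from the scraping loop, with the href in column 4.
$table = [
    ['user1', 'address1', 'dob1', 'status1', 'link1'],
    ['user2', 'address2', 'dob2', 'status2', 'link2?a=1&b=2'],
];

foreach ($table as $k => $row) {
    $table[$k][4] = toProfileLink($row[4]);
}

echo $table[1][4], PHP_EOL;
// <a href="link2?a=1&amp;b=2"><img src="profile.jpg"></a>
```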

cavewebs commented 5 years ago

I was able to get the link by adding an if condition in the inner loop that checks the column index (if $i == 4), like so:

$table = $crawler->filter('table')->filter('tr')->each(function ($tr, $i) {
    return $tr->filter('td')->each(function ($td, $i) {
        // The fifth column (index 4) holds the link: return its href.
        if ($i == 4) {
            return $td->filter('a')->extract('href')[0];
        }
        // Every other column: return the trimmed cell text.
        return trim($td->text());
    });
});

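The same column check can be sketched without Goutte using PHP's built-in DOM extension (a swapped-in technique, not Goutte's API). This version collects the href and also keeps the cell's inner HTML, which matches the array originally asked for. `innerHtml` and `$hrefs` are hypothetical names for illustration. Inside Goutte itself, `$td->html()` on the last cell should give the inner HTML and `$td->filter('a')->attr('href')` the href, though that is not exercised here.

```php
<?php
// Plain-PHP sketch with ext-dom (not Goutte): keep text for columns 0-3 and
// take both the href and the cell's inner HTML from column 4.
libxml_use_internal_errors(true); // keep libxml parser warnings out of the output

$html = '<table>
<tr><td>user1</td><td>address1</td><td>dob1</td><td>status1</td>
<td><a href="link1"><img src="profile.jpg"></a></td></tr>
<tr><td>user2</td><td>address2</td><td>dob2</td><td>status2</td>
<td><a href="link2"><img src="profile.jpg"></a></td></tr>
</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: concatenate the serialized children, i.e. the inner HTML.
function innerHtml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

$hrefs = [];
$rows = [];
foreach ($xpath->query('//table//tr') as $tr) {
    $tds = $xpath->query('td', $tr);
    $row = [];
    for ($i = 0; $i < $tds->length; $i++) {
        $td = $tds->item($i);
        if ($i === 4) {
            // Last column: record the href (null if no <a>) and keep the <a> markup.
            $a = $xpath->query('a', $td)->item(0);
            $hrefs[] = $a ? $a->getAttribute('href') : null;
            $row[] = innerHtml($td);
        } else {
            $row[] = trim($td->textContent);
        }
    }
    $rows[] = $row;
}
```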
jeanpool1937 commented 5 years ago

No more emails, please.

Esdras Pallarco

On Jan 10, 2019, at 10:44 a.m., Timchosen Uzua <notifications@github.com> wrote:

Closed #367.
