Closed: sachaw closed this issue 4 years ago.
The following HTML does not seem to be parsed correctly:
<div id="titlebar"><div id="helpbutton">?</div><h2><a href="min-1246.html">Defernite</a> : <span style='font-size:smaller'>Ca<sub>6</sub>(CO<sub>3</sub>)<sub>2-x</sub>(SiO<sub>4</sub>)<sub>x</sub>(OH)<sub>7</sub>(Cl,OH)<sub>1-2x</sub> (x<0.5)</span>, <a href="min-1856.html">Hematite</a> : <span style='font-size:smaller'>Fe<sub>2</sub>O<sub>3</sub></span><div class='titleloc'><a href="loc-2427.html">Kombat Mine, Kombat, Grootfontein, Otjozondjupa Region, Namibia</a></div></h2> </div>
When I run:
$minerals = $crawler->filter('body > div > div#titlebar > h2')->each(function ($mineral) use (&$mineral_ids) { print_r($mineral->text()); });
It only returns one element when there should be two. The problem seems to be that Goutte does not detect where the first <span> closes.
Reference URL: https://www.mindat.org/photo-804.html
Thanks.
The HTML is invalid: x<0 should be written as x&lt;0, because the bare < is seen as the start of an opening tag.
I picked this up with the W3C validator: http://validator.w3.org/
OK, thanks. I'll try to create a workaround.
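One possible workaround, sketched here under stated assumptions rather than as a fix from Goutte itself, is to pre-process the raw HTML and escape every < that cannot begin a tag before handing it to the crawler. The helper name escape_bare_lt is hypothetical and is not part of Goutte or Symfony DomCrawler. It treats < followed by a letter, /, ! or ? as markup and turns every other < into &lt;.

```php
<?php
// Hypothetical helper (not part of Goutte or Symfony DomCrawler): escape any "<"
// that cannot begin a tag, closing tag, comment/doctype or processing instruction,
// so text such as "(x<0.5)" is kept as literal text when the HTML is parsed.
function escape_bare_lt(string $html): string
{
    // Negative lookahead: leave "<" alone when followed by a letter, "/", "!" or "?";
    // every other "<" is replaced with the entity "&lt;".
    return preg_replace('/<(?![A-Za-z\/!?])/', '&lt;', $html);
}

// Fragment taken from the HTML in the report above.
$html = "<h2><span style='font-size:smaller'>(Cl,OH)<sub>1-2x</sub> (x<0.5)</span>, Hematite</h2>";
echo escape_bare_lt($html), PHP_EOL;
```

With Goutte, one option (an assumption, not verified against this page) is to take the raw response body after the request, for example via $client->getResponse()->getContent(), and build a new Symfony\Component\DomCrawler\Crawler from escape_bare_lt() of that string before calling filter(). The regex is deliberately narrow: a stray < that happens to be followed by a letter (for example "a<b") still looks like a tag and is left untouched.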