DKarap / web-driver

Crawler that uses the webdriver, ghostdriver/PhantomJS

Detect semantics: including the list tag <li> causes problems if it contains an <a> tag! #27

Closed DKarap closed 10 years ago

DKarap commented 10 years ago

On the page http://www.huisopdewaard.nl/hodw/vacatures.html we detect the semantics correctly, but the semantic URLs are wrong. If we don't extract <li> tags, then all is good.

DKarap commented 10 years ago

Skip <li> and html tags: do not use them to extract links from web pages.
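
The idea in the comment above could be sketched roughly as follows. This is not the project's actual code: it is a minimal Python illustration using only the standard library, and the names `SKIP_TAGS`, `LinkCollector` and `extract_links` are hypothetical. It assumes that "skipping" a tag means ignoring it as the container a link is attributed to, so a link inside `<li>` is attributed to the nearest non-skipped ancestor instead.

```python
from html.parser import HTMLParser

# Assumption: tags that should not be used as link containers.
SKIP_TAGS = {"li", "html"}

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LinkCollector(HTMLParser):
    """Collect (container_tag, href) pairs, ignoring skipped container tags."""

    def __init__(self, skip_tags=SKIP_TAGS):
        super().__init__()
        self.skip_tags = set(skip_tags)
        self.stack = []   # currently open tags, outermost first
        self.links = []   # (container_tag or None, href)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Nearest open ancestor that is not in the skip set.
                container = next(
                    (t for t in reversed(self.stack) if t not in self.skip_tags),
                    None,
                )
                self.links.append((container, href))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag; this also
        # tolerates unclosed elements such as <li> without </li>.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def extract_links(html, skip_tags=SKIP_TAGS):
    parser = LinkCollector(skip_tags)
    parser.feed(html)
    parser.close()
    return parser.links
```

With a vacancy list such as `<ul><li><a href="/job1">...</a></li></ul>`, the links would then be grouped under `ul` rather than under each individual `li`.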