apache / incubator-stormcrawler

A scalable, mature and versatile web crawler based on Apache Storm
https://stormcrawler.apache.org/
Apache License 2.0

DOM generated by JSOUP parser doesn't match XPATH expressions #666

Closed by jnioche 5 years ago

jnioche commented 5 years ago

This is a bug caused by the changes introduced in #653.

jnioche commented 5 years ago

Namespaces, as usual :-(

The main difference from the previous class is at https://github.com/DigitalPebble/storm-crawler/blob/1.11/core/src/main/java/com/digitalpebble/stormcrawler/parse/JSoupDOMBuilder.java#L110

Will fix and test now.
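
For readers wondering how namespaces can make a DOM stop matching XPath expressions, here is a minimal sketch of one common cause. It is an assumption that this is the mechanism at play in `JSoupDOMBuilder`; the thread does not say. In XPath 1.0, an unprefixed name test such as `//div` only matches elements that are in no namespace, so a DOM whose elements were created in the XHTML namespace silently returns nothing. The class and method names below (`NamespaceXPathDemo`, `buildDom`, `countMatches`) are hypothetical and use only the JDK's `javax.xml` and `org.w3c.dom` APIs, not StormCrawler or JSoup code.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical demo class, not part of StormCrawler.
class NamespaceXPathDemo {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Builds <html><body><div/></body></html>, either in the XHTML
    // namespace or in no namespace at all.
    static Document buildDom(boolean withNamespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            String ns = withNamespace ? XHTML_NS : null;
            Element html = doc.createElementNS(ns, "html");
            Element body = doc.createElementNS(ns, "body");
            Element div = doc.createElementNS(ns, "div");
            doc.appendChild(html);
            html.appendChild(body);
            body.appendChild(div);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the nodes selected by an XPath expression evaluated
    // without any NamespaceContext, i.e. with no prefix bindings.
    static int countMatches(Document doc, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No namespace: the plain name test matches.
        System.out.println(countMatches(buildDom(false), "//div")); // 1
        // XHTML namespace: the same expression matches nothing.
        System.out.println(countMatches(buildDom(true), "//div")); // 0
        // local-name() ignores the namespace, so it still matches.
        System.out.println(countMatches(buildDom(true), "//*[local-name()='div']")); // 1
    }
}
```

If this is indeed the cause, the two usual remedies are to build the DOM without a namespace or to register a `NamespaceContext` and prefix the expressions; which one the actual fix chose is not stated in this thread.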