fabiofumarola / HyLiEn

Implementation of the paper: Fabio Fumarola, Tim Weninger, Rick Barber, Donato Malerba, Jiawei Han: HyLiEn: a hybrid approach to general list extraction on the web. WWW (Companion Volume) 2011: 35-36
Apache License 2.0

HyLiEn fails to extract the main content for some websites #16

Closed. fabiana001 closed this issue 8 years ago

fabiana001 commented 8 years ago

Web pages where extraction fails:

http://www.idealista.it/vendita-case/milano-milano/

fabiana001 commented 8 years ago

To extract the content lists, use the normalized tree edit distance and set maxRecordTags >= 60.
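For readers unfamiliar with the metric, here is a minimal Python sketch of one way a normalized tree edit distance between two candidate records could be computed. It is not HyLiEn's code: the `Node` class, the Selkow-style top-down distance, and the normalization by the sum of the two tree sizes are all assumptions made for illustration, and the sketch does not model the `maxRecordTags` setting.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element: a tag name plus child nodes."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(child) for child in node.children)


def tree_edit_distance(a: Node, b: Node) -> int:
    """Selkow-style top-down distance (an assumption, not HyLiEn's method).

    Relabelling a node costs 1; inserting or deleting a whole subtree
    costs the number of nodes in that subtree.
    """
    root_cost = 0 if a.tag == b.tag else 1
    xs, ys = a.children, b.children
    # dp[i][j]: cost of turning the first i children of `a`
    # into the first j children of `b`.
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),  # delete a subtree of `a`
                dp[i][j - 1] + size(ys[j - 1]),  # insert a subtree of `b`
                dp[i - 1][j - 1] + tree_edit_distance(xs[i - 1], ys[j - 1]),
            )
    return root_cost + dp[-1][-1]


def normalized_distance(a: Node, b: Node) -> float:
    """Scale the distance into [0, 1) by dividing by the combined tree size."""
    return tree_edit_distance(a, b) / (size(a) + size(b))
```

With this normalization, two listing records that differ by a single extra tag score close to 0 regardless of how large they are, which is the property that makes a normalized distance easier to threshold than a raw edit count on long records.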