The readability module currently in use often returns garbage when parsing HTML pages. This leads not only to unusable fulltext properties, but also to badly wrong matches in the tagging engine and sometimes poor summary texts.
A custom, optimized readability algorithm is needed: one that is more accurate than the current implementation and as fast as possible (<100 ms on commodity hardware for common websites).