slitayem closed this issue 9 years ago.
This page parses fine for me. What version of lxml are you using? I'm using 3.4.1.
FWIW, BlockifyError is only raised when lxml can't parse the document.
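Since BlockifyError only appears when lxml can't parse the document, one way to narrow things down is to feed the same HTML to lxml directly and see whether it produces a root element. This is a minimal sketch that assumes a lenient lxml `etree.HTMLParser` parse. The exact parser options dragnet uses are an assumption here, and `lxml_parses` is a hypothetical helper, not part of dragnet.

```python
import importlib.util


def lxml_parses(html):
    """Return True if lxml builds a root element from `html`, False if it
    does not, or None if lxml is not installed.

    Hypothetical helper: it mimics a lenient HTML parse, which is an
    assumption about how dragnet calls lxml, not dragnet's actual code.
    """
    if importlib.util.find_spec("lxml") is None:
        return None  # lxml missing, so the check cannot run at all
    from lxml import etree

    parser = etree.HTMLParser(recover=True)
    try:
        root = etree.fromstring(html, parser)
    except (etree.XMLSyntaxError, ValueError):
        # lxml may raise instead of returning None, e.g. for empty input,
        # or a ValueError for str input carrying an XML encoding declaration
        return False
    # A None root means lxml gave up on the document entirely
    return root is not None


if __name__ == "__main__":
    print(lxml_parses("<html><body><p>Hallo</p></body></html>"))
```

If this returns True for the problem page on your machine but dragnet still raises, the difference is more likely in how the HTML is passed in (for example bytes vs. str) than in lxml itself.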
I'm using lxml 3.4.3.
Upgraded to lxml 3.4.3 but can't reproduce this BlockifyError. Here's a gist to test:
https://gist.github.com/matt-peters/66564e0684bfaf513968
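This outputs the following (the page is in German; an English translation follows each line):
Sie verwenden eine veraltete Browserversion. Bitte verwenden Sie eine unterstütze Version damit Sie MSN optimal nutzen können.
(You are using an outdated browser version. Please use a supported version so that you can get the most out of MSN.)
MSN Deutschland – mit Hotmail Nachfolger Outlook und Messenger Skype Durch Nutzung dieser Webseite stimmen Sie der Verwendung von Cookies für Analysezwecke, personalisierte Inhalte und Werbung zu. Sie verwenden eine veraltete Browserversion. Bitte verwenden Sie eine unterstütze Version damit Sie MSN optimal nutzen können.
(MSN Germany - with Hotmail successor Outlook and Messenger Skype. By using this website, you agree to the use of cookies for analytics, personalized content and advertising. You are using an outdated browser version. Please use a supported version so that you can get the most out of MSN.)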
Are you on the latest master branch and do the unit tests pass for you?
Actually, I installed dragnet through pip install.
OK, the first thing to do is make sure your install is working properly. Clone from master, install, and then run the tests (make test) to ensure they pass. If they do and you still get the exception, I'm not sure what else can be done; at that point the problem is likely in a lower-level library such as libxml2. If I can't reproduce it on my end, there isn't much I can do.
The output of pip freeze might be helpful. Reproducing this is harder because dragnet relies so heavily on extensions and has a rather large dependency chain in general (libxml2, for example).
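Alongside pip freeze, a short script can capture the specific versions this thread keeps asking about: Python, dragnet, lxml, and the libxml2 that lxml actually loads at runtime. This is a minimal sketch that assumes Python 3.8+ (for `importlib.metadata`) and that the packages are installed under the distribution names `dragnet` and `lxml`. `environment_report` is a hypothetical helper.

```python
import platform
from importlib import metadata


def environment_report(packages=("dragnet", "lxml")):
    """Collect the versions that matter when reporting a BlockifyError.

    Returns a dict mapping each name to its installed version string,
    or None when that distribution (or lxml itself) is not installed.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    # lxml exposes the libxml2 version loaded at runtime as a tuple of
    # ints; this can differ from the version lxml was compiled against.
    try:
        from lxml import etree
    except ImportError:
        report["libxml2"] = None
    else:
        report["libxml2"] = ".".join(map(str, etree.LIBXML_VERSION))
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting this output next to the traceback makes it easier to tell whether two machines really run the same lxml/libxml2 combination.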
Hi, I am getting BlockifyError when trying to get the main content of the following HTML page with content_extractor.analyze.
Thanks and sorry for the long comment.