Open tinloaf opened 6 years ago
I think the problem isn't on your side but on lesserwrong.com, which is using Cloudflare, so I guess it might be the same issue as https://github.com/wallabag/wallabag/issues/1399#issuecomment-350988404
That might very well be it. Is there a way of seeing the HTML that the parser sees? Then I could verify that it's in fact the Cloudflare anti-bot page.
Without going into the code of wallabag/graby, no, you can't.
Find that file and var_dump() the $html: https://github.com/j0k3r/graby/blob/master/src/Extractor/ContentExtractor.php#L203
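If patching graby is inconvenient, a rough check can also be made from outside by fetching the page and looking for signs of a Cloudflare challenge. The following is only a sketch: the marker strings are guesses at what such a page might contain (not an official Cloudflare list), the helper names are hypothetical, and a standalone request will not necessarily receive the same response that graby gets, since headers and IP address may differ.

```python
import urllib.error
import urllib.request

# Heuristic markers only: these strings are assumptions about what a
# Cloudflare challenge page may contain, not a documented list.
CHALLENGE_MARKERS = (
    "cf-browser-verification",
    "checking your browser",
    "attention required! | cloudflare",
)


def looks_like_cloudflare_challenge(html: str) -> bool:
    """Return True if the HTML contains any of the assumed challenge markers."""
    lowered = html.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


def fetch_html(url: str, user_agent: str = "Mozilla/5.0") -> str:
    """Fetch a page and return its body as text, even on an HTTP error status."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # A challenge page may come back with an error status; keep its body
        # so it can still be inspected.
        return error.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    page = fetch_html(
        "https://www.lesserwrong.com/rationality/what-do-we-mean-by-rationality"
    )
    print("Looks like a Cloudflare challenge:", looks_like_cloudflare_challenge(page))
    print(page[:500])  # first part of what was actually returned
```

A negative result here does not rule out Cloudflare; dumping `$html` inside graby, as suggested above, remains the more reliable check.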
Hi,
I'd like to parse pages from www.lesserwrong.com. I've tried creating a site config based on this page: https://www.lesserwrong.com/rationality/what-do-we-mean-by-rationality
This is what my site config looks like:
As far as I can tell, these XPaths all point to the correct elements inside that page. However, the tool at https://f43.me/feed/test still fails to parse the page. Did I mess up the site config, or is this a bug in the parser (and if so, is this the right repository to report such a bug)?