chimbori / crux

Crux offers a flexible plugin-based API & implementation to extract interesting information from Web pages.
Apache License 2.0

page that fails badly #18

Open johngray1965 opened 4 years ago

johngray1965 commented 4 years ago

On the following page: https://www.cnbc.com/2020/01/07/how-to-set-a-family-member-with-a-disability-on-a-great-financial-path.html

Crux fails badly. It only gets some text from the middle of the page.

I realize that working well on all pages is a very difficult task, but I'm hoping you can figure something out nonetheless.

lucidl commented 1 year ago

On these pages (https://www.novinky.cz/clanek/domaci-vlada-kratom-nezakazala-40437591, https://www.root.cz/clanky/ubuntu-pripravuje-nemenny-desktop-postaveny-na-ubuntu-core-a-snapech), the main content is identified incorrectly. These are quite popular websites in the Czech Republic. Why does this happen, and how could it be fixed? Thank you.