chimbori / crux

Crux offers a flexible plugin-based API & implementation to extract interesting information from Web pages.
Apache License 2.0

images, videos, iframes #12

Open piaci opened 5 years ago

piaci commented 5 years ago

Hello! The parsed DOM does not seem to include images (or iframes, and hence videos). The Article object holds a list of Article.Images, but the Image class is not publicly accessible, so its properties cannot be read (unless one parses its string representation back into an object).

Is there any way to: a) make Image public? b) add a new property to the Image object that points to its location (DOM position, text position...)? c) create a similar object for iframes?
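To make the three requests concrete, here is a minimal sketch of what such a type might look like. It is not Crux's actual source: `ArticleSketch`, `Media`, `Kind`, and `domIndex` are hypothetical names made up for illustration, and the "position" is assumed to be a simple index in document order.

```java
// Hypothetical sketch only; none of these names come from Crux itself.
final class ArticleSketch {

    /** Request (c): one descriptor type that covers both images and iframes. */
    public enum Kind { IMAGE, IFRAME }

    /** Request (a): a nested class declared public, with publicly readable fields. */
    public static final class Media {
        public final Kind kind;
        public final String src;
        /** Request (b): assumed here to be the element's index in document order. */
        public final int domIndex;

        public Media(Kind kind, String src, int domIndex) {
            this.kind = kind;
            this.src = src;
            this.domIndex = domIndex;
        }

        @Override
        public String toString() {
            return kind + "(" + src + " @ " + domIndex + ")";
        }
    }

    public static void main(String[] args) {
        // Placeholder URLs and positions, not taken from any real page.
        Media image = new Media(Kind.IMAGE, "https://example.com/photo.jpg", 3);
        Media frame = new Media(Kind.IFRAME, "https://example.com/embed", 7);

        // Callers read fields directly instead of parsing toString() output.
        System.out.println(image.src + " is at position " + image.domIndex);
        System.out.println(frame);
    }
}
```

With fields exposed like this, a caller could sort or filter media by `domIndex` to re-insert them into the extracted text at roughly the right place. In a real library the outer class would also need to be public for other packages to reach the nested type.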

This link includes many images (not even lazy-loaded ones), but none of them appear after parsing, whereas article.images holds them all: https://www.slashgear.com/apple-airpods-2-review-price-performance-wireless-charging-04572129/

edwinRNDR commented 4 years ago

Running into the same issue as @piacimosca: Article.Image has package visibility and should have public visibility.