cantino / ruby-readability

Port of arc90's readability project to Ruby
Apache License 2.0
925 stars, 170 forks

Add a way to bypass the options[:tags] whitelist #95

Closed tuzz closed 10 months ago

tuzz commented 11 months ago

Readability is primarily concerned with extracting text, not images. We use readability to extract images by setting tags: %w[img], which preserves img tags in the output HTML. However, this doesn’t work if the image is nested under a non-whitelisted node, e.g.

<figure>
  <img src="…" />
</figure>

We want to whitelist all nodes for extraction, because our code already strips out the nodes it doesn’t care about. Therefore, add a mechanism to bypass the node whitelist by setting tags: %w[*], i.e. a wildcard.
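To illustrate the problem and the proposed wildcard, here is a minimal, self-contained sketch. It does not use ruby-readability itself: `Node`, `tag_allowed?` and `render` are hypothetical names invented for this example, and the assumption that a non-whitelisted node collapses to its text content (taking nested tags with it) is a simplification of the sanitizing step, not a copy of the library's code.

```ruby
# Hypothetical stand-in for an HTML element; not part of ruby-readability.
Node = Struct.new(:name, :children, :text) do
  # All text in the subtree, analogous to an element's text content.
  def inner_text
    text.to_s + children.map(&:inner_text).join
  end
end

# True when the tag survives the whitelist; "*" is treated as a wildcard.
def tag_allowed?(name, whitelist)
  whitelist.include?("*") || whitelist.include?(name)
end

# Render a node. Allowed tags are kept; any other node is assumed to
# collapse to its text, which also drops every tag nested inside it.
def render(node, whitelist)
  return node.inner_text unless tag_allowed?(node.name, whitelist)

  inner = node.text.to_s + node.children.map { |c| render(c, whitelist) }.join
  "<#{node.name}>#{inner}</#{node.name}>"
end

# <figure><img></figure>, as in the example above.
figure = Node.new("figure", [Node.new("img", [], nil)], nil)

puts render(figure, %w[img]) # prints an empty line: the img is lost with its parent
puts render(figure, %w[*])   # prints <figure><img></img></figure>
```

With the option proposed here, the corresponding call against the real library would presumably be Readability::Document.new(html, tags: %w[*]).content, leaving it to the caller to strip out any nodes it does not want.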

cantino commented 10 months ago

Thanks!