Readability is primarily concerned with extracting text, not images. We currently use Readability to extract images by setting tags: %w[img], which preserves img tags in the output HTML. However, this does not work when the image is nested under a non-whitelisted node, e.g.
<figure>
<img src="…" />
</figure>
I think we want to whitelist all nodes for extraction, because our code already strips out the nodes it does not care about. This change therefore adds a mechanism to bypass the node whitelist by setting tags: %w[*], i.e. a wildcard.
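To illustrate the failure mode and the proposed wildcard, here is a minimal, self-contained sketch. It does not use the Readability gem or its internals: `Node`, `text_of`, `sanitize`, and `to_html` are hypothetical names invented for this example, and the sketch assumes that a non-whitelisted element is collapsed to its text content, which is why a nested img disappears along with its parent.

```ruby
# Toy element node for illustration only; text children are plain Strings.
Node = Struct.new(:name, :children)

# Concatenated text of a node, ignoring all markup.
def text_of(node)
  return node if node.is_a?(String)
  node.children.map { |child| text_of(child) }.join
end

# Keep whitelisted elements and collapse everything else to its text.
# Assumption: a "*" entry in tags means "keep every element".
def sanitize(node, tags)
  return node if node.is_a?(String)
  if tags.include?("*") || tags.include?(node.name)
    Node.new(node.name, node.children.map { |child| sanitize(child, tags) })
  else
    text_of(node)
  end
end

# Serialize the toy tree so the two results are easy to compare.
def to_html(node)
  return node if node.is_a?(String)
  inner = node.children.map { |child| to_html(child) }.join
  "<#{node.name}>#{inner}</#{node.name}>"
end

doc = Node.new("div", [
  Node.new("figure", [Node.new("img", [])]),
  Node.new("p", ["caption"])
])

# figure is not whitelisted, so the img nested inside it is lost.
strict = to_html(sanitize(doc, %w[div p img]))

# With the wildcard, every element survives for later filtering.
permissive = to_html(sanitize(doc, %w[*]))
```

In this sketch `strict` no longer contains the img even though img is whitelisted, while `permissive` keeps the full figure/img structure, leaving it to our own code to strip whatever it does not need.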