avk opened this issue 6 months ago
I think I've traced this behavior to this part of `#sanitize`:
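```ruby
node.css("h1, h2, h3, h4, h5, h6").each do |header|
  # Drop any heading with a negative class weight, or with more than
  # a third of its text inside links.
  header.remove if class_weight(header) < 0 || get_link_density(header) > 0.33
end
```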
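For context, readability.js defines link density as the share of an element's text that sits inside `<a>` tags. Here is a minimal plain-Ruby sketch of that calculation and the heading test above. It assumes the gem follows the same definition. `link_density` and `drop_heading?` are hypothetical helpers, not the gem's API. Headings are modeled as plain strings instead of Nokogiri nodes, and the class weight is passed in as a number.

```ruby
# Hypothetical helper: characters of link text divided by characters
# of all text in the element (the link text is counted in both).
def link_density(text, link_texts)
  return 0.0 if text.empty?

  link_texts.sum(&:length).to_f / text.length
end

# Hypothetical helper mirroring the condition in #sanitize: drop the heading
# if its class weight is negative or its link density exceeds the threshold.
def drop_heading?(class_weight, text, link_texts, max_density: 0.33)
  class_weight.negative? || link_density(text, link_texts) > max_density
end

# A heading whose entire text is a link, e.g. <h2><a href="...">Submit!</a></h2>
puts link_density("Submit!", ["Submit!"])      # prints 1.0
puts drop_heading?(0, "Submit!", ["Submit!"])  # prints true

# Any negative class weight drops the heading even without links.
puts drop_heading?(-1, "Submit", [])           # prints true
```

Under this rule, a heading that is wrapped entirely in a link scores 1.0 and is removed no matter how it is weighted. The "Submit!" heading on the page above appears to be built that way.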
What's the thinking behind link density, especially applied to headings? Is there a way to customize or tune this?
I believe this is the same logic as in the original readability.js, which this gem was ported from. You could parameterize it if you want to make it more flexible.
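One way to parameterize it, sketched in plain Ruby, is to move the hard-coded `0.33` and the zero class-weight cutoff into an options hash with defaults. `Readability::Document.new` already takes an options hash in this style (for example `debug: true`). The option names `:max_heading_link_density` and `:min_heading_class_weight` and the `filter_headings` helper are hypothetical, and the gem does not expose them today. Headings are modeled as plain hashes rather than Nokogiri nodes.

```ruby
# Hypothetical defaults that reproduce the current hard-coded behavior.
DEFAULT_HEADING_OPTIONS = {
  max_heading_link_density: 0.33,
  min_heading_class_weight: 0
}.freeze

# headings: array of hashes with :text, :link_texts and :class_weight.
# Returns the headings that survive the filter.
def filter_headings(headings, options = {})
  opts = DEFAULT_HEADING_OPTIONS.merge(options)

  headings.reject do |h|
    # Link density: share of the heading's text that is link text.
    density = h[:text].empty? ? 0.0 : h[:link_texts].sum(&:length).to_f / h[:text].length
    h[:class_weight] < opts[:min_heading_class_weight] ||
      density > opts[:max_heading_link_density]
  end
end

headings = [
  { text: "Submit", link_texts: [], class_weight: 0 },
  { text: "Submit!", link_texts: ["Submit!"], class_weight: 0 }
]

p filter_headings(headings).map { |h| h[:text] }
# prints ["Submit"]
p filter_headings(headings, max_heading_link_density: 1.0).map { |h| h[:text] }
# prints ["Submit", "Submit!"]
```

With the defaults, the output matches the current code. Raising the density limit to `1.0` keeps headings that are entirely links.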
Thanks for your work on this neat gem.
When I run readability on the HTML from https://100wordstory.org/submit/, it strips out more markup than I expected.
Expected
Observed
In the screenshot above, the following content is stripped out:
- the red "Submit" heading
- the red "Submissions are now open through January 9, 2024" and "Submit!" headings and links
Turning on `debug: true` doesn't seem to explain why these items are missing. Any ideas on how to broaden the extraction to include this content?