cantino / ruby-readability

Port of arc90's readability project to Ruby
Apache License 2.0

How to capture images when using Readability::Document.new(source).content? #80

Open haluan opened 9 years ago

cantino commented 9 years ago

You can do .images instead of .content.
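A minimal sketch of that suggestion, assuming the ruby-readability gem is installed and that `source` holds the page HTML as a String. The helper name `article_image_urls` is hypothetical, and depending on the gem version `.images` may need an extra image library installed, so check the README for your version.

```ruby
# Sketch only: wraps the .images call suggested above.
# article_image_urls is a hypothetical helper name, not part of the gem.
def article_image_urls(html)
  require 'readability' # loaded lazily so this file parses without the gem
  # .images is called on the Document itself, in place of .content
  Readability::Document.new(html).images
end

# Example call, with source being the fetched page HTML:
#   article_image_urls(source).each { |url| puts url }
```

This gives the images as a separate list rather than embedded in the extracted HTML.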

plcstevens commented 9 years ago

If you mean you want the images to be embedded inside the content, you can also try setting these options:

{
  tags: YOUR_TAGS + %w(img),
  remove_empty_nodes: false
}

You may need to add src to the list of attributes as well.
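Put together, the options might look like the sketch below. `BASE_TAGS` and `image_friendly_options` are hypothetical names used only for illustration, and the starting tag list is an assumption, not necessarily the gem's default.

```ruby
# Assumed starting whitelist; substitute whatever tags you already allow.
BASE_TAGS = %w[div p a].freeze

# Hypothetical helper that builds the options hash described above.
def image_friendly_options(base_tags = BASE_TAGS)
  {
    tags: (base_tags + %w[img]).uniq, # allow <img> through the tag whitelist
    attributes: %w[src href],         # keep image URLs and link targets
    remove_empty_nodes: false         # keep nodes without text, such as an image-only paragraph
  }
end

options = image_friendly_options
puts options.inspect

# With the gem installed, the hash is passed as the second argument:
#   require 'readability'
#   html = Readability::Document.new(source, options).content
```

Without `src` in the attribute list, an `img` tag could survive while losing its URL, which is why both lists are extended together here.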

polakowski commented 8 years ago

When importing content from medium.com, you need to add "figure" to the allowed tags.

matheussilvasantos commented 4 years ago

@haluan, is it solved for you?

ryzalyusoff commented 3 years ago

I have the same problem. Adding img and remove_empty_nodes: false does nothing; the images are still being removed from the content.

require 'open-uri'
require 'readability'

url = "https://medium.com/better-advice/20-things-most-people-learn-too-late-in-life-23674cdbd75c"
body = URI.open(url).read
# Allow img and figure tags, and keep nodes that contain no text
doc = Readability::Document.new(body, :tags => %w[div p img a figure], :attributes => %w[src href figure], :remove_empty_nodes => false)
doc.content