feedbin / support


Broken Images for DudeIwantThat.com #492

Closed: MikhailTNY closed this issue 9 years ago

MikhailTNY commented 9 years ago

Hi guys,

RSS: http://feeds.feedburner.com/Dudeiwantthat

The code:

<div data-behavior="entry_content_wrap" class="content-styles">
            <img src="https://camo-feedbin.herokuapp.com/64c8c5dc31a0963869c8b78952d53b679b0457e4/687474703a2f2f7374617469632e647564656977616e74746861742e636f6d2f6578636c7573697665732f726573697a65283634302c353333292f7573622d74726176656c2d72617a6f722d31353338302e6a7067" width="640" height="533" data-canonical-src="http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15380.jpg"><p>While the trend these days seems to be giving up shaving altogether, ShaveTech has an alternative for you: just give up your charger. Their smartphone-sized electric razor has a handy fold-out USB port for easy charging on the road or at the office (or in the car) from any USB power source.</p>

<p>The ShaveTech electric razor with USB plug powers is available from <a rel="nofollow" target="_blank" href="https://exclusives.dudeiwantthat.com/sales/the-shavetech-usb-travel-razor-free-us-shipping">Dude Exclusives</a> for 36% off its $30 retail value for a limited time.</p>

<img src="https://camo-feedbin.herokuapp.com/791fa7be12e439786962bb550d63ebc06c91335e/687474703a2f2f66656564732e666565646275726e65722e636f6d2f7e722f447564656977616e74746861742f7e342f374e674333517254737073" height="1" width="1" alt="" data-canonical-src="http://feeds.feedburner.com/~r/Dudeiwantthat/~4/7NgC3QrTsps">
          </div>
benubois commented 9 years ago

Hi @MikhailTNY,

It looks like the original image results in a 404 Not Found error: http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15380.jpg

So I believe it is a problem with the feed itself.

MikhailTNY commented 9 years ago

Hi @benubois,

That's weird. If you go straight to the feed, it shows this URL: http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15384.jpg

Maybe they updated the URL later?

benubois commented 9 years ago

@MikhailTNY,

Ah yeah, that must be it. The version Feedbin has cached is http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15380.jpg

Feedbin only updates a post if its number of characters has changed. Here the old and new URLs differ by a single digit (15380 vs. 15384), so the character count most likely stayed the same and the update was skipped.
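To illustrate why a length check misses this kind of edit, here's a minimal Ruby sketch that compares the two image URLs quoted in this thread. It only looks at the URLs, not the full post bodies, so it's an illustration of the failure mode rather than a reproduction of Feedbin's check:

```ruby
# The cached URL and the URL now in the feed, both quoted in this thread.
cached  = "http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15380.jpg"
current = "http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15384.jpg"

# A length-based check sees no change, because only one digit differs...
length_changed = cached.length != current.length
# ...even though the content really did change.
content_changed = cached != current

puts "length changed:  #{length_changed}"   # false
puts "content changed: #{content_changed}"  # true
```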

MikhailTNY commented 9 years ago

Is there a way to refresh the post manually if I see broken images later?

benubois commented 9 years ago

Hi @MikhailTNY,

There isn't currently, but that's an interesting idea.

I think ideally you shouldn't have to worry about whether a post has been updated; Feedbin should just update it.

To prevent issues like this, where a post changes but its character count doesn't, it might be better to use a checksum to detect differences.
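A minimal sketch of the checksum idea, assuming a CRC32 of each post body is stored alongside it. `content_checksum` and `post_changed?` are hypothetical helpers for illustration, not Feedbin's actual code:

```ruby
require 'zlib'

# Hypothetical helper: CRC32 of the post body, an Integer below 2**32.
def content_checksum(content)
  Zlib.crc32(content)
end

# Hypothetical helper: compare the stored checksum with one computed
# from the freshly fetched content.
def post_changed?(stored_checksum, new_content)
  stored_checksum != content_checksum(new_content)
end

old_body = '<img src="http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15380.jpg">'
new_body = old_body.sub('15380', '15384') # same length, one digit different

stored = content_checksum(old_body)
puts post_changed?(stored, old_body) # false: identical content
puts post_changed?(stored, new_body) # true: same length, different bytes
```

CRC32 is guaranteed to catch this particular edit, since '0' and '4' differ in a single bit and CRCs detect every single-bit change. In general two different bodies can share a checksum, but that is far less likely than two versions sharing a length.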

MikhailTNY commented 9 years ago

Checksum++. Though wouldn't calculating checksums on millions of articles eat up a lot of CPU?

Too bad PubSubHubbub never took off. IIRC, it would help a lot here, since you'd get a notification whenever an article is updated, no?
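For context, a PubSubHubbub (now standardized as W3C WebSub) subscriber asks a feed's hub to push updates by POSTing a form containing `hub.mode`, `hub.topic`, and `hub.callback`. This sketch only builds that request body; the callback URL is a hypothetical placeholder, and actually subscribing would require the hub URL the feed advertises:

```ruby
require 'uri'

# Topic: the feed to follow. Callback: a hypothetical endpoint the hub
# would verify and then notify when the feed changes.
params = {
  'hub.mode'     => 'subscribe',
  'hub.topic'    => 'http://feeds.feedburner.com/Dudeiwantthat',
  'hub.callback' => 'https://reader.example.com/push/callback'
}

# Form-encoded body to POST to the feed's hub.
body = URI.encode_www_form(params)
puts body
```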

benubois commented 9 years ago

I was curious about the performance too and ran some tests. A cyclic redundancy check (CRC) is slower than checking the length, but fast enough that it would be a drop in the bucket compared with the whole HTTP fetch -> XML parsing -> check-for-new-content cycle.

CRC is also faster than SHA1 or MD5, and its result is smaller to store: 32 bits, versus 160 bits for SHA1 and 128 bits for MD5.
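To make the storage difference concrete, this Ruby snippet packs each checksum into its raw binary form and measures it. The sample string is arbitrary; the sizes depend only on the algorithm:

```ruby
require 'zlib'
require 'digest'

text = 'Lorem ipsum dolor sit amet'

sizes = {
  # Zlib.crc32 returns an Integer below 2**32; 'N' packs it as 4 bytes.
  'crc32' => [Zlib.crc32(text)].pack('N').bytesize, # 4
  # Raw (non-hex) digests: 160-bit SHA1 and 128-bit MD5.
  'sha1'  => Digest::SHA1.digest(text).bytesize,    # 20
  'md5'   => Digest::MD5.digest(text).bytesize      # 16
}

sizes.each { |name, bytes| puts "#{name}: #{bytes} bytes" }
```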

Here's the iterations-per-second benchmark and its results:

require 'benchmark/ips' # from the benchmark-ips gem
require 'zlib'
require 'digest'

text = <<-eos
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
eos

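# Compare a plain length check with CRC32, SHA1 and MD5 on the same text.
Benchmark.ips do |x|
  # Baseline: the character count Feedbin currently compares
  x.report('length') do
    text.length
  end
  # CRC32 checksum, returned as an Integer
  x.report('crc') do
    Zlib.crc32(text)
  end
  # Cryptographic digests, hex-encoded
  x.report('sha1') do
    Digest::SHA1.hexdigest(text)
  end
  x.report('md5') do
    Digest::MD5.hexdigest(text)
  end
end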
Calculating -------------------------------------
              length    114050 i/100ms
                 crc     89377 i/100ms
                sha1     19417 i/100ms
                 md5     19509 i/100ms
-------------------------------------------------
              length  7563317.9 (±15.4%) i/s -   36838150 in   5.003004s
                 crc  2703430.5 (±12.7%) i/s -   13317173 in   5.006336s
                sha1   239455.7 (±8.0%)  i/s -    1203854 in   5.058653s
                 md5   243237.6 (±7.9%)  i/s -    1209558 in   5.004162s
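Converting those rates to per-call times shows why CRC barely registers. This quick calculation uses only the figures reported above; no new measurements:

```ruby
# Iterations per second, copied from the benchmark-ips output above.
ips = {
  'length' => 7_563_317.9,
  'crc'    => 2_703_430.5,
  'sha1'   => 239_455.7,
  'md5'    => 243_237.6
}

# Average time per call in microseconds.
ips.each do |name, rate|
  puts format('%-6s %.2f microseconds per call', name, 1_000_000 / rate)
end

# Relative slowdowns between the approaches.
puts format('length vs crc: %.1fx', ips['length'] / ips['crc'])
puts format('crc vs sha1:   %.1fx', ips['crc'] / ips['sha1'])
```

On this text, CRC32 costs roughly 0.37 microseconds per call: about 2.8x the length check and about 11x faster than SHA1 or MD5. An HTTP fetch is typically measured in milliseconds, so the checksum is a tiny fraction of each update cycle.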

Yeah, PuSH might be a little too complicated to set up to really catch on, but it's a huge help in getting timely updates.

MikhailTNY commented 9 years ago

Then it sounds like CRC is the way to go.