MikhailTNY closed this issue 9 years ago.
Hi @MikhailTNY,
It looks like the original image results in a 404 Not Found error: http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15380.jpg
So I believe it is a problem with the feed itself.
Hi @benubois,
That's weird. If you go straight to the feed, it shows this URL: http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15384.jpg
Maybe they updated the URL later?
@MikhailTNY,
Ah yeah, that must be it. The version Feedbin has cached is http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15380.jpg
Feedbin only updates a post if its number of characters has changed. In this case only one digit of the image URL changed (15380 became 15384), so the number of characters probably stayed the same.
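A minimal illustration of why a length-only check misses this edit, treating the two image URLs from this thread as stand-ins for the full post body (an assumption; the real post contains more markup around the URL):

```ruby
# Cached and current image URLs from this thread, standing in for the post body.
cached  = 'http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15380.jpg'
current = 'http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15384.jpg'

# A length-only comparison reports "no change"...
puts cached.length == current.length # => true
# ...even though the text itself differs.
puts cached == current                # => false
```

Swapping one digit for another keeps the length identical, so any change of this kind slips through.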
Is there a way to refresh the post manually if I see broken images later?
Hi @MikhailTNY,
There isn't currently, but that's an interesting idea.
I think ideally you shouldn't have to worry about whether a post has been updated; Feedbin should just update it.
To prevent issues like this one, where a post has changed but its number of characters hasn't, it might be better to use a checksum to detect differences.
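A rough sketch of checksum-based change detection, assuming the checksum of each post body is stored when the post is saved and compared on the next fetch. It uses Ruby's `Zlib.crc32` as the checksum; `content_changed?` is a hypothetical helper, not Feedbin's actual code, and the `<img>` bodies are made up around the URLs from this thread:

```ruby
require 'zlib'

# Hypothetical helper (not Feedbin's actual code): returns true when the
# CRC32 of the freshly fetched body differs from the checksum stored earlier.
def content_changed?(stored_checksum, new_body)
  Zlib.crc32(new_body) != stored_checksum
end

old_body = '<img src="http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15380.jpg">'
new_body = '<img src="http://static.dudeiwantthat.com/exclusives/resize(640,533)/usb-travel-razor-15384.jpg">'

# Store the checksum when the post is first saved...
stored = Zlib.crc32(old_body)

# ...then compare it on the next fetch.
puts content_changed?(stored, new_body) # => true, the URL edit is detected
puts content_changed?(stored, old_body) # => false, nothing changed
```

The trade-off is a small chance of a collision (two different bodies with the same 32-bit value), which seems acceptable for deciding whether to refresh a post.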
Checksum++, though it seems like calculating checksums on millions of articles is going to eat up a lot of CPU, no?
Too bad pubsubhubbub never took off. IIRC, it would help a lot here, since subscribers get notified when an article is updated, no?
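For context, a PubSubHubbub (now WebSub) subscription is a form-encoded POST to the feed's hub. A minimal sketch of building that request body, using the `hub.mode`, `hub.topic` and `hub.callback` parameters from the spec; the URLs below are placeholders, not real endpoints:

```ruby
require 'uri'

# Placeholder URLs for illustration only; none of these are real endpoints.
topic    = 'https://example.com/feed.xml'
callback = 'https://reader.example.com/push/callback'

# Form-encoded body of a subscription request, using the hub.mode,
# hub.topic and hub.callback parameters from the PubSubHubbub/WebSub spec.
body = URI.encode_www_form(
  'hub.mode'     => 'subscribe',
  'hub.topic'    => topic,
  'hub.callback' => callback
)

puts body
```

The subscriber would POST this body to the hub advertised by the feed (a `rel="hub"` link), the hub would verify the subscription with a GET to the callback, and from then on it would POST updated entries to the callback instead of waiting for the next poll.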
I was curious about the performance too, so I ran some tests. A cyclic redundancy check (CRC) is slower than checking the length, but fast enough that it would be a drop in the bucket compared to the HTTP fetch -> XML parsing -> check-for-new-content cycle.
CRC is also faster than SHA1 or MD5, and its result is smaller to store.
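A quick way to check the storage claim in Ruby: `Zlib.crc32` returns an Integer that fits in 32 bits, while `hexdigest` returns hex strings. The sample text here is arbitrary:

```ruby
require 'zlib'
require 'digest'

text = 'Lorem ipsum dolor sit amet'

crc  = Zlib.crc32(text)             # Integer between 0 and 2**32 - 1
sha1 = Digest::SHA1.hexdigest(text) # hex String
md5  = Digest::MD5.hexdigest(text)  # hex String

puts [crc].pack('N').bytesize # => 4  (bytes, packed as an unsigned 32-bit integer)
puts sha1.length              # => 40 (characters)
puts md5.length               # => 32 (characters)
```

Even stored as raw binary (`Digest::SHA1.digest` and `Digest::MD5.digest`), the hashes take 20 and 16 bytes. One caveat: CRC32 values run up to 2**32 - 1, so a signed 32-bit integer column would overflow for values of 2**31 and above; a 64-bit column avoids that.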
Here's the iterations-per-second benchmark and its results:
require 'benchmark/ips'  # benchmark-ips gem
require 'zlib'
require 'digest'

# Sample post body: the standard Lorem ipsum paragraph, repeated three times.
text = <<-eos
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
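eos

Benchmark.ips do |x|
  # Baseline: the current check, which only compares lengths.
  x.report('length') do
    text.length
  end

  # CRC32 checksum (returns an Integer).
  x.report('crc') do
    Zlib.crc32(text)
  end

  # SHA1 hex digest (a 40-character String).
  x.report('sha1') do
    Digest::SHA1.hexdigest(text)
  end

  # MD5 hex digest (a 32-character String).
  x.report('md5') do
    Digest::MD5.hexdigest(text)
  end
end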
Calculating -------------------------------------
length 114050 i/100ms
crc 89377 i/100ms
sha1 19417 i/100ms
md5 19509 i/100ms
-------------------------------------------------
length 7563317.9 (±15.4%) i/s - 36838150 in 5.003004s
crc 2703430.5 (±12.7%) i/s - 13317173 in 5.006336s
sha1 239455.7 (±8.0%) i/s - 1203854 in 5.058653s
md5 243237.6 (±7.9%) i/s - 1209558 in 5.004162s
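Converting the reported iterations per second into time per call helps put the CPU concern about millions of articles in perspective. This sketch only does arithmetic on the numbers above, so it applies to this sample text on the machine that ran the benchmark, assuming one checksum per post per check:

```ruby
# Iterations per second for each check, copied from the benchmark-ips output above.
ips = {
  'length' => 7_563_317.9,
  'crc'    => 2_703_430.5,
  'sha1'   => 239_455.7,
  'md5'    => 243_237.6
}

# 1,000,000 / (calls per second) gives microseconds per call, which is
# numerically the same as seconds per million calls.
micros_per_call = ips.transform_values { |rate| 1_000_000.0 / rate }

micros_per_call.each do |name, us|
  printf("%-6s %5.2f us per call (%5.2f s per million posts)\n", name, us, us)
end
```

That works out to roughly 0.37 µs per CRC versus about 4.2 µs per SHA1, i.e. around a third of a second versus about four seconds of CPU per million posts.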
Yeah, PuSH might be a little too complicated to set up to really catch on, but it's a huge help for getting timely updates.
Then a CRC checksum sounds like the way to go.
Hi guys,
RSS: http://feeds.feedburner.com/Dudeiwantthat
The code: