HaveF / feedparser

Automatically exported from code.google.com/p/feedparser

Add support for hashing the feed contents / avoid re-parsing #370

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
For gPodder (gpodder.org) on mobile devices with limited processing power, it 
would be great to avoid parsing feeds if the contents do not change.

I already take advantage of the ETag and If-Modified-Since features in 
feedparser, but some web servers don't support these headers, and return the 
full feed again even when the content hasn't changed.
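
For readers unfamiliar with the conditional-request support mentioned above, here is a minimal sketch of how it is typically used. The `etag` and `modified` keyword arguments and the `status`, `etag`, and `modified` result keys are feedparser's documented interface; the helper name `fetch_if_modified` and its `parse` parameter are hypothetical, added only so the sketch can be exercised with a stub instead of a real server.

```python
def fetch_if_modified(url, etag=None, modified=None, parse=None):
    """Return (result, etag, modified); result is None on HTTP 304.

    `parse` is a hypothetical hook for testing; by default the real
    feedparser.parse is used.
    """
    if parse is None:
        import feedparser  # imported lazily so a stub can be used instead
        parse = feedparser.parse

    # feedparser sends If-None-Match / If-Modified-Since when these are given.
    result = parse(url, etag=etag, modified=modified)

    if result.get("status") == 304:
        # Server confirmed nothing changed: keep the old validators.
        return None, etag, modified

    # Servers that ignore conditional requests always end up here,
    # which is the case this issue is about.
    return result, result.get("etag", etag), result.get("modified", modified)
```

The application stores the returned `etag` and `modified` values and passes them back in on the next update.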

My proposal now is that feedparser can (optionally or by default) create a hash 
over the whole feed contents (as read from a file or over the network) and 
return this in the response. The application using feedparser can then provide 
the hash to the feedparser parse method at the next update, and feedparser will 
compare the hashes of the contents and avoid parsing the feed again if the 
contents didn't change.

This will still consume network traffic, because the whole feed has to be 
downloaded again, but it will reduce CPU time, because the feed doesn't need to 
be parsed if the same content has already been parsed before.
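
The comparison step described above could look roughly like the following. This is only a sketch of the idea, not the attached patch: the issue does not name a hash function, so SHA-256 is an assumption, and `content_hash` and `is_unchanged` are hypothetical helper names.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash the raw feed bytes exactly as downloaded (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, previous_hash) -> bool:
    """True when a previous hash exists and matches the new download."""
    return previous_hash is not None and content_hash(data) == previous_hash
```

Hashing the bytes is a single linear pass over the download, which is far cheaper than building the parsed feed structure. Note that any byte-level difference, such as a changing timestamp in the feed header, yields a different hash and therefore a full parse.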

Is this something that can be included in feedparser? I might even write and 
submit the patch myself, but there's no point in trying to pursue this 
direction if it's not going to be included upstream (I want to avoid having to 
ship a local fork of feedparser with my application).

Original issue reported on code.google.com by th.perl@gmail.com on 9 Aug 2012 at 9:14

GoogleCodeExporter commented 9 years ago
Proposed patch.

Original comment by th.perl@gmail.com on 9 Aug 2012 at 12:11


GoogleCodeExporter commented 9 years ago
Related gpodder bug: https://bugs.gpodder.org/show_bug.cgi?id=1643

Original comment by th.perl@gmail.com on 9 Aug 2012 at 12:14

GoogleCodeExporter commented 9 years ago
Thomas, I've been considering this for a while, and I've currently concluded 
that this is something that should be handled external to feedparser, before 
calling `feedparser.parse()`.

Original comment by kurtmckee on 2 May 2013 at 3:54
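
Handling this outside feedparser, as suggested in the last comment, might be sketched as below. The application downloads the feed itself, and only hands the bytes to the parser when their hash differs from the one remembered for that URL. `HashGate` and `parse_if_changed` are hypothetical names, SHA-256 is an assumption, and the parser is passed in as a callable so nothing here depends on feedparser internals.

```python
import hashlib


class HashGate:
    """Skip parsing when a URL's raw content hash has not changed (sketch)."""

    def __init__(self, parse):
        # `parse` is any callable taking the raw feed data,
        # e.g. feedparser.parse, which also accepts feed content directly.
        self._parse = parse
        self._hashes = {}  # url -> hex digest of the last parsed content

    def parse_if_changed(self, url, data: bytes):
        """Return the parsed feed, or None if `data` matches the last parse."""
        digest = hashlib.sha256(data).hexdigest()
        if self._hashes.get(url) == digest:
            return None  # same bytes as last time: no parsing needed
        result = self._parse(data)
        # Remember the hash only after parsing succeeded.
        self._hashes[url] = digest
        return result
```

An application would create one gate, for example `HashGate(feedparser.parse)`, and persist the hash table alongside its other per-feed state if it needs to survive restarts.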