HaveF / feedparser

Automatically exported from code.google.com/p/feedparser

Incremental parsing #361

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
If a feed is large, it can be helpful to parse it incrementally to save
memory. This can also improve processing speed, because a user can
short-circuit parsing once they reach an entry that has already been parsed.
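
A minimal sketch of the kind of short-circuit being asked for, using the standard library's iterparse rather than anything feedparser currently exposes (the namespace constant, the seen-ids set, and the feed path are illustrative assumptions, not feedparser API):

```python
# Sketch only: feedparser has no incremental interface, so this walks an
# Atom feed directly with xml.etree.ElementTree.iterparse.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
seen_ids = {"urn:example:already-parsed"}  # hypothetical set of known entry ids

def parse_new_entries(path):
    """Yield entries until one that has already been seen appears."""
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == ATOM + "entry":
            entry_id = elem.findtext(ATOM + "id")
            if entry_id in seen_ids:
                break  # short-circuit: the rest of the feed is old
            yield {
                "id": entry_id,
                "title": elem.findtext(ATOM + "title"),
            }
            elem.clear()  # free each parsed entry to keep memory flat
```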

Original issue reported on code.google.com by b...@atmaildot.com on 6 Jun 2012 at 4:16

GoogleCodeExporter commented 9 years ago
I've considered this idea before, but am currently rejecting it because this is 
something that should be handled at the HTTP level (using Last-Modified and 
ETag headers).
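
For context, the HTTP-level mechanism referred to here is available through feedparser's parse() call itself; a hedged usage sketch (the feed URL is illustrative):

```python
import feedparser

url = "http://example.com/feed.atom"  # hypothetical feed URL

# First fetch: remember the validators the server sent back, if any.
first = feedparser.parse(url)
etag = getattr(first, "etag", None)
modified = getattr(first, "modified", None)

# Later fetch: send the validators back. A 304 response means nothing
# changed and no feed body was transferred or parsed.
later = feedparser.parse(url, etag=etag, modified=modified)
if getattr(later, "status", None) == 304:
    print("Feed unchanged; nothing to parse.")
```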

This may be an optimization opportunity to investigate sometime in the future, 
though, and I'm grateful that you submitted this because it lets me know that 
other people are thinking about the same thing! Thanks!

Original comment by kurtmckee on 19 Nov 2012 at 4:58

GoogleCodeExporter commented 9 years ago
I'm new to the business of parsing feeds and initially misunderstood the
benefit of Last-Modified and ETag headers. If I've got it right, the entire
feed is still downloaded even if there is only one new item; if not, throw the
red flag, because I'm doing something wrong.

So, I'm in the middle of modifying another project for my own needs when I
discover this less-than-ideal situation. I then decide that, instead of hashing
each item as is currently done, I should stop parsing when an item's published
date is older than the Last-Modified header I am sending to the server. Then I
realize that this should be done in feedparser if it's not done at the server
level.
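
A rough sketch of that break-on-published-date idea, applied after a normal feedparser.parse() call (the URL and the Last-Modified value are illustrative; note this only short-circuits downstream processing, since breaking during parsing itself would need the incremental interface this issue asks for):

```python
import time
import feedparser

# Hypothetical: the Last-Modified value we sent with the request,
# kept as a time.struct_time for easy comparison.
last_modified = time.gmtime(time.time() - 86400)

feed = feedparser.parse("http://example.com/feed.atom")  # hypothetical URL
new_entries = []
for entry in feed.entries:
    published = entry.get("published_parsed")
    if published is not None and published <= last_modified:
        break  # assumes entries are newest-first; everything after is old
    new_entries.append(entry)
```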

I want to do this in feedparser, but I don't want to do something that won't be
accepted upstream. So, is this far enough into the future that you would
consider accepting or collaborating on something?

My frame of mind is: if I can do something to optimize processing time, I
should. Let the server folks worry about making efficient use of bandwidth,
since that is beyond my control.

Original comment by Kvan...@gmail.com on 7 Nov 2013 at 7:36