HaveF / feedparser

Automatically exported from code.google.com/p/feedparser

temporarily provide fallback to `published` when `updated` is not available (see issue 310) #328

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The next release of feedparser after 5.1 will introduce a fix to the pubDate 
mapping:

* Issue 310 (pubDate should map to `published`, not `updated`)

However, this change also means that the "updated" field is no longer available 
in some cases where developers (like myself) have relied on it. Code that 
assumes the RSS "pubDate" field appears in "updated" and "updated_parsed" will 
break.

My proposal is as follows:

 * If there is no "updated" attribute, and someone tries to access it, return the "published" attribute instead
 * If there is no "updated_parsed" attribute, and someone tries to access it, return the "published_parsed" attribute instead
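
A minimal sketch of the two rules above, assuming a dict-like entry object. The `FallbackDict` class and its `_fallbacks` table are hypothetical names used only for illustration; this is not feedparser's actual implementation.

```python
class FallbackDict(dict):
    """Hypothetical stand-in for a feedparser entry object."""

    # Requested key -> key to fall back to when the requested one is absent.
    _fallbacks = {
        "updated": "published",
        "updated_parsed": "published_parsed",
    }

    def __getitem__(self, key):
        # Redirect only when the requested key is genuinely missing.
        if not dict.__contains__(self, key) and key in self._fallbacks:
            key = self._fallbacks[key]
        return dict.__getitem__(self, key)

    def __getattr__(self, key):
        # Called only after normal attribute lookup has failed.
        try:
            return self[key]
        except KeyError:
            raise AttributeError("object has no attribute '%s'" % key)


entry = FallbackDict(published="Thu, 07 Jul 2011 00:00:00 GMT")
print(entry.updated)
```

If neither key is present, the lookup still ends in an `AttributeError`, so the fallback only changes behavior in the case described above.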

I've already updated the code of my application (gPodder) today to prefer 
"published" over "updated", but I can't update installations "in the wild", so 
these could break when a new feedparser version is released and installed on 
users' systems. I also want to avoid bundling feedparser with gPodder.
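
On the application side, preferring "published" over "updated" could look roughly like the sketch below. The helper name `entry_date` is an assumption made for this example and is not taken from gPodder's code; plain dicts stand in for parsed entries.

```python
def entry_date(entry):
    """Return the entry's date string, preferring 'published' over 'updated'.

    Works with any mapping that supports .get(); returns None when
    neither key is present.
    """
    for key in ("published", "updated"):
        value = entry.get(key)
        if value:
            return value
    return None


# Plain dicts standing in for entries from different feedparser versions.
new_style = {"published": "Thu, 07 Jul 2011 00:00:00 GMT"}
old_style = {"updated": "Thu, 07 Jul 2011 00:00:00 GMT"}
print(entry_date(new_style), entry_date(old_style))
```

Written this way, the same application code works whether the installed feedparser maps pubDate to "published" or to "updated".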

Example with the current Git master branch (github/kurtmckee/feedparser):

  >>> import feedparser
  >>> f = feedparser.parse('http://lugradio.org/episodes.rss')
  >>> f.entries[0].updated
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "feedparser.py", line 393, in __getattr__
      raise AttributeError, "object has no attribute '%s'" % key
  AttributeError: object has no attribute 'updated'

However, with feedparser <= 5.1, I can use the "updated" attribute:

  Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53) 
  [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import feedparser
  >>> f = feedparser.parse('http://lugradio.org/episodes.rss')
  >>> f.entries[0].updated
  u'Thu, 07 Jul 2011 00:00:00 GMT'

For situations where the "updated" field is populated from some other parts of 
the feed, the new behavior is probably okay, but in situations where "updated" 
(and consequently updated_parsed) isn't set at all, it would be good to still 
return a value.

Original issue reported on code.google.com by th.perl@gmail.com on 12 Feb 2012 at 11:26

GoogleCodeExporter commented 9 years ago
This is obviously a problem. I'm going to have to think about how to handle 
this so that users aren't affected by this when they upgrade, but without 
introducing a convenience mapping that developers come to rely on.

Original comment by kurtmckee on 12 Feb 2012 at 6:18

GoogleCodeExporter commented 9 years ago
One idea would be to deprecate that behavior and issue a warning whose message 
clearly states when the fallback will be removed, e.g. in the first release made 
at least one year after the version that introduced the deprecation.

http://docs.python.org/library/warnings.html

Just an idea, I'm sure you can come up with something clever :)

Original comment by th.perl@gmail.com on 13 Feb 2012 at 8:53

GoogleCodeExporter commented 9 years ago
I don't come up with clever things, I just club things together until they 
work, heh. I had been thinking a warning might be appropriate, so I'll take 
your suggestion as confirmation that that's probably the right path to pursue!

Original comment by kurtmckee on 13 Feb 2012 at 8:58

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r689.

Original comment by kurtmckee on 19 Feb 2012 at 9:50

GoogleCodeExporter commented 9 years ago
Okay, here's how this works (and this applies to both `updated` and 
`updated_parsed` in both `f.feed` and `f.entries[i]`):

If `published` and `updated` keys both exist, they'll function as expected.

If, however, `updated` doesn't exist but `published` does, membership tests 
like `'updated' in f.feed` will return False. If the `updated` key is then 
accessed explicitly, the value of `published` will be returned. Additionally, 
a DeprecationWarning will be issued through the warnings module, which by 
default triggers neither an exception nor a traceback. This behavior can be 
changed by configuring filters in the warnings module or by starting the Python 
interpreter with the -W switch, but under normal circumstances the 
DeprecationWarning will be printed at most once and won't trigger an exception.
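
The behavior described above can be observed with a small stand-in. The `EntrySketch` class and the warning text are hypothetical and only mimic the description; they are not feedparser's real code. The `warnings` calls are standard library.

```python
import warnings


class EntrySketch(dict):
    """Hypothetical stand-in; not feedparser's real entry class."""

    def __getitem__(self, key):
        if key in ("updated", "updated_parsed") and not dict.__contains__(self, key):
            fallback = key.replace("updated", "published")
            if dict.__contains__(self, fallback):
                warnings.warn(
                    "%r is missing; returning %r instead (temporary fallback)"
                    % (key, fallback),
                    DeprecationWarning,
                    stacklevel=2,
                )
                return dict.__getitem__(self, fallback)
        return dict.__getitem__(self, key)


entry = EntrySketch(published="Thu, 07 Jul 2011 00:00:00 GMT")

# Membership reflects the real keys, so the fallback stays invisible here.
has_updated = "updated" in entry

# Record warnings instead of relying on the interpreter's default filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = entry["updated"]

# Escalate the warning to an exception for the duration of this block.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        entry["updated"]
        escalated = False
    except DeprecationWarning:
        escalated = True
```

Starting the interpreter with `-W error::DeprecationWarning` has a similar effect to the last block, which can help locate code that still depends on the fallback.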

Although I updated the documentation and added unit tests to help ensure that 
this is behaving well, if I missed something please please please ping me back 
about it. It's my intention to release feedparser 5.1.1 to accommodate several 
Linux distribution maintainers, but I want to make sure this is done correctly. 
:)

Thanks for reporting this, Thomas!

Original comment by kurtmckee on 19 Feb 2012 at 10:11