Closed GoogleCodeExporter closed 9 years ago
I'm attaching a sample document that illustrates the lax handling of
itunes:summary and description. The problem you're seeing stems from the two
elements being handled differently depending on which appears first in the
document.
By the way, 'description' is an alias key; it doesn't actually exist. A lookup
first checks whether there's a 'subtitle' and, if not, falls back on 'summary'.
I recommend relying on 'subtitle' and 'summary' directly so that you know what
you're going to get (...on the assumption that the order of the elements in the
document doesn't affect the result!).
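The alias lookup described above can be pictured with a small dictionary
subclass. This is only a sketch under assumptions: the class name `AliasDict`
and its `aliases` table are hypothetical and are not the parser's actual
implementation; the fallback order ('subtitle', then 'summary') is taken from
the comment above.

```python
class AliasDict(dict):
    """Hypothetical dict whose alias keys resolve to the first real key present."""

    # Assumed alias table: 'description' is never stored, only looked up.
    aliases = {"description": ["subtitle", "summary"]}

    def __getitem__(self, key):
        # Non-alias keys resolve to themselves.
        for real_key in self.aliases.get(key, [key]):
            if dict.__contains__(self, real_key):
                return dict.__getitem__(self, real_key)
        raise KeyError(key)


entry = AliasDict(summary="A long summary")
only_summary = entry["description"]    # falls back on 'summary'

entry["subtitle"] = "A short subtitle"
with_subtitle = entry["description"]   # 'subtitle' now takes precedence
```

Reading 'subtitle' and 'summary' directly avoids depending on which of the two
happens to be present.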
Original comment by kurtmckee on 9 Dec 2010 at 6:16
After reviewing the RSS 2.0 and Atom specifications, the iTunes spec, and the
code, I've created and attached a patch that fixes the behavior I noted in the
comment above. It will not, however, stop `itunes:summary` from being loaded
into the `summary` key (for which the `description` key is an alias). I
consider this correct behavior if the publisher is purposefully adding elements
from the itunes namespace.
Currently, when the code encounters a `description` or `summary` element it
checks if there's already a `summary` key. If there is, it puts the data in the
`content` key. This is obviously a purposeful design decision, and there are
four unit tests that check this behavior. `itunes:summary` elements are treated
the same as `summary` elements in the code, which is why they're affected by
this design decision as well.
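The data-shifting behavior described above can be sketched roughly as follows.
This is an illustration, not the parser's actual code: `store_summary_like` is
a hypothetical helper, the entry is a plain dict, and storing `content` as a
list of strings is a simplifying assumption.

```python
def store_summary_like(entry, text):
    """Hypothetical handler shared by description, summary and itunes:summary."""
    if "summary" in entry:
        # A summary was already seen, so later data is shifted to 'content'.
        entry.setdefault("content", []).append(text)
    else:
        entry["summary"] = text


# The same two elements, in opposite document order.
description_first = {}
store_summary_like(description_first, "from <description>")
store_summary_like(description_first, "from <itunes:summary>")

itunes_first = {}
store_summary_like(itunes_first, "from <itunes:summary>")
store_summary_like(itunes_first, "from <description>")
```

Under this logic, whichever element appears first lands in `summary` and the
other ends up in `content`, which is the order dependence noted in the first
comment.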
There are three options available: (1) create a dedicated method to deal with
`itunes:summary` elements to guarantee they only appear in the `summary` key of
the result dictionary, (2) remove the two function definition lines so
`itunes:summary` is stored in the `itunes_summary` key, or (3) remove the
data-shifting behavior. I've opted for the third based on the specs'
description of the elements:
`description`: "The item synopsis." [1]
`summary`: "Conveys a short summary, abstract, or excerpt of the entry." [2]
`itunes:summary`: "This field can be up to 4000 characters. If <itunes:summary>
is not included, the contents of the <description> tag are used." [3]
It seems to me that all three should be treated in the same manner and placed
in the `summary` key, which is what the patch I'm attaching does. Here's the
list of unit tests that can be removed with this patch:
illformed/rss/item_description_and_summary.xml
illformed/rss/item_summary_and_description.xml
wellformed/rss/item_description_and_summary.xml
wellformed/rss/item_summary_and_description.xml
Tested in Python 2.4 through 3.1. The git branch is at:
https://github.com/kurtmckee/feedparser/tree/issue242
[1]: http://cyber.law.harvard.edu/rss/rss.html#hrelementsOfLtitemgt
[2]: http://www.atomenabled.org/developers/syndication/#recommendedEntryElements
[3]: http://www.apple.com/itunes/podcasts/specs.html#summary
Original comment by kurtmckee on 3 Jan 2011 at 8:50
@Ade: I don't know if I expressed my opinion above, but I think that putting
the `itunes:summary` element in the `summary` key is correct behavior here. If
you agree, then aside from considering the patch above, this report can be
closed as wontfix.
@Carlos: If this report gets closed as wontfix, don't despair! After the next
release, I'm going to create an experimental git branch based on a blog entry I
wrote [1]. The changes would allow you to customize how the `itunes:summary`
element is handled. It would only be experimental, but it may be helpful to you.
[1]: http://kurtmckee.livejournal.com/32124.html
Original comment by kurtmckee on 3 Jan 2011 at 8:59
I believe the code is doing the right thing here so I'm marking this as WontFix.
I think the data Carlos wants is available from `feed.entries[0].summary`, and
that seems like a simpler solution.
Original comment by adewale on 4 Jan 2011 at 3:34
Original issue reported on code.google.com by carlos.m...@gmail.com on 9 Dec
2010 at 11:48