HaveF / feedparser

Automatically exported from code.google.com/p/feedparser

itunes:keywords should be split by commas, not whitespace #309

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Parse an iTunes podcast feed
2. Inspect the feed.tags element
3. The feed.tags[i].term values have been split on whitespace rather than on 
commas

What is the expected output? What do you see instead?

Example feed: http://www.apple.com/podcasts/apple_keynotes/apple_keynotes.xml

Here is the raw itunes:keywords element before parsing:
<itunes:keywords>keynote, mac, macintosh, osx, appletv, steven, ipod, iphone, 
presentation, sdk, xcode, ipad</itunes:keywords>
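
For anyone who wants to look at the raw element text without going through feedparser, here is a minimal sketch using only the standard library. The inline feed is a made-up, trimmed-down stand-in for the real one, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by the iTunes podcast extensions.
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"

# Made-up minimal feed for illustration; not the actual Apple feed.
rss = """<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example</title>
    <itunes:keywords>keynote, mac, macintosh</itunes:keywords>
  </channel>
</rss>"""

root = ET.fromstring(rss)

# ElementTree addresses namespaced tags as "{namespace-uri}localname".
element = root.find("channel/{%s}keywords" % ITUNES_NS)
raw_keywords = element.text

print(raw_keywords)
```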

And here is what feed parser produces:
            'tags': [ { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'keynote,'},
                      { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'mac,'},
                      { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'macintosh,'},
                      { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'osx,'},
                      { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'appletv,'},
                      { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'steven,'},
                      { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'ipod,'},
                      { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'iphone,'},
                      { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'presentation,'},
                      { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'sdk,'},
                      { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'xcode,'},
                      { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'ipad'},
                      { 'label': None, 'scheme': None, 'term': u'Technology'},
                      { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'Technology'},
                      { 'label': None,
                        'scheme': u'http://www.itunes.com/',
                        'term': u'Tech News'}]
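
The trailing commas on the terms above are what a plain whitespace split of the raw text gives. A minimal sketch of that behavior, using `str.split()` as a stand-in (this is an illustration, not feedparser's actual code):

```python
# Raw <itunes:keywords> text from the example feed, with the line wrap
# collapsed to a single space.
raw_keywords = (
    "keynote, mac, macintosh, osx, appletv, steven, ipod, iphone, "
    "presentation, sdk, xcode, ipad"
)

# Splitting on whitespace leaves each comma attached to the preceding word.
whitespace_terms = raw_keywords.split()

print(whitespace_terms[:3])
print(whitespace_terms[-1])
```

Only the last term comes out clean, because it is the only one not followed by a comma.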

What version of the product are you using? On what operating system?
5.01
OS X Lion

Please provide any additional information below.

http://www.apple.com/itunes/podcasts/specs.html#keywords

<itunes:keywords>
This tag allows users to search on a maximum of 12 text keywords. Use commas to 
separate keywords.
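
Going by that wording, something like the following is the splitting I would expect. This is only a sketch: `split_itunes_keywords` is a hypothetical helper, not part of feedparser, and trimming whitespace and dropping empty entries are my own assumptions rather than anything the spec states.

```python
def split_itunes_keywords(text):
    """Split an itunes:keywords string on commas.

    Hypothetical helper for illustration; not feedparser's API.
    Surrounding whitespace is trimmed and empty entries are dropped
    (both are assumptions, not spec requirements).
    """
    return [term.strip() for term in text.split(",") if term.strip()]


keywords = split_itunes_keywords(
    "keynote, mac, macintosh, osx, appletv, steven, ipod, iphone, "
    "presentation, sdk, xcode, ipad"
)
print(keywords)
```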

Original issue reported on code.google.com by josh.ric...@gmail.com on 2 Dec 2011 at 11:39

GoogleCodeExporter commented 9 years ago
That original example wasn't the greatest.  Here is a slightly better example:

Feed: http://feeds.feedburner.com/37signals_podcast

Raw data:
<itunes:keywords>37signals,podcast,Jason Fried,David Heinemeier Hansson,Matt Linderman,web,internet,entrepreneurship,business,design,experience,Rework,Getting Real,Basecamp,Highrise,Backpack,Ruby on Rails</itunes:keywords>

Feed parser:

                 'tags': [ { 'label': None,
                             'scheme': u'http://www.itunes.com/',
                             'term': u'37signals,podcast,Jason'},
                           { 'label': None,
                             'scheme': u'http://www.itunes.com/',
                             'term': u'Fried,David'},
                           { 'label': None,
                             'scheme': u'http://www.itunes.com/',
                             'term': u'Heinemeier'},
                           { 'label': None,
                             'scheme': u'http://www.itunes.com/',
                             'term': u'Hansson,Matt'},
                           { 'label': None,
                             'scheme': u'http://www.itunes.com/',
                             'term': u'Linderman,web,internet,entrepreneurship,business,design,experience,Rework,Getting'},
                           { 'label': None,
                             'scheme': u'http://www.itunes.com/',
                             'term': u'Real,Basecamp,Highrise,Backpack,Ruby'},
                           { 'label': None,
                             'scheme': u'http://www.itunes.com/',
                             'term': u'on'},
                           { 'label': None,
                             'scheme': u'http://www.itunes.com/',
                             'term': u'Rails'}]
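
To make the difference concrete, here is a short comparison of the two splitting strategies on this feed's keywords. It is a sketch only: the whitespace split mirrors the output above, the comma split is what I would expect instead, and the tag dicts just copy the shape and scheme shown in the output.

```python
raw_keywords = (
    "37signals,podcast,Jason Fried,David Heinemeier Hansson,Matt Linderman,"
    "web,internet,entrepreneurship,business,design,experience,Rework,"
    "Getting Real,Basecamp,Highrise,Backpack,Ruby on Rails"
)

# Whitespace split: breaks multi-word keywords apart and glues
# comma-separated neighbours together.
by_whitespace = raw_keywords.split()

# Comma split: keeps multi-word keywords such as "Ruby on Rails" intact.
by_comma = [term.strip() for term in raw_keywords.split(",") if term.strip()]

# Tag dicts in the same shape as the feedparser output shown above.
expected_tags = [
    {"label": None, "scheme": "http://www.itunes.com/", "term": term}
    for term in by_comma
]

print(len(by_whitespace), "terms when split on whitespace")
print(len(by_comma), "terms when split on commas")
```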

Original comment by josh.ric...@gmail.com on 2 Dec 2011 at 7:39

GoogleCodeExporter commented 9 years ago
Fixed in r657.

The unit tests indicate that splitting on whitespace was a deliberate 
decision. Whether that was a developer mistake or the specification changed 
after the code was written, feedparser now follows the spec as it's currently 
written. Thanks for reporting this!

Original comment by kurtmckee on 3 Dec 2011 at 11:11