Library fails to parse duplicate dc:format elements from Google Books Search

GoogleCodeExporter commented 9 years ago

yakovsh@yakov-desktop:~$ python2.5
Python 2.5.4 (r254:67916, Apr  4 2009, 17:55:16) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from gdata.books import Book, BookFeed
>>> from urllib2 import urlopen
>>> data = urlopen("http://www.google.com/books/feeds/volumes?q=football")
>>> xml = data.read()
>>> feed = BookFeed.FromString(xml)
>>> entry = feed.entry[0]
>>> entry.to_dict()
{'embeddability': 'embeddable', 'info':
'http://books.google.com/books?id=W-c9AAAAYAAJ&dq=editions:ISBN1419185551&as_brr
=1&ie=ISO-8859-1&source=gbs_gdata',
'format': 'book', 'identifiers': [('google_id', 'W-c9AAAAYAAJ'),
('HARVARD', 'HWNR1N')], 'thumbnail':
'http://bks0.books.google.com/books?id=W-c9AAAAYAAJ&printsec=frontcover&img=1&zo
om=5&sig=ACfU3U15zaiVsqqg7sh3mtOqD2XwyZBDcw&source=gbs_gdata',
'subjects': ['Drama'], 'authors': ['William Shakespeare'], 'date': '1904',
'title': 'The tragedie of Macbeth', 'preview':
'http://books.google.com/books?id=W-c9AAAAYAAJ&printsec=frontcover&dq=editions:I
SBN1419185551&as_brr=1&ie=ISO-8859-1&source=gbs_gdata',
'viewability': 'view_all_pages', 'annotation':
'http://www.google.com/books/feeds/users/me/volumes'}
>>> xml
"<?xml version='1.0' encoding='UTF-8'?><feed
xmlns='http://www.w3.org/2005/Atom'
xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
xmlns:gbs='http://schemas.google.com/books/2008'
xmlns:dc='http://purl.org/dc/terms'
xmlns:gd='http://schemas.google.com/g/2005'><id>http://www.google.com/books/feed
s/volumes</id><updated>2009-10-30T01:23:36.000Z</updated><category
scheme='http://schemas.google.com/g/2005#kind'
term='http://schemas.google.com/books/2008#volume'/><title
type='text'>Search results for football</title><link rel='alternate'
type='text/html' href='http://www.google.com'/><link
rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml'
href='http://www.google.com/books/feeds/volumes'/><link rel='self'
type='application/atom+xml'
href='http://www.google.com/books/feeds/volumes?q=football&amp;max-results=1'/><
link
rel='next' type='application/atom+xml'
href='http://www.google.com/books/feeds/volumes?q=football&amp;start-index=2&amp
;max-results=1'/><author><name>Google
Books Search</name><uri>http://www.google.com</uri></author><generator
version='beta'>Google Book Search data
API</generator><openSearch:totalResults>535</openSearch:totalResults><openSearch
:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>1</openSearch:item
sPerPage><entry><id>http://www.google.com/books/feeds/volumes/-vS_ZugKGTcC</id><
updated>2009-10-30T01:23:36.000Z</updated><category
scheme='http://schemas.google.com/g/2005#kind'
term='http://schemas.google.com/books/2008#volume'/><title
type='text'>American football</title><link
rel='http://schemas.google.com/books/2008/thumbnail' type='image/x-unknown'
href='http://bks0.books.google.com/books?id=-vS_ZugKGTcC&amp;printsec=frontcover
&amp;img=1&amp;zoom=5&amp;sig=ACfU3U0wuzK1A_YN8V5u7WEM1910tsfV4w&amp;source=gbs_
gdata'/><link
rel='http://schemas.google.com/books/2008/info' type='text/html'
href='http://books.google.com/books?id=-vS_ZugKGTcC&amp;dq=football&amp;ie=ISO-8
859-1&amp;source=gbs_gdata'/><link
rel='http://schemas.google.com/books/2008/preview' type='text/html'
href='http://books.google.com/books?id=-vS_ZugKGTcC&amp;printsec=frontcover&amp;
dq=football&amp;ie=ISO-8859-1&amp;source=gbs_gdata'/><link
rel='http://schemas.google.com/books/2008/annotation'
type='application/atom+xml'
href='http://www.google.com/books/feeds/users/me/volumes'/><link
rel='alternate' type='text/html'
href='http://books.google.com/books?id=-vS_ZugKGTcC&amp;dq=football&amp;ie=ISO-8
859-1'/><link
rel='self' type='application/atom+xml'
href='http://www.google.com/books/feeds/volumes/-vS_ZugKGTcC'/><gbs:embeddabilit
y
value='http://schemas.google.com/books/2008#embeddable'/><gbs:openAccess
value='http://schemas.google.com/books/2008#enabled'/><gbs:viewability
value='http://schemas.google.com/books/2008#view_all_pages'/><dc:creator>Walter
Camp</dc:creator><dc:date>1891</dc:date><dc:format>175
pages</dc:format><dc:format>book</dc:format><dc:identifier>-vS_ZugKGTcC</dc:iden
tifier><dc:identifier>UOM:39015013352045</dc:identifier><dc:subject>Juvenile
Nonfiction</dc:subject><dc:title>American football</dc:title></entry></feed>"
>>> data = urlopen("http://www.google.com/books/feeds/volumes/W-c9AAAAYAAJ")
>>> xml = data.read()
>>> xml
"<?xml version='1.0' encoding='UTF-8'?><entry
xmlns='http://www.w3.org/2005/Atom'
xmlns:gbs='http://schemas.google.com/books/2008'
xmlns:dc='http://purl.org/dc/terms'
xmlns:gd='http://schemas.google.com/g/2005'><id>http://www.google.com/books/feed
s/volumes/W-c9AAAAYAAJ</id><updated>2009-10-30T01:24:48.000Z</updated><category
scheme='http://schemas.google.com/g/2005#kind'
term='http://schemas.google.com/books/2008#volume'/><title type='text'>The
tragedie of Macbeth</title><link
rel='http://schemas.google.com/books/2008/thumbnail' type='image/x-unknown'
href='http://bks0.books.google.com/books?id=W-c9AAAAYAAJ&amp;printsec=frontcover
&amp;img=1&amp;zoom=5&amp;sig=ACfU3U15zaiVsqqg7sh3mtOqD2XwyZBDcw&amp;source=gbs_
gdata'/><link
rel='http://schemas.google.com/books/2008/info' type='text/html'
href='http://books.google.com/books?id=W-c9AAAAYAAJ&amp;ie=ISO-8859-1&amp;source
=gbs_gdata'/><link
rel='http://schemas.google.com/books/2008/annotation'
type='application/atom+xml'
href='http://www.google.com/books/feeds/users/me/volumes'/><link
rel='http://schemas.google.com/books/2008/epubdownload'
type='application/epub'
href='http://books.google.com/books/download/The_tragedie_of_Macbeth.epub?id=W-c
9AAAAYAAJ&amp;ie=ISO-8859-1&amp;output=epub&amp;source=gbs_gdata'/><link
rel='alternate' type='text/html'
href='http://books.google.com/books?id=W-c9AAAAYAAJ&amp;ie=ISO-8859-1'/><link
rel='self' type='application/atom+xml'
href='http://www.google.com/books/feeds/volumes/W-c9AAAAYAAJ'/><gbs:embeddabilit
y
value='http://schemas.google.com/books/2008#embeddable'/><gbs:openAccess
value='http://schemas.google.com/books/2008#enabled'/><gbs:viewability
value='http://schemas.google.com/books/2008#view_all_pages'/><dc:creator>William
Shakespeare</dc:creator><dc:date>1904</dc:date><dc:format>340
pages</dc:format><dc:format>book</dc:format><dc:identifier>W-c9AAAAYAAJ</dc:iden
tifier><dc:language>en</dc:language><dc:publisher>T.Y.
Crowell &amp; co.</dc:publisher><dc:subject>Drama /
General</dc:subject><dc:subject>Drama / English, Irish, Scottish,
Welsh</dc:subject><dc:subject>Drama /
Shakespeare</dc:subject><dc:subject>Literary Criticism /
Shakespeare</dc:subject><dc:title>The tragedie of Macbeth</dc:title></entry>"
>>> book = Book.FromString(xml)
>>> book.to_dict()
{'embeddability': 'embeddable', 'info':
'http://books.google.com/books?id=W-c9AAAAYAAJ&ie=ISO-8859-1&source=gbs_gdata',
'format': 'book', 'publishers': ['T.Y. Crowell & co.'], 'identifiers':
[('google_id', 'W-c9AAAAYAAJ')], 'thumbnail':
'http://bks0.books.google.com/books?id=W-c9AAAAYAAJ&printsec=frontcover&img=1&zo
om=5&sig=ACfU3U15zaiVsqqg7sh3mtOqD2XwyZBDcw&source=gbs_gdata',
'subjects': ['Drama / General', 'Drama / English, Irish, Scottish, Welsh',
'Drama / Shakespeare', 'Literary Criticism / Shakespeare'], 'authors':
['William Shakespeare'], 'date': '1904', 'title': 'The tragedie of
Macbeth', 'viewability': 'view_all_pages', 'annotation':
'http://www.google.com/books/feeds/users/me/volumes'}
>>>

Original issue reported on code.google.com by google....@shaftek.org on 30 Oct 2009 at 1:25

GoogleCodeExporter commented 9 years ago

As best I can tell, Google is not following its own specification here. The API
docs list the elements that may be returned more than once, and dc:format is not
among them. Further, the docs say dc:format should (at this time) indicate only
the number of pages. So this is yet another instance of the Google Book Search
API implementation and its docs disagreeing.

Original comment by sams.ja...@gmail.com on 9 Dec 2009 at 12:33
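
For reference, the two dc:format values can be read straight from the XML with the standard library, independent of gdata. This is a minimal sketch, not the library's own parsing code: it assumes Python 3 and xml.etree.ElementTree, the sample entry is a trimmed copy of the Macbeth entry quoted in the report, and extract_formats is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the feed output quoted in the report.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/terms",
}

# Trimmed-down copy of the Macbeth entry from the report.
SAMPLE_ENTRY = (
    "<entry xmlns='http://www.w3.org/2005/Atom' "
    "xmlns:dc='http://purl.org/dc/terms'>"
    "<title type='text'>The tragedie of Macbeth</title>"
    "<dc:format>340 pages</dc:format>"
    "<dc:format>book</dc:format>"
    "</entry>"
)


def extract_formats(entry_xml):
    """Return the text of every dc:format child, in document order.

    Hypothetical helper for illustration; it is not part of gdata.
    """
    entry = ET.fromstring(entry_xml)
    # findall returns all matching children, so repeated elements are kept.
    return [element.text for element in entry.findall("dc:format", NS)]


formats = extract_formats(SAMPLE_ENTRY)
print(formats)  # ['340 pages', 'book']
```

Returning a list rather than a single value means a repeated element cannot overwrite an earlier one, which is what appears to happen with the 'format': 'book' entry in the to_dict() output above.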

GoogleCodeExporter commented 9 years ago

You could try parsing the XML with VolumeEntry in gdata.books.data, which was
recently added:

http://code.google.com/p/gdata-python-client/source/browse/trunk/src/gdata/books/data.py

The VolumeEntry.format member is specified to allow multiple 
gdata.dublincore.data.Format objects.

Original comment by jscud.w...@gmail.com on 9 Dec 2009 at 12:51

GoogleCodeExporter commented 9 years ago

Excuse my newbness, but how do you use the VolumeEntry class? I can't find a
method that would do the parsing.

Original comment by smarth...@gmail.com on 24 Feb 2010 at 11:14

GoogleCodeExporter commented 9 years ago

To parse an XML string into a VolumeEntry class (or any other subclass of 
atom.core.XmlElement) you can do:

volume_entry = atom.core.parse(xml_string, gdata.books.data.VolumeEntry)

Original comment by jscud.w...@gmail.com on 24 Feb 2010 at 6:29

GoogleCodeExporter commented 9 years ago

Okay, so using VolumeEntry doesn't fix this problem. When I try to use it, I
don't get any format objects. Should I file a separate bug for this?

>>> data = urlopen("http://www.google.com/books/feeds/volumes/W-c9AAAAYAAJ").read()
>>> book = atom.core.parse(data, VolumeEntry)
>>> book.title
<atom.data.Title object at 0x288d990>
>>> book.title.text
'The tragedie of Macbeth'
>>> book.format
[]
>>>

Original comment by smarth...@gmail.com on 25 Feb 2010 at 9:07
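
One thing that may be worth ruling out when a parser returns an empty list for elements that are clearly present is a namespace URI mismatch. The sketch below is not a confirmed diagnosis of the empty book.format above. It only shows, with the standard library (Python 3, xml.etree.ElementTree), that namespace URIs are compared as exact strings, so a lookup that differs from the feed's URI by as little as a trailing slash finds nothing. The second URI and the count_formats helper are assumptions made for illustration; which URI gdata actually expects would need to be checked in its source.

```python
import xml.etree.ElementTree as ET

# Namespace URI exactly as it appears in the feed quoted in the report.
FEED_DC_NS = "http://purl.org/dc/terms"
# Assumed variant for illustration: the same URI with a trailing slash.
OTHER_DC_NS = "http://purl.org/dc/terms/"

# Trimmed-down copy of the Macbeth entry from the report.
SAMPLE_ENTRY = (
    "<entry xmlns='http://www.w3.org/2005/Atom' "
    "xmlns:dc='http://purl.org/dc/terms'>"
    "<dc:format>340 pages</dc:format>"
    "<dc:format>book</dc:format>"
    "</entry>"
)


def count_formats(entry_xml, dc_namespace):
    """Count 'format' children qualified with the given namespace URI.

    Hypothetical helper for illustration; it is not part of gdata.
    """
    entry = ET.fromstring(entry_xml)
    # ElementTree stores qualified names as "{namespace-uri}localname".
    return len(entry.findall("{%s}format" % dc_namespace))


exact = count_formats(SAMPLE_ENTRY, FEED_DC_NS)
mismatch = count_formats(SAMPLE_ENTRY, OTHER_DC_NS)
print(exact, mismatch)  # 2 0
```

If the expected and actual URIs turn out to match, the mismatch can be excluded and the cause lies elsewhere.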

GoogleCodeExporter commented 9 years ago

Original comment by afs...@google.com on 7 Oct 2011 at 11:37

Added labels: Component-Books
