API for working with XML data isn't very intuitive

edsu / pymarc

process MARC records from Python

http://python.org/pypi/pymarc


API for working with XML data isn't very intuitive #73

Open danmichaelo opened 9 years ago

danmichaelo commented 9 years ago

Given some XML data:

data = open('test.xml', 'rb')

I expected from the README example to be able to do something like:

from pymarc import MARCReader
for record in MARCReader(data):
    ...

but instead I had to do

import pymarc
for record in pymarc.parse_xml_to_array(data):
    ...

Determining the file type should be quite easy from reading the first characters of the file stream: xml if "<?xml", json if "{", plain marc otherwise.
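A minimal sketch of the sniffing idea described above, using only the standard library. `sniff_marc_format` is a hypothetical helper, not something pymarc provides. It assumes a seekable binary stream, and it checks for a leading `<` rather than the literal `<?xml`, since an XML file may omit the declaration.

```python
import io


def sniff_marc_format(stream):
    """Guess 'xml', 'json' or 'marc' from the first bytes of a binary stream.

    Hypothetical helper (not part of pymarc). The stream must be seekable so
    that the bytes read here can be handed back to the real parser.
    """
    start = stream.tell()
    head = stream.read(64)
    stream.seek(start)  # rewind so nothing is lost for the actual reader

    # Skip a UTF-8 byte order mark and leading whitespace, if present.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    head = head.lstrip()

    if head.startswith(b"<"):
        return "xml"
    if head.startswith((b"{", b"[")):
        return "json"
    # Binary MARC starts with the leader, whose first bytes are digits.
    return "marc"
```

A reader class could call something like this once on the open file and then dispatch to the matching parser.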

Next I wanted to try to serialize a record to XML. The Record object has methods like as_marc(), as_marc21() and hm, even as_json(), but no as_xml()! Instead:

pymarc.record_to_xml(record)

edsu commented 9 years ago

Yeah, that's a fair criticism. Still, you figured it out -- so maybe it's not so bad? Or maybe most people give up before you? I guess I secretly loathe XML, and like keeping it in a corner.

danmichaelo commented 9 years ago

We all do ;) But then there's library systems…

Anyways, I won't be offended if you close this as "wontfix", but consider adding a short xml example to the README first. Might help save others some time.

edsu commented 9 years ago

I'll leave this open until one of those things happens. Thanks @danmichaelo !

rlskoeser commented 8 years ago

Seconding that reading xml is not intuitive. Had to look at the test code and this issue before I got it to work.

edsu commented 5 years ago

Just out of curiosity, would people be ok with:

for record in MARCReader(open('batch.xml', 'rb')):
    # do something useful with a Record

creating an in-memory array for all the records, and then allowing the iteration to start? I think it would be preferable if it did function as an iterator, but I'm not quite sure how that would combine with the SAX parsing that's going on. I like that MARCReader is an actual iterator, and allows you to process large files. I think this is another subconscious reason why I partitioned the XML functionality off to the side.
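For what it's worth, here is a rough sketch of how lazy iteration over an XML file could look. It swaps SAX for the standard library's `xml.etree.ElementTree.iterparse`, so it is not how pymarc's parser works. `iter_record_elements` is a hypothetical name, and it yields plain ElementTree elements rather than pymarc Record objects; turning each element into a Record would be a separate step.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by MARCXML documents.
MARC_NS = "http://www.loc.gov/MARC21/slim"


def iter_record_elements(stream):
    """Lazily yield each <record> element from a MARCXML binary stream.

    Hypothetical helper (not part of pymarc). Elements are yielded one at a
    time as the parser reaches their closing tag, so the whole file is never
    held as a list of records.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Accept both namespaced and un-namespaced <record> tags.
        if elem.tag in ("record", "{%s}record" % MARC_NS):
            yield elem
            # Runs when the caller asks for the next record: drop the
            # children of the one just processed to keep memory use low.
            elem.clear()
```

Because each element is cleared once the caller moves on, anything needed from a record has to be read inside the loop body, before the next iteration starts.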