edsu / pymarc

process MARC records from Python
http://python.org/pypi/pymarc

YAZ - collecting data and printing them with PYMARC #115

Open zurek11 opened 6 years ago

zurek11 commented 6 years ago

Hello. I have some simple data collected with the YAZ client:

yaz-client -m catalogue.dat

I am connecting to a library that serves MARC21 records in UTF-8 encoding, and I am saving the records to the catalogue.dat file. It is a Czech library, so the titles contain special characters such as Ř or Ě. When I run this code:

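from django.http import HttpResponseRedirect
from pymarc import MARCReader

def get_books(request):
    # Read the binary MARC dump written by yaz-client and print each title.
    with open('catalogue.dat', 'rb') as fh:
        reader = MARCReader(fh)
        for record in reader:
            print(str(record.title()))
    return HttpResponseRedirect('/')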

The console prints this:

couldn't find 0xbe in g0=66 g1=69
Zelen©Ł kniha /
couldn't find 0xbe in g0=66 g1=69
Kniha p¿©Łtel /
Kniha ¿©Ưkadel /
Kniha poezie /
Kniha dn©Ư /
Kniha ¿©Ưkadel /
Kniha definic /
Kniha cest /
Kniha Frenesis /
Smoln©Ł kniha /
couldn't find 0xbe in g0=66 g1=69
couldn't find 0xbe in g0=66 g1=69
couldn't find 0xbe in g0=66 g1=69
couldn't find 0xaf in g0=66 g1=69

So basically there are two issues: why does it print the "couldn't find" errors, and why does it print the titles without the special characters? Thank you so much.

josephalway commented 5 years ago

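I believe it defaults to MARC-8 decoding. The extra arguments belong to MARCReader, not to open(), so keep your with open('catalogue.dat', 'rb') as fh: line and try changing the reader line to: reader = MARCReader(fh, to_unicode=True, force_utf8=True)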
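As a rough illustration of why a MARC-8 decoder would complain about 0xbe and 0xaf, the sketch below prints the UTF-8 bytes of a few Czech letters using only the standard library. Which letters actually occur in the titles is an assumption here (the output above is garbled), and utf8_hex is a hypothetical helper name.

```python
# UTF-8 byte sequences of a few Czech letters (assumed to occur in the titles).
CZECH_LETTERS = ["á", "í", "ř", "ů", "ž"]


def utf8_hex(ch):
    """Return the UTF-8 bytes of a single character as hex strings."""
    return [f"0x{b:02x}" for b in ch.encode("utf-8")]


for ch in CZECH_LETTERS:
    print(ch, utf8_hex(ch))
# "ž" encodes to 0xc5 0xbe and "ů" to 0xc5 0xaf, so the bytes named in the
# "couldn't find" messages are consistent with UTF-8 data read as MARC-8.
```

If that reading is right, the stray © and ¿ characters would be the first byte of each two-byte sequence (0xc3, 0xc5) interpreted as MARC-8 characters.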

From the MARCReader class docstring in the reader.py file:

If you find yourself in the unfortunate position of having data that is utf-8 encoded without the leader set appropriately you can use the force_utf8 parameter:

reader = MARCReader(file('file.dat'), to_unicode=True,
    force_utf8=True)

I'm not sure that's the particular problem you're having, but it might help. You might need to remove the to_unicode=True portion that I recommended, though.
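One way to check whether the leader is "set appropriately" is to look at leader position 9 of each record: in MARC 21, "a" there means UCS/Unicode and a blank means MARC-8. The following is a minimal standard-library sketch, assuming the dump is a plain concatenation of ISO 2709 records; leader_encodings and the two sample records are made up for illustration.

```python
def leader_encodings(data):
    """Yield the Leader/09 character of each record in raw ISO 2709 bytes."""
    pos = 0
    while pos + 24 <= len(data):
        try:
            # Leader/00-04 holds the total record length as five ASCII digits.
            length = int(data[pos:pos + 5])
        except ValueError:
            break  # not a record boundary; stop rather than guess
        if length < 24:
            break
        yield chr(data[pos + 9])
        pos += length


# Two synthetic, empty records (leader + field terminator + record terminator).
# The first is flagged as Unicode, the second as MARC-8.
sample = (
    b"00026nam a2200025   4500\x1e\x1d"
    b"00026nam  2200025   4500\x1e\x1d"
)
print(list(leader_encodings(sample)))  # ['a', ' ']
```

For a real file, the same function could be fed open('catalogue.dat', 'rb').read(). If every record reports a blank even though the text is UTF-8, that would match the situation force_utf8 is described as handling.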