edsu / pymarc

process MARC records from Python
http://python.org/pypi/pymarc
Other

MARC-8 mapping (Eszett, Euro Sign, and some revisions) #84

Closed (gugek closed this issue 8 years ago)

gugek commented 8 years ago

I came across some records in the wild that contained the Eszett (ß) and noticed that the existing marc8_mapping.py has no mapping for that character (Unicode: U+00DF).
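For precision: U+00DF is the Unicode code point for the Eszett, and its UTF-8 encoding is the two bytes C3 9F. A quick check with Python's standard `unicodedata` module (nothing pymarc-specific is assumed here):

```python
import unicodedata

eszett = "\u00df"

# The code point and its official Unicode character name
print(hex(ord(eszett)))          # 0xdf
print(unicodedata.name(eszett))  # LATIN SMALL LETTER SHARP S

# The UTF-8 encoding is two bytes, distinct from the code point itself
print(eszett.encode("utf-8"))    # b'\xc3\x9f'
```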

It looks like the LC Code Tables for MARC-8 were updated in 2004 (see https://memory.loc.gov/diglib/codetables/45.html), which might explain how this character (and the Euro sign) were overlooked.

I can provide an updated file in a pull request.

But there are a couple of other changes listed there that aren't reflected in the mapping:

See:

Revised June 2004 to add the Eszett (M+C7) and the Euro Sign (M+C8) to the
MARC-8 set.
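A minimal sketch of what these two additions amount to as lookup entries. It assumes a plain dict keyed by the MARC-8 byte in the Extended Latin (ANSEL) set, as listed in the LC code table; the names `ANSEL_ADDITIONS` and `decode_ansel_byte` are hypothetical, and this is not necessarily how pymarc's `marc8_mapping.py` lays out its data.

```python
# Hypothetical table: MARC-8 Extended Latin (ANSEL) byte -> Unicode code point.
# Only the June 2004 additions from the LC code tables are included.
ANSEL_ADDITIONS = {
    0xC7: 0x00DF,  # Eszett (LATIN SMALL LETTER SHARP S)
    0xC8: 0x20AC,  # Euro Sign
}


def decode_ansel_byte(byte):
    """Return the Unicode character for one non-combining ANSEL byte (hypothetical helper)."""
    try:
        return chr(ANSEL_ADDITIONS[byte])
    except KeyError:
        raise ValueError(f"no mapping for MARC-8 byte 0x{byte:02X}") from None


for b in (0xC7, 0xC8):
    ch = decode_ansel_byte(b)
    print(f"0x{b:02X} -> U+{ord(ch):04X}")  # 0xC7 -> U+00DF, then 0xC8 -> U+20AC
```

Raising on an unmapped byte, rather than silently dropping it, is what surfaces gaps like the missing Eszett in the first place.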

Revised September 2004 to change the mapping from MARC-8 to Unicode for
the Ligature (M+EB and M+EC) from U+FE20 and U+FE21 to U+0361.

Revised September 2004 to change the mapping from MARC-8 to Unicode for
the Double Tilde (M+FA and M+FB) from U+FE22 and U+FE23 to U+0360.

Revised March 2005 to change the mapping from MARC-8 to Unicode for the
Alif (M+2E) from U+02BE to U+02BC.
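Unlike the additions, these three revisions change existing entries. A sketch of the old and new targets side by side, again using hypothetical dicts of MARC-8 byte -> Unicode code point, with byte positions taken as quoted in the revision notes above. One consequence relevant to backward compatibility: after the change, M+EB and M+EC both map to U+0361 (and M+FA and M+FB both to U+0360), so a reverse Unicode -> MARC-8 table built by simply inverting the forward table can no longer tell the two halves apart.

```python
# Hypothetical tables: MARC-8 byte -> Unicode code point for the entries
# LC revised in 2004-2005 (byte positions as quoted in the revision notes).
OLD_MAPPING = {
    0xEB: 0xFE20,  # Ligature, first half  (COMBINING LIGATURE LEFT HALF)
    0xEC: 0xFE21,  # Ligature, second half (COMBINING LIGATURE RIGHT HALF)
    0xFA: 0xFE22,  # Double Tilde, first half  (COMBINING DOUBLE TILDE LEFT HALF)
    0xFB: 0xFE23,  # Double Tilde, second half (COMBINING DOUBLE TILDE RIGHT HALF)
    0x2E: 0x02BE,  # Alif (MODIFIER LETTER RIGHT HALF RING)
}

NEW_MAPPING = {
    0xEB: 0x0361,  # COMBINING DOUBLE INVERTED BREVE
    0xEC: 0x0361,
    0xFA: 0x0360,  # COMBINING DOUBLE TILDE
    0xFB: 0x0360,
    0x2E: 0x02BC,  # MODIFIER LETTER APOSTROPHE
}


def changed_entries(old, new):
    """List (byte, old code point, new code point) for every entry that differs."""
    return [(b, old[b], new.get(b)) for b in sorted(old) if old[b] != new.get(b)]


def invert(mapping):
    """Build a naive Unicode -> MARC-8 table; a later byte overwrites an earlier one."""
    return {cp: b for b, cp in mapping.items()}


for b, old_cp, new_cp in changed_entries(OLD_MAPPING, NEW_MAPPING):
    print(f"0x{b:02X}: U+{old_cp:04X} -> U+{new_cp:04X}")

# Two MARC-8 bytes collapse onto one code point, so the naive inverse shrinks.
print(len(invert(OLD_MAPPING)), len(invert(NEW_MAPPING)))  # 5 3
```

Anyone who also encodes back to MARC-8 would need to handle U+0361 and U+0360 explicitly (for example, by emitting both halves) rather than relying on an inverted dict.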

So the question is how to handle the revised mappings. Just do the right thing now? Keep the old behavior? It's easy enough to add the new characters, but changing existing mappings might be problematic for some users.

edsu commented 8 years ago

Hi @gugek, if you can send a pull request for these changes, I will merge them. I think the best we can do is try to do the right thing now.

edsu commented 8 years ago

Fixed in f0faf74e36509e326b217d0cf20e334768d6d009