LeoHsiao1 / pyexiv2

Read and write image metadata, including EXIF, IPTC, XMP, ICC Profile.
GNU General Public License v3.0

Xmp LangAlt handling #85

Closed: jim-easterbrook closed this issue 2 years ago

jim-easterbrook commented 2 years ago

At present, XMP data of type LangAlt is presented as a single string with the languages embedded, such as: lang="x-default" Hello, world, lang="de-DE" Hallo, Welt. This is awkward to use, and easy to get wrong when setting a value. Would it be better to use a dict for this, e.g. {'x-default': 'Hello, world', 'de-DE': 'Hallo, Welt'}?
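A minimal sketch of the proposed conversion, assuming the string form is exactly lang="<tag>" <text> joined with ", " as in the example above. The helper names langalt_to_dict and dict_to_langalt are hypothetical and not part of pyexiv2. Splitting on the lang=" markers rather than on ", " keeps commas inside the text intact, but a value that itself contains , lang=" would still be mis-split, which is part of why a dict is the safer representation.

```python
import re

# One alternative: lang="<tag>", a space, then the text up to the next
# ', lang="' marker or the end of the string (non-greedy match).
_LANGALT_ITEM = re.compile(r'lang="([^"]*)" (.*?)(?=, lang="|$)', re.DOTALL)


def langalt_to_dict(text):
    """Hypothetical helper: turn 'lang="x-default" Hello, world, ...'
    into {'x-default': 'Hello, world', ...}."""
    return {lang: value for lang, value in _LANGALT_ITEM.findall(text)}


def dict_to_langalt(values):
    """Hypothetical helper: build the string form back from a
    {language: text} dict, in insertion order."""
    return ", ".join(f'lang="{lang}" {text}' for lang, text in values.items())


s = 'lang="x-default" Hello, world, lang="de-DE" Hallo, Welt'
d = langalt_to_dict(s)  # {'x-default': 'Hello, world', 'de-DE': 'Hallo, Welt'}
```

With the dict form, reading or replacing a single language is a plain key lookup instead of string surgery.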

LeoHsiao1 commented 2 years ago

I'm not familiar with the format of most metadata, so I try to treat every value as a plain string. You have much more experience with this than I do. Do all image producers use this LangAlt format? I'm worried that some values won't convert cleanly from a string to a dict.

jim-easterbrook commented 2 years ago

The format is defined by the XMP specification; for example, Xmp.dc.description has type LangAlt: https://www.exiv2.org/tags-xmp-dc.html

I think converting everything to a string oversimplifies the data. Lists of values (such as Iptc.Application2.Keywords) shouldn't be reduced to a single string, because individual values in the list might themselves contain the ", " string you use as a separator. Some other values, such as Exif.Canon.ModelID, are stored as a plain number, but Exiv2::Metadatum provides methods to convert them to an "interpreted string" such as "EOS Rebel SL1 / 100D / Kiss X7".
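A small illustration of the separator problem, using made-up keyword values and hypothetical join_values/split_values helpers that stand in for a string-only API: once a keyword contains ", ", joining and then splitting no longer gives back the original list.

```python
def join_values(values, sep=", "):
    # Flatten a list of values into one string, as a string-only API would.
    return sep.join(values)


def split_values(text, sep=", "):
    # Naively recover the list by splitting on the same separator.
    return text.split(sep) if text else []


keywords = ["Smith, John", "Paris"]       # hypothetical keyword values
flattened = join_values(keywords)         # 'Smith, John, Paris'
recovered = split_values(flattened)       # ['Smith', 'John', 'Paris']
```

Two keywords come back as three, and nothing in the flattened string records where the original boundaries were.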

Using a plain string for everything is simple, and for many applications is probably the best solution. A lower-level interface would give more control, but would be much harder to use.

LeoHsiao1 commented 2 years ago

Well, I'll add a function to the read path that tries to convert each value from a string to a more convenient Python type. If the conversion fails, the value is returned as a string.
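The "convert if possible, otherwise keep the string" approach can be sketched as below. This is not pyexiv2's actual code: convert_or_keep and the CONVERTERS registry are hypothetical, and the type names used as keys are illustrative only.

```python
from typing import Any, Callable, Dict


def convert_or_keep(value: str, type_name: str,
                    converters: Dict[str, Callable[[str], Any]]) -> Any:
    """Apply the converter registered for type_name. If none is
    registered, or it raises, return the original string unchanged."""
    converter = converters.get(type_name)
    if converter is None:
        return value
    try:
        return converter(value)
    except (ValueError, TypeError):
        return value


# Hypothetical registry mapping a tag's type name to a parser.
CONVERTERS: Dict[str, Callable[[str], Any]] = {"Short": int, "Long": int}
```

Usage: convert_or_keep("42", "Short", CONVERTERS) gives 42, while a multi-value string such as "1 2 3" makes int() raise ValueError, so it comes back unchanged as "1 2 3". Types with no registered converter are always returned as strings.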

LeoHsiao1 commented 2 years ago

XMP tags of type LangAlt are now converted to a dict. For example:

>>> import pyexiv2
>>> img = pyexiv2.Image(r'./pyexiv2/tests/data/1.jpg')
>>> img.read_xmp()['Xmp.dc.title']
{'lang="x-default"': 'test-中文-', 'lang="de-DE"': 'Hallo, Welt'}

Keys are written as lang="x-default" rather than just x-default, to make their purpose clear. This feature will be included in the next release.
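With keys in this form, picking one language with a fallback to the default is a short loop. A minimal sketch, assuming the key format shown above; pick_language is a hypothetical helper, and the dict literal is copied from the output above so the example runs without an image file. With a real image you would pass img.read_xmp()['Xmp.dc.title'] instead.

```python
def pick_language(langalt, lang, default="x-default"):
    """Hypothetical helper: return the text for `lang` from a LangAlt dict
    keyed like 'lang="de-DE"', else the default language's text, else None."""
    for tag in (lang, default):
        key = f'lang="{tag}"'
        if key in langalt:
            return langalt[key]
    return None


title = {'lang="x-default"': 'test-中文-', 'lang="de-DE"': 'Hallo, Welt'}
german = pick_language(title, "de-DE")    # 'Hallo, Welt'
french = pick_language(title, "fr-FR")    # no fr-FR entry: 'test-中文-'
```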

jim-easterbrook commented 2 years ago

I like that. If you set a LangAlt tag from a plain string (instead of a dict), is the lang="x-default" part added automatically?

LeoHsiao1 commented 2 years ago

pyexiv2 does not set a default language, but exiv2 seems to do so:

>>> img.modify_xmp({'Xmp.dc.title': ''})
>>> img.read_xmp()['Xmp.dc.title']
{'lang="x-default"': ''}
>>> img.modify_xmp({'Xmp.dc.title': 'Hello'})
>>> img.read_xmp()['Xmp.dc.title']
{'lang="x-default"': 'Hello'}
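Code that may receive either a plain string or a dict can normalize both to the dict shape on the Python side. A minimal sketch; as_langalt is a hypothetical helper that mirrors the exiv2 behaviour observed in the transcript above, which is an observation rather than a documented guarantee, and it does not read or write any image.

```python
def as_langalt(value, default="x-default"):
    """Hypothetical helper: return a LangAlt value in dict form, treating
    a plain string as the text for the default language."""
    if isinstance(value, dict):
        return dict(value)  # copy so callers can modify it safely
    return {f'lang="{default}"': value}
```

Usage: as_langalt('Hello') gives {'lang="x-default"': 'Hello'}, and a dict passed in is returned as an equal copy.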

LeoHsiao1 commented 2 years ago

I have released v2.7.0 on GitHub and pypi.org.