LeoHsiao1 / pyexiv2

Read and write image metadata, including EXIF, IPTC, XMP, ICC Profile.
GNU General Public License v3.0

Copying data to new image, the xmp size in bytes is different #79

Closed justlike-prog closed 2 years ago

justlike-prog commented 2 years ago

Hi,

I tried to create a new image with PIL and copy the metadata from this image to it. For some reason, when I check the size of the XMP profile with ImageMagick's identify tool, I get a different byte count, although the tag values seem to be the same.

Do you have an explanation for why this happens?

LeoHsiao1 commented 2 years ago

Hi! pyexiv2 does not support modifying some metadata, so you may have lost some of it when copying. You can compare the original image with the new image to find out which metadata differs:

xmp1 = img1.read_xmp()
xmp2 = img2.read_xmp()
assert xmp1 == xmp2
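A bare assert only says that something differs, not what. As a minimal sketch (the helper name `diff_metadata` and the sample values are made up for illustration, and it assumes `read_xmp()` returns a plain dict, as the comparison above implies), the differing keys could be listed like this:

```python
def diff_metadata(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs.

    A key that is missing on one side is reported with None on that side.
    """
    differences = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key), b.get(key)
        if left != right:
            differences[key] = (left, right)
    return differences


# Made-up sample data standing in for img1.read_xmp() and img2.read_xmp().
xmp1 = {"Xmp.xmp.Rating": "4", "Xmp.dc.format": "image/jpeg"}
xmp2 = {"Xmp.xmp.Rating": "4"}

for key, (old, new) in diff_metadata(xmp1, xmp2).items():
    print(f"{key}: {old!r} -> {new!r}")
```

With real images, the two sample dicts would be replaced by the results of `img1.read_xmp()` and `img2.read_xmp()`.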

Alternatively, you can copy the original XMP data directly using the following method:

data = img1.read_raw_xmp()
img2.modify_raw_xmp(data)

justlike-prog commented 2 years ago

Unfortunately, I get the same result with the second method you mentioned.

LeoHsiao1 commented 2 years ago

Can you find the XMP difference between the two images?

xmp1 = img1.read_raw_xmp()
xmp2 = img2.read_raw_xmp()
assert xmp1 == xmp2

justlike-prog commented 2 years ago

They seem to be the same, but that makes sense: the same API reads the XMP in both cases, so if it is faulty with one file, it will be faulty with the other.

Maybe reading out all the xmp information and dumping it somehow directly into the metadata of the other file would be a solution.

LeoHsiao1 commented 2 years ago

The XMP data is stored as XML text in the image.

The EXIF and IPTC data are stored as binary data in the image, and the data format must be converted when writing, so errors may occur. XMP needs no such conversion, so modify_raw_xmp() should write XMP data with no differences and no errors.

However, exiv2 automatically adds an x:xmptk attribute to the root element of the XMP data:

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 4.4.0-Exiv2"> 

Could it be the cause of your problems?
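To rule this attribute out when comparing two raw packets, it could be removed from both before the comparison. This is only a sketch: `strip_xmptk` is a hypothetical helper, not part of pyexiv2 or exiv2, and it assumes the packets are already decoded to `str`.

```python
import re

# Matches the x:xmptk attribute (either quote style) plus the whitespace before it.
_XMPTK = re.compile(r'''\s+x:xmptk=("[^"]*"|'[^']*')''')


def strip_xmptk(packet: str) -> str:
    """Return the XMP packet text with any x:xmptk attribute removed."""
    return _XMPTK.sub("", packet)


# Two root elements that differ only in the toolkit name.
a = '<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Image::ExifTool 10.96">'
b = '<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 4.4.0-Exiv2">'

print(strip_xmptk(a) == strip_xmptk(b))
```

If the packets still differ after this normalization, the toolkit attribute alone does not explain the size difference.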

You can also download another command-line tool, ExifTool, to view the image metadata. For example:

$ ./exiftool -v 1.jpg
....
JPEG APP1 (4664 bytes):
  + [XMP directory, 4635 bytes]
  | XMPToolkit = XMP Core 4.4.0-Exiv2
  | Format = image/jpeg
  | Rating = 4
...

justlike-prog commented 2 years ago

Hi! Thanks for the thorough answer. I create the new image in the following way:

import PIL.Image

new_image = PIL.Image.new(mode="RGB", size=(200, 200))
new_image.save("new.jpg")

I copied the data as you described above, then checked both files with ExifTool. These are the logs I get.

This is the output of the old image. This is the output of the new image that got the copied data.

It looks like

JPEG APP1 (28430 bytes):

vs.

JPEG APP1 (19687 bytes):

which is quite a difference.

Just from counting the lines it seems that there are 4 entries missing.

LeoHsiao1 commented 2 years ago

Thank you for the debugging. I noticed the following differences in the XMP data for the two images:

- XMPToolkit = Image::ExifTool 10.96
+ XMPToolkit = Exempi + XMP Core 5.6.0

- Creator = Carl Seibert (XMP)
+ Creator = Carl Seibert (IIM)

- CreateDate = 2017-05-29T17:19:21-04:00
+ CreateDate = 2017-05-29T11:11:16

There appears to be another tool that modifies the content of the metadata.

In addition, the following metadata is lost:

GPSAltitude = 0/10
GPSLatitude = 26,34.951N
GPSLongitude = 80,12.014W
GPSAltitudeRef = 0

Could you send the original image to my email so that I can test with it?

justlike-prog commented 2 years ago

Thanks, sure I can, but it is the one I linked when I opened the issue.

LeoHsiao1 commented 2 years ago

I tried to read the structure of the original image:

$ exiftool -v 1.jpg
JPEG APP1 (28430 bytes):
  + [XMP directory, 28401 bytes]
  | XMPToolkit = Image::ExifTool 10.96
  | CountryCode = USA
...

Then I copied the XMP data with pyexiv2:

import pyexiv2

with pyexiv2.Image(r'1.jpg') as img:
    raw1 = img.read_raw_xmp()

with pyexiv2.Image(r'2.jpg') as img:
    img.clear_xmp()
    img.modify_raw_xmp(raw1)
    raw2 = img.read_raw_xmp()

assert raw1 == raw2

As a result, raw1 is equal to raw2. But the copy does get smaller:

$ exiftool -v 2.jpg
JPEG APP1 (19843 bytes):
  + [XMP directory, 19814 bytes]
  | XMPToolkit = XMP Core 4.4.0-Exiv2
  | Warning = [minor] Fixed incorrect URI for xmlns:MicrosoftPhoto
  | CountryCode = USA
...

However, I noticed that exiv2 did not lose any XMP metadata when copying; it only lost two comments:

  | [adding XMP-iptcExt:ArtworkOrObjectAOCircaDateCreated]
  | [adding XMP-iptcCore:CreatorContactInfoCiAdrCity]

In general, the reason is that exiv2 does not read these two comments. I'm not sure if they belong to the XMP standard, as they are not documented in exiv2.

justlike-prog commented 2 years ago

Ok, I see. So there is probably nothing to be done about this. The image from Wikipedia might be just an outlier. Thank you! Wouldn't it be theoretically possible to read the xmp data like here and then dump it somehow into the metadata of the other image?

LeoHsiao1 commented 2 years ago

It's possible. Reading XMP is simpler than reading EXIF and IPTC. You can try other Python libraries. But very few Python libraries support writing metadata.

justlike-prog commented 2 years ago

I might have come up with a workaround. I cut out the XMP data from one file and write it into the other, in place of the data that was originally copied.


import PIL.Image
from libxmp import XMPFiles

# Create the empty target image.
new_image = PIL.Image.new(mode="RGB", size=(200, 200))
new_image.save("new.jpg")

# Byte markers that delimit the XMP packet inside a file.
start = b'<x:xmpmeta'
end = b'</x:xmpmeta>'

# Cut the raw XMP packet out of the source image.
with open('unnamed.jpg', 'rb') as file_1:
    o_img = file_1.read()

xmp_start = o_img.find(start)
xmp_end = o_img.find(end)
xmp_str = o_img[xmp_start:xmp_end + len(end)]

#########################################################

# workaround - just put the data in here so the xmpmeta tag exists

xmpfile = XMPFiles(file_path="unnamed.jpg", open_forupdate=True)
xmpfile2 = XMPFiles(file_path="new.jpg", open_forupdate=True)

xmp = xmpfile.get_xmp()
# Check that the target can accept the packet before writing it.
if xmpfile2.can_put_xmp(xmp):
    xmpfile2.put_xmp(xmp)

xmpfile.close_file()
xmpfile2.close_file()

##########################################################

# Replace the packet in the new image with the bytes cut from the source.
with open('new.jpg', 'r+b') as file_2:
    n_img = file_2.read()

    xmp_start_2 = n_img.find(start)
    xmp_end_2 = n_img.find(end)

    # No strip() here: these slices are binary JPEG data, not text.
    xmp_part_1 = n_img[:xmp_start_2]
    xmp_part_2 = n_img[xmp_end_2 + len(end):]

    new_str = xmp_part_1 + xmp_str + xmp_part_2

    file_2.seek(0)
    file_2.truncate()
    file_2.write(new_str)

However, now I get

JPEG APP1 (19687 bytes):
  + [XMP directory, 19658 bytes]
  | Warning = XMP format error (no closing tag for x:xmpmeta)
  | Warning = [minor] Empty XMP

even though the tag is actually closed. Any idea?

justlike-prog commented 2 years ago

Actually, it makes sense, since it seems that neither ImageMagick nor ExifTool can read those tags.

LeoHsiao1 commented 2 years ago

I guess you also need to declare the length of the XMP directory in the image, according to the XMP standard. There may be other binary fields that need to be changed as well, as strictly as when writing a TCP packet. I had always thought of XMP as a simple piece of XML text that is read without any processing. It turns out I was wrong, because I had not read the XMP standard.
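To illustrate the length field: in a JPEG, the XMP packet sits in an APP1 segment made of the marker FF E1, a 2-byte big-endian length that counts itself, the 29-byte identifier `http://ns.adobe.com/xap/1.0/` plus a NUL byte, and then the packet. That identifier appears consistent with the 29-byte gap between the APP1 size and the XMP directory size in the ExifTool output above. The sketch below rebuilds the segment with a matching length instead of splicing bytes in place. The function names are made up for illustration, and it ignores Extended XMP (packets too large for one segment) as well as standalone markers and fill bytes before the image data.

```python
import struct

XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"  # 29-byte XMP identifier


def build_xmp_app1(xmp_packet: bytes) -> bytes:
    """Wrap an XMP packet in an APP1 segment with a correct length field."""
    payload = XMP_HEADER + xmp_packet
    length = len(payload) + 2  # the length field counts its own two bytes
    if length > 0xFFFF:
        raise ValueError("XMP packet too large for a single APP1 segment")
    return b"\xff\xe1" + struct.pack(">H", length) + payload


def find_xmp_app1(jpeg: bytes):
    """Return (offset, total_size) of the XMP APP1 segment, or None."""
    pos = 2  # skip the SOI marker (FF D8)
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        body = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and body.startswith(XMP_HEADER):
            return pos, 2 + length
        pos += 2 + length
    return None


def replace_xmp(jpeg: bytes, xmp_packet: bytes) -> bytes:
    """Swap the existing XMP segment for one holding xmp_packet."""
    found = find_xmp_app1(jpeg)
    if found is None:
        raise ValueError("no XMP APP1 segment found")
    offset, size = found
    return jpeg[:offset] + build_xmp_app1(xmp_packet) + jpeg[offset + size:]


# Synthetic demo bytes (not a real image): SOI, XMP segment, SOS, EOI.
old = b"<x:xmpmeta>a</x:xmpmeta>"
new = b"<x:xmpmeta>" + b"b" * 100 + b"</x:xmpmeta>"
jpeg = b"\xff\xd8" + build_xmp_app1(old) + b"\xff\xda\x00\x02\xff\xd9"
out = replace_xmp(jpeg, new)
```

Because the whole segment is rebuilt, the declared length always matches the packet that was written, which is what the in-place byte splice cannot guarantee.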