Closed dgtlmoon closed 3 years ago
These errors are due to the failure of the Test case test_read_raw_xmp()
. The raw XMP length read is different than expected.
I need to find out what's different. Please manually read the raw XMP of the test image and copy it to this issue, including the space character.
$ cd pyexiv2/
$ ls pyexiv2/tests/1.jpg # The test image is here
pyexiv2/tests/1.jpg
$ python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyexiv2
>>> img = pyexiv2.Image('pyexiv2/tests/1.jpg')
>>> img.read_raw_xmp().encode()
b'<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?> ...
Output attached.. lots of whitespace in the end (approximately 2050bytes of 0x20h) output.txt
I found that even the XMP ID was not what I expected. Please check if your test images have been modified. Whether the hash value is consistent with this:
$ md5sum pyexiv2/tests/1.jpg
81760130b2ed488646c2bf9fd485cb03 1.jpg
After test..
dgtlmoon@acer:/tmp/x/pyexiv2$ md5sum pyexiv2/tests/1.jpg
81760130b2ed488646c2bf9fd485cb03 pyexiv2/tests/1.jpg
After build
dgtlmoon@acer:/tmp/x/pyexiv2$ git status .
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: pyexiv2/lib/py36-linux/exiv2api.so
repo reset..
dgtlmoon@acer:/tmp/x/pyexiv2$ git reset --hard
HEAD is now at 3b18062 Update the defect description
dgtlmoon@acer:/tmp/x/pyexiv2$ md5sum pyexiv2/tests/1.jpg
81760130b2ed488646c2bf9fd485cb03 pyexiv2/tests/1.jpg
dgtlmoon@acer:/tmp/x/pyexiv2$ ls -al pyexiv2/tests/1.jpg
-rw-rw-r-- 1 dgtlmoon dgtlmoon 18031 Dec 27 13:03 pyexiv2/tests/1.jpg
I just ran it a second time.. and i got a pass
After test..
dgtlmoon@acer:/tmp/x/pyexiv2$ md5sum pyexiv2/tests/1.jpg
81760130b2ed488646c2bf9fd485cb03 pyexiv2/tests/1.jpg
So on checkout master
build and test it fails.. but sometimes on the second time it will pass
The test case has been executed on multiple platforms:
matrix:
os: [ubuntu-18.04, macOS-latest, windows-2019]
python-version: [3.5, 3.6, 3.7, 3.8, 3.9]
I haven't had this problem before. In a similar situation, Windows handles space characters differently than Linux.
Well.. it's still important to figure out why I think...
Can you explain this comment "Windows handles space characters differently than Linux."
What exactly, how exactly does this affect the tests? I'm not a mind reader :)
For example, when Exiv2 outputs image metadata to a terminal on Windows, it shows the newline character as \r\n
and the path separator as \
.
However, pyexiv2 does not output data to the terminal, so it has not encountered this problem.
The probability that Windows is related to your issue is too low to go into detail.
@LeoHsiao1 So what you are saying, if i understand your assumption correctly, is that you've written the tests in a way that they will only pass on windows? that's a horrible idea!
@check_md5
def test_read_raw_xmp():
with Image(path) as img:
assert len(img.read_raw_xmp()) == 4593
so that 4593 will only work in windows ?!?!
No, my test cases have been executed under 15 circumstances: https://github.com/LeoHsiao1/pyexiv2/actions/runs/448497877
matrix:
os: [ubuntu-18.04, macOS-latest, windows-2019]
python-version: [3.5, 3.6, 3.7, 3.8, 3.9]
As you can see from the GitHub Actions page, most of the recent test results have passed.
According to the specifications, the whitespace padding is part of the XMP standard... so any change here means that something has changed and we need to know exactly what..
to set an arbitrary integer for comparison if your test passed without explanation or the original information on how you got that number is really not good enough for other programmers...
https://archimedespalimpsest.net/Documents/External/XMP/XMPSpecificationPart3.pdf
Can you paste in a hexdump of what you are seeing? I'm using..
@check_md5
def test_read_raw_xmp():
with Image(path) as img:
x= ":".join("{:02x}".format(ord(c)) for c in img.read_raw_xmp())
print( x)
assert len(img.read_raw_xmp()) == 4593
Also... maybe the libexiv is different? which version are you running under Ubuntu 18?
$ dpkg -l|grep -i libexiv
ii libexiv2-14:amd64 0.25-3.1ubuntu0.18.04.5 amd64 EXIF/IPTC/XMP metadata manipulation library
ii libexiv2-dev 0.25-3.1ubuntu0.18.04.5 amd64 EXIF/IPTC/XMP metadata manipulation library - development files
NOTE 3 When XMP is embedded within digital files, including white-space padding is sometimes helpful. Doing sofacilitates modification of the XMP packet in-place. The rest of the file is unaffected, which could eliminate a need to rewritethe entire file if the XMP changes in size. Appropriate padding is SPACE characters placed anywhere white space isallowed by the general XML syntax and XMP serialization rules, with a linefeed (U+000A) every 100 characters or so toimprove human display. The amount of padding is workflow-dependent; around 2000 bytes is often a reasonable amount.
So something is wrong... if that padding changes under any libexiv2 version and your library... we should know about it..
Happy New Year! I uploaded a file containing the expected RAW XMP. raw_xmp.txt
Do you still have test_read_raw_xmp()
fail?
The current version of Exiv2 I am using is 0.27.2
.
My output in the text file..
@check_md5
def test_read_raw_xmp():
with Image(path) as img:
f = open("2.5-output.txt", "w")
f.write(img.read_raw_xmp())
f.close()
assert len(img.read_raw_xmp()) == 4593
So here's what I have..
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c128 79.159124, 2016/03/18-14:01:55 ">
And what you have.....
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 4.4.0-Exiv2">
Checking the result based on an integer of the total length is a horrible idea.. better is to hardcode what values in the XML (parse the XML in Python, it is easy) and check for the strings..
Here is the output of what I'm seeing.. hope it helps..
Here's the story.. 4593 is written in your test, 4635 is what I get...
>>> 4635- 4593
42
>>> s='<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c128 79.159124, 2016/03/18-14:01:55 ">'
>>> len(s)
109
>>>
>>> s='<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 4.4.0-Exiv2">'
>>> len(s)
67
>>> 109-67
42
The actual value stored in the image is:
x:xmptk="Adobe XMP Core 5.6-c128 79.159124, 2016/03/18-14:01:55 "
But after Exiv2 parses it, it returns:
x:xmptk="XMP Core 4.4.0-Exiv2"
Maybe you're using a different version of Exiv2.
In summary, test_read_xmp()
is used to check that the content of XMP are correct. test_read_raw_xmp()
is just used to check that raw XMP can be read out, not to check the content.
@LeoHsiao1 You seem to be just repeating exactly what I'm writing :'(
Well, on the one hand, I'll update the test case so that it can locate the differences.
On the other hand, your test case test_read_raw_xmp()
didn't pass, is it because you're using version 0.25 of Exiv2 ?
That's right... that's what I said :) but... how do we know it's because of 0.25 and not 0.26.4 etc?
I need to find out where "Adobe XMP Core 5.6-c128 79.159124" is coming from exactly in my system, because I'm on Ubuntu 18 also.. @LeoHsiao1 i'm on ubuntu buster (18.04)
$ cat /etc/debian_version
buster/sid
dgtlmoon@acer:/tmp/pyexiv2$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
Test cases always pass on Ubuntu 18.04
. See Github Actions
And my WSL is also Ubuntu 18.04
:
leo@Leo:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"
@LeoHsiao1 I know this, you already said that.. that comment does not add much
But I'm running 18.04.5.... i think it is important that this passes (I'm repeating myself too on this point).. otherwise other programmers cannot test/build locally and will waste hours of their life (like I have already) trying to figure out what is wrong
So it works on
DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"
(you)
but not
DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"
(me)
.... not good enough I think..
I dont have "Adobe XMP Core" installed on my machine.. which is what is in the original test image (1.jpg).. And maybe "Adobe XMP Core 5.6-c128 79.159124, 2016/03/18-14:01:55" is not installed on the github actions test machine..
so there for the test is false-positive ?
No, the raw data is stored in the image, regardless of whether you have Adobe installed or not. I just updated the test case, and now it checks if each character of RAW_XMP is different.
"if each character of RAW_XMP is different." why?
Check if each character of RAW_XMP is the same...
Are you trying to ignore problems with whitespace? I dont understand your assumption
read_xmp()
does not give me "XMP Toolkit "
"XMP toolkit" is not available in read_xmp().. that is interesting... it's the one variable that is changing and breaking the tests on 18.0.4.5
@check_md5
def test_read_raw_xmp():
with Image(path) as img:
p=img.read_xmp()
No, test case test_read_raw_xmp()
is more rigorous now, checking if each character of RAW_XMP is the same.
@check_md5
def test_read_raw_xmp():
with Image(test_img) as img:
diff_text(reference_data.RAW_XMP, img.read_raw_xmp())
test_read_raw_xmp
now does not report an error, indicating that the XMP Toolkit
is not a variable.
Yes, OK, like I said, looking for an integer to compare isn't the best solution..
let me know if this in dev
and I'll try it
@LeoHsiao1 Final thing.. I beg you, please be in the habit of including comments in your code.. you will save a lot of programmers in the future a lot of hours of wasted time...
@check_md5
def test_read_raw_xmp():
# We compare only the headers available, XMP headers such as "XMP Toolkit" are not available in the read_raw_xmp()
# information and could affect the output (So we no longer check for an absolute length/integer size)
with Image(test_img) as img:
diff_text(reference_data.RAW_XMP, img.read_raw_xmp())
This test case is very simple. You can understand what it does by the name of the function diff_text
.
This test case is rigorous and checks each character, not just the XMP Toolkit.
@LeoHsiao1 I'm begging you.. it's important to write such a comment... trust me, please... PLEASE... please.... please add this kind of documentation
// Because the length of XMP padding can affect the final output, we compare the actual strings instead of comparing byte lengths
I don't understand your comment.
compare the actual strings
and comparing byte lengths
are not in conflict.
If compare the actual strings
test passes, then comparing byte lengths
test will also pass.
I used to check:
len(excepted_raw_xmp) == len(test_raw_xmp)
Now I'm checking:
excepted_raw_xmp == test_raw_xmp
Instead of ignoring this issue, I made the test cases more rigorous.
If you didn't pass the test case test_read_raw_xmp()
before, you're even less likely to pass it now. But the error message is more specific.
_______________________________________________________________________________ test_read_raw_xmp ________________________________________________________________________________
@check_md5
def test_read_raw_xmp():
with Image(test_img) as img:
> diff_text(reference_data.RAW_XMP, img.read_raw_xmp())
pyexiv2/tests/test_func.py:28:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
text1 = '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 4.4.0-... <?xpacket end="w"?>'
text2 = '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core ... <?xpacket end="w"?>'
def diff_text(text1: (str, bytes), text2: (str, bytes)):
max_len = max(len(text1), len(text2))
for i in range(max_len):
> assert text1[i:i+1] == text2[i:i+1], "The two text is different at index {} :\n< {}\n> {}".format(i, text1[i:i+10], text2[i:i+10])
E AssertionError: The two text is different at index 97 :
E < XMP Core 4
E > Adobe XMP
pyexiv2/tests/base.py:46: AssertionError
I dont understand why the EXIF4 header is being re-written.. I think you have not helped to find out the real cause of the problem
I'm going crazy here, it's like i say something, and then you just repeat what I've already said and we go back to the start :(
When I look at tmp.jpg
I can see that still says "Adobe XMP".. (the original test file 1.jpg
)
but in the test output, it fails... that is the problem
I think the problem is very simple:
If you've been using different versions of EXIV2, how do you find out what's causing the problem?
That's right... that's what I said :) but... how do we know it's because of 0.25 and not 0.26.4 etc?
In your RAW_XMP
you are looking for XMP Core 4.4.0-Exiv2
, but in the YOUR image (.jpg
), you have..
root@acer:/tmp/pyexiv2# exiftool ./pyexiv2/tests/1.jpg |grep Toolki
XMP Toolkit : Adobe XMP Core 5.6-c128 79.159124, 2016/03/18-14:01:55
Proof...
@check_md5
def test_read_raw_xmp():
import subprocess
import time
output = subprocess.check_output("/usr/bin/exiftool {}|grep -i 'xmp toolkit'".format(test_img), shell=True)
print("Looking at {}, I see {} ---->".format(test_img, output))
time.sleep(1) # so we dont fork to background and possibily overwrite it
with Image(test_img) as img:
diff_text(reference_data.RAW_XMP, img.read_raw_xmp())
text1 = '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 4.4.0-... <?xpacket end="w"?>'
text2 = '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core ... <?xpacket end="w"?>'
def diff_text(text1: (str, bytes), text2: (str, bytes)):
max_len = max(len(text1), len(text2))
for i in range(max_len):
> assert text1[i:i+1] == text2[i:i+1], "The two text is different at index {} :\n< {}\n> {}".format(i, text1[i:i+10], text2[i:i+10])
E AssertionError: The two text is different at index 97 :
E < XMP Core 4
E > Adobe XMP
pyexiv2/tests/base.py:46: AssertionError
------------------------------------------------------------------------------ Captured stdout call ------------------------------------------------------------------------------
Looking at /tmp/pyexiv2/pyexiv2/tests/test.jpg, I see b'XMP Toolkit : Adobe XMP Core 5.6-c128 79.159124, 2016/03/18-14:01:55\n' ---->
This test will only pass if something horrible is wrong
I found that the MD5 value of test image was not changed, but Exiv2 returned raw XMP incorrectly.
OK.. getting somewhere... but I think your test is weird (https://github.com/LeoHsiao1/pyexiv2/issues/41#issuecomment-753612040)
Please change RAW_XMP
in reference_data.py
to be the same as what is in 1.jpg
$ strings ./pyexiv2/tests/1.jpg |grep Adobe
" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c128 79.159124, 2016/03/18-14:01:55 ">
It is wrong and should not beXMP Core 4.4.0-Exiv2
You will need to include the whitespace padding in that change btw..
Python3.6 on ubuntu here