Ymagis / ClairMeta

ClairMeta is a Python package for Digital Cinema Package (DCP) probing and checking.
BSD 3-Clause "New" or "Revised" License

Check Hash DCP charset UTF-8 problem #248

Open JeromeLesaint opened 2 weeks ago

JeromeLesaint commented 2 weeks ago

Hi,

Sometimes the hashes of assets are not correct, because the hash calculation applies decode("utf-8") at the end (file.py, line 208).

We have run many verifications with the script and with DCP-o-Matic.

If you want, we have several examples we can share.

Would it be possible to change this and make the charset configurable in the DCP check settings?

Thanks a lot,

Best regards,

remia commented 2 weeks ago

Thanks for the report @JeromeLesaint. Is that a Windows-specific issue? We had a case in the past where we had to replace utf-8 with utf-8-sig, if I remember correctly.

JeromeLesaint commented 1 week ago

Thanks, it's not on Windows but on Linux (Ubuntu). It's a French installation of Linux, and the terminal uses UTF-8. I replaced decode('utf-8') with decode() and it seems to work. However, I have another problem with mount points and filesystems on Linux that disturb the verification. I can try utf-8-sig too.

Thanks,

Best regards

remia commented 1 week ago

Interesting, I don't think we have seen such issues on Linux. Is it happening all the time or only on specific DCPs? It might be useful if you could privately share some of the examples. In that case, I don't believe utf-8-sig will help you.

But regardless, I think you are correct that the utf8 decode is not correct here.
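
For anyone trying to reproduce this, here is a minimal sketch of the decode step under discussion. It assumes the hash is a Base64-encoded SHA-1 digest, which is the form used in DCP packing lists; `payload` is a hypothetical stand-in for asset bytes, and this is not ClairMeta's actual code. In Python 3, `bytes.decode()` defaults to UTF-8, and Base64 output is pure ASCII, so the three decodes below should give the same string. That may suggest the mismatch originates elsewhere, for example in how the file is read.

```python
import base64
import hashlib

# Hypothetical payload standing in for asset bytes (not taken from a real DCP).
payload = b"example asset bytes"

digest = hashlib.sha1(payload).digest()   # 20 raw bytes
encoded = base64.b64encode(digest)        # ASCII-only bytes, 28 characters

explicit = encoded.decode("utf-8")        # the call reported in the issue
default = encoded.decode()                # no argument: UTF-8 is the default in Python 3
ascii_only = encoded.decode("ascii")      # Base64 never leaves the ASCII range

print(explicit == default == ascii_only)
```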

remia commented 1 week ago

Which Python version do you use, by the way?

JeromeLesaint commented 1 week ago

Python 3.10, and the hash check tends to fail on large files.
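
To help narrow down the large-file case, a hash can be recomputed independently of ClairMeta and compared with the value in the packing list. The sketch below assumes a Base64-encoded SHA-1 and reads the file in fixed-size chunks so that memory use stays flat; `sha1_b64` and `chunk_size` are hypothetical names chosen for illustration, not part of ClairMeta's API. The self-check at the end uses a throwaway temporary file rather than a real asset.

```python
import base64
import hashlib
import os
import tempfile


def sha1_b64(path, chunk_size=1024 * 1024):
    """Hypothetical helper: Base64-encoded SHA-1 of a file, read chunk by chunk."""
    sha1 = hashlib.sha1()
    # Binary mode: the bytes are hashed as stored, with no text decoding involved.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return base64.b64encode(sha1.digest()).decode("ascii")


# Self-check on random data spanning several chunks plus a partial one.
data = os.urandom(3 * 1024 * 1024 + 123)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
try:
    streamed = sha1_b64(tmp.name)
finally:
    os.remove(tmp.name)

# Reference value computed in one pass over the in-memory bytes.
whole = base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")
print(streamed == whole)
```

If the value computed this way matches DCP-o-Matic but not the check report, that would point at the checker; if it differs between runs or mount points, the storage layer may be worth a look.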