LeoHsiao1 / pyexiv2

Read and write image metadata, including EXIF, IPTC, XMP, ICC Profile.
GNU General Public License v3.0

It seems UTF-8 decoding still does not work properly on Windows. #95

Closed Justinl666 closed 2 years ago

Justinl666 commented 2 years ago

I have run into a case with Korean text where UTF-8 decoding cannot be used.

LeoHsiao1 commented 2 years ago

Thanks for the feedback. I tested this on Windows 10, and neither utf-8 nor gbk can encode a Korean file name:

>>> import pyexiv2
>>> path=r'c:\Users\Leo\Desktop\조선말.jpg'     
>>> pyexiv2.Image(path, 'utf-8')
RuntimeError: c:\Users\Leo\Desktop\조선말.jpg: Failed to open the data source: Illegal byte sequence (errno = 42)
>>> pyexiv2.Image(path, 'gbk')
UnicodeEncodeError: 'gbk' codec can't encode character '\uc870' in position 21: illegal multibyte sequence

gbk can encode a Japanese file name:

>>> import pyexiv2
>>> path=r'c:\Users\Leo\Desktop\こんにちは.jpg'
>>> pyexiv2.Image(path, 'utf-8')                
RuntimeError: c:\Users\Leo\Desktop\こんにちは.jpg: Failed to open the data source: Illegal byte sequence (errno = 42)
>>> pyexiv2.Image(path, 'gbk')   
<pyexiv2.core.Image object at 0x000002280FD70430>

On Linux, utf-8 can encode both Korean and Japanese file names:

>>> import pyexiv2
>>> path='조선말.jpg'
>>> pyexiv2.Image(path, 'utf-8')
<pyexiv2.core.Image object at 0x7f2aebfda070>
>>> path='こんにちは.jpg'
>>> pyexiv2.Image(path, 'utf-8')
<pyexiv2.core.Image object at 0x7f2aeb14fd90>

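So my conclusion is: Korean and Japanese should encode fine with utf-8, but Windows handles these characters with a locale-specific encoding instead. On a Chinese-language system, for example, that encoding is GBK rather than UTF-8.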
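The codec side of this can be checked with the Python standard library alone, without pyexiv2. The following is a minimal sketch, not part of pyexiv2: the helper name `can_encode` and the choice of `cp949` and `cp932` as comparison codecs are assumptions made for illustration. It only shows which codecs contain the characters in each file name. It does not reproduce the `errno = 42` failure, which happens later, when the encoded bytes are used to open the file.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text can be encoded with the codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# File names taken from the tests above.
names = ['조선말.jpg', 'こんにちは.jpg']
# utf-8 and gbk are the codecs tried above; cp949 (Korean) and cp932
# (Japanese) are added here only for comparison.
codecs = ['utf-8', 'gbk', 'cp949', 'cp932']

table = {name: {codec: can_encode(name, codec) for codec in codecs}
         for name in names}

for name, row in table.items():
    print(name, row)
```

This matches the behaviour in the transcripts: gbk raises `UnicodeEncodeError` for Hangul but accepts hiragana, while utf-8 can encode both names, so the utf-8 failure on Windows is not an encoding failure on the Python side.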

A few possible solutions:

Justinl666 commented 2 years ago

Thanks. The second solution works for me for now.