Open alexJhao opened 1 month ago
Hi @alexJhao. From the information you gave
python 3.11.5
GDAL 3.6.2
fiona 1.9.6
it looks like you have built Fiona from its source. Is that true? The Windows distributions on pypi.org have GDAL version 3.8.4.
Can you check to see that your GDAL library was built with support for the iconv library that provides internationalization support?
I'm not sure whether I built Fiona from its source or not. I installed it with "conda install geopandas". I also searched the GDAL documentation for iconv. It says the iconv work was completed as of the 1.6.0 release. https://gdal.org/development/rfc/rfc23_ogr_unicode.html#encoding-names
@alexJhao thank you. I don't use MapInfo and am not an expert on the format, so I hope I do not lead you off course. I wonder if you need to use the encoding
option when creating the MapInfo dataset? See https://gdal.org/drivers/vector/mitab.html#layer-creation-options. For example, like
>>> with fiona.open(r'e:\temp\7\a.tab', 'w', driver='MapInfo File', crs=crs, schema=schema, encoding='GBK') as dst:
...     dst.write(feat)
Or maybe UTF-8 would be better. I'm not sure.
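Before choosing between GBK and UTF-8, it may help to confirm that every field name can be represented in the candidate encoding at all. The following is a minimal sketch using only the Python standard library. The helper name `unencodable_names` is hypothetical and is not part of Fiona or GDAL, and a clean result says nothing about whether the MapInfo driver accepts that encoding.

```python
def unencodable_names(names, encoding):
    """Return the names that cannot be encoded with the given codec."""
    bad = []
    for name in names:
        try:
            name.encode(encoding)
        except UnicodeEncodeError:
            bad.append(name)
    return bad


# Field names taken from the report in this thread.
field_names = ["地市", "区县", "商业街名称"]

for codec in ("gbk", "utf-8", "latin-1"):
    # An empty list means every name is representable in that codec.
    print(codec, unencodable_names(field_names, codec))
```

If a codec reports any unencodable names, passing it as `encoding=` cannot preserve those names regardless of what the driver does.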
I tried UTF-8 as well, with the same result 😓
I wrote a script to work around it temporarily.
from pathlib import Path

import geopandas as gpd


def gdf2Tab(data: gpd.GeoDataFrame, filename: str, encoding="cp936"):
    """Work around the field name encoding error in MapInfo TAB files.

    Args:
        data (gpd.GeoDataFrame): gdf data
        filename (str): path of the .tab file to write
        encoding (str, optional): same as gpd.to_file. Defaults to "cp936".
    """
    assert isinstance(data, gpd.GeoDataFrame)
    tab_fp = Path(filename)
    assert tab_fp.suffix.lower() == ".tab"
    columns = data.columns.tolist()
    columns.remove("geometry")
    data.to_file(filename=filename, driver="MapInfo File", encoding=encoding)
    tmp_tab_fp = tab_fp.parent / ("tmp_" + tab_fp.name)
    # Read all lines of the TAB file as bytes
    with open(filename, "rb") as source:
        new_lines = source.readlines()
    # Find the "Fields N" line; the field definitions start on the next line
    line_no_s = -1
    field_count = 0
    for idx, line in enumerate(new_lines):
        if line.find(b"Fields") > -1:
            line_no_s = idx + 1
            field_count = int(line.strip().split(b" ")[1])
            break
    # Overwrite each field name with the original name in the target encoding
    for field_idx in range(field_count):
        line_no = line_no_s + field_idx
        line_bytes = new_lines[line_no]
        line_byte = line_bytes.split(b" ")
        # Index of the first non-empty token, which is the field name
        field_bytes_idx = 0
        for tmp_j in range(len(line_byte)):
            if line_byte[tmp_j] != b"":
                field_bytes_idx = tmp_j
                break
        field_byte = line_byte[field_bytes_idx]
        if len(field_byte) <= 0:
            break
        # The first version used the Windows-only "ansi" codec here; using
        # `encoding` keeps the names consistent with the rest of the file.
        new_field_byte = columns[field_idx].encode(encoding)
        line_byte[field_bytes_idx] = new_field_byte
        new_lines[line_no] = b" ".join(line_byte)
    with open(str(tmp_tab_fp), "wb") as target:
        target.writelines(new_lines)
    tmp_tab_fp.replace(filename)
    return True
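To check whether the field names in a written .tab file survived, the header can be read back in binary mode and decoded. The sketch below relies on the same assumption as the script above, namely that a "Fields N" line is followed by N field definition lines whose first token is the field name. The helper name `read_tab_field_names` is hypothetical.

```python
from pathlib import Path


def read_tab_field_names(tab_path, encoding="cp936"):
    """Return the field names listed after the 'Fields N' line of a .tab file."""
    lines = Path(tab_path).read_bytes().splitlines()
    for idx, line in enumerate(lines):
        parts = line.split()
        if len(parts) >= 2 and parts[0] == b"Fields":
            count = int(parts[1])
            names = []
            for field_line in lines[idx + 1 : idx + 1 + count]:
                tokens = field_line.split()
                if tokens:
                    # Undecodable bytes show up as U+FFFD instead of raising.
                    names.append(tokens[0].decode(encoding, errors="replace"))
            return names
    return []
```

Comparing the returned list with the GeoDataFrame's column names (minus the geometry column) shows at a glance whether any name was altered on write.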
On the Fiona main branch I see a KeyError when I try to reproduce with the following code:
import fiona
from fiona.crs import CRS


def test_issue1399(tmp_path):
    """Test schema encoding issue reported in #1399."""
    schema = {
        "properties": {"地市": "str:80", "区县": "str:80", "商业街名称": "str:80"},
        "geometry": "Polygon",
    }
    with fiona.open(
        tmp_path / "a.tab",
        "w",
        driver="MapInfo File",
        crs=CRS.from_epsg(4326),
        schema=schema,
    ) as colxn:
        pass
fiona/collection.py:682: in __exit__
self.close()
fiona/collection.py:659: in close
self.flush()
fiona/collection.py:649: in flush
self.session.sync(self)
fiona/ogrext.pyx:1707: in fiona.ogrext.WritingSession.sync
gdal_flush_cache(cogr_ds)
fiona/ogrext.pyx:86: in fiona.ogrext.gdal_flush_cache
with cpl_errs:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> raise exception_map[err_no](err_type, err_no, msg)
E KeyError: 502
fiona/_err.pyx:196: KeyError
Logs mention MapInfo charset. The 502 error code is specific to MapInfo and is not one of the usual GDAL error codes that Fiona expects.
------------------------------------------ Captured log call ------------------------------------------
DEBUG fiona._env:env.py:315 GDAL data files are available at built-in paths.
DEBUG fiona._env:env.py:315 PROJ data files are available at built-in paths.
DEBUG fiona.ogrext:collection.py:229 File doesn't exist. Creating a new one...
WARNING fiona._env:collection.py:229 Cannot find MapInfo charset corresponding to iconv GBK encoding
DEBUG fiona._env:env.py:315 GDAL data files are available at built-in paths.
DEBUG fiona._env:env.py:315 PROJ data files are available at built-in paths.
WARNING fiona._env:collection.py:229 Cannot find MapInfo charset corresponding to iconv GBK encoding
DEBUG fiona._env:env.py:315 GDAL data files are available at built-in paths.
DEBUG fiona._env:env.py:315 PROJ data files are available at built-in paths.
DEBUG fiona.ogrext:collection.py:229 Created layer a
DEBUG fiona.ogrext:collection.py:229 Writing started
DEBUG fiona._env:env.py:315 GDAL data files are available at built-in paths.
DEBUG fiona._env:env.py:315 PROJ data files are available at built-in paths.
INFO fiona._env:collection.py:649 Unknown error number 502.
INFO fiona._env:collection.py:649 Unknown error number 502.
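The KeyError: 502 in the traceback above comes from indexing a mapping of error numbers to exception classes with a number the mapping does not contain. The sketch below illustrates that failure mode and one defensive alternative. Everything in it is a stand-in: the class names, the single map entry, and the `make_exception` helper are hypothetical and do not reproduce the contents of fiona/_err.pyx or any fix adopted by Fiona.

```python
class KnownError(Exception):
    """Stand-in for an exception class tied to a known error number."""


class GenericDriverError(Exception):
    """Hypothetical fallback for error numbers with no dedicated class."""


# Illustrative mapping only; the real table is larger and lives in Fiona.
exception_map = {1: KnownError}


def make_exception(err_type, err_no, msg):
    """Build an exception without failing on unmapped error numbers."""
    # dict.get avoids the KeyError that plain indexing raises for codes
    # such as the driver-specific 502 seen in this thread.
    exc_class = exception_map.get(err_no, GenericDriverError)
    return exc_class(err_type, err_no, msg)


try:
    exception_map[502]
except KeyError as exc:
    print("plain indexing fails with KeyError:", exc)

print(repr(make_exception(3, 502, "Unknown error number 502.")))
```

With a fallback like this, the driver's message would reach the caller as an exception carrying the original error number instead of being masked by a lookup failure.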
From looking at https://github.com/rouault/gdal/blob/65e177b7e3277bc3f39d64ae44796a8c813f4129/ogr/ogrsf_frmts/mitab/mitab_utils.cpp#L485 and the code below it, I think it's possible that MapInfo doesn't support non-Latin characters for field names. Is that true @rouault ?
Good question to which I don't know the answer. Maybe @drons who introduced support for encodings in the mapinfo driver knows. Perhaps the "laundering" of characters of code >= 192 done in TABCleanFieldName() in mitab_utils.cpp should be removed when using a charset other than the default neutral one?
TABCleanFieldName came to us to support older versions of MapInfo. I think at this point we can stop "laundering" characters of code >= 192 for files with a non-neutral charset.
Moreover, modern MapInfo supports UTF-8 encoding, but GDAL doesn't (see the apszCharsets list in mitab_imapinfofile.cpp).
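The discussion above turns on a per-byte threshold (code >= 192) being applied to names that, in GBK, are multi-byte sequences. The sketch below only tallies where the bytes of each reported field name fall relative to that threshold. It does not reproduce TABCleanFieldName, whose exact rules are not quoted in this thread, and the helper name `byte_classes` is hypothetical.

```python
def byte_classes(name, encoding="gbk"):
    """Count the encoded bytes of `name` in three ranges around 128 and 192."""
    raw = name.encode(encoding)
    return {
        "hex": raw.hex(" "),
        "ascii (<128)": sum(b < 128 for b in raw),
        "128-191": sum(128 <= b < 192 for b in raw),
        ">=192": sum(b >= 192 for b in raw),
    }


for field_name in ["地市", "区县", "商业街名称"]:
    print(field_name, byte_classes(field_name))
```

Any name whose bytes land in more than one range would be treated inconsistently by a rule that decides byte by byte, which is one reason to skip such a rule when the file declares a non-neutral charset.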
Expected behavior and actual behavior.
I read data from a GPKG, then wrote it to a MapInfo TAB file. I found that the field names were encoded incorrectly.
I opened the a.tab file in binary mode and found that the field names had changed.
In the TAB file, the three field names are:
but the original field names are: '地市', '区县', '商业街名称'. Their 'ansi' encodings are:
Operating system
Win10
Fiona and GDAL version and provenance
python 3.11.5 GDAL 3.6.2 fiona 1.9.6