Closed remi-braun closed 5 months ago
@remi-braun Happy new year! OGR doesn't have a short integer type, only 32 and 64-bit integers. Neither does Fiona at this time, thus your layers are being constructed with 32 bit wide integer fields. I don't think there is any logic in the GDB driver to reduce the width at creation time.
Do you see different behavior if you use ogr2ogr or pyogrio?
Happy new year to you too 😉
I think pyogrio doesn't handle schemas, so I haven't tried.
What's weird is that for ESRI a short isn't an int16 but also an int32, but I don't know exactly what the :4 at the end of int32:4 means.
Note that text:255 works for GDB, so the : mechanism is in some way already handled in OpenFileGDB.
And we made it all work for Shapefiles, so other drivers handle this mechanism.
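To make the type:width notation concrete, here is a minimal sketch of how such a schema string can be split into a base type and an optional width. The helper name split_field_type is hypothetical and this is not Fiona's actual parser; it only illustrates the notation discussed above.

```python
from typing import Optional, Tuple


def split_field_type(spec: str) -> Tuple[str, Optional[str]]:
    """Split a schema string such as 'int32:4' into ('int32', '4').

    The width is kept as a string because some specs (for example
    'float:10.2') carry both a width and a precision.
    """
    base, sep, width = spec.partition(":")
    # When there is no ':' in the spec, report the width as None.
    return base, (width if sep else None)
```

With this helper, 'int32:4' and 'int32:10' share the same base type and differ only in the width part, which is the piece a driver would have to interpret to choose between a short and a long field.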
I think pyogrio doesn't handle schemas, so I haven't tried.
Pyogrio doesn't support the schema keyword (as that is a Fiona-specific parameter), but it certainly does support writing the different data types. But because the input for pyogrio is a geopandas DataFrame or numpy arrays, the data already has a schema, and pyogrio uses that (instead of letting the user specify it separately).
So if you ensure that your input data has an int16 column, pyogrio should pass that information through to GDAL:
import geopandas
import numpy as np
from shapely.geometry import Point
gdf = geopandas.GeoDataFrame({"col": np.array([1, 2, 3], dtype="int16"), "geometry": [Point(i, i) for i in range(3)]})
gdf
# col geometry
# 0 1 POINT (0 0)
# 1 2 POINT (1 1)
# 2 3 POINT (2 2)
gdf.to_file("test_gdb.gdb", driver="OpenFileGDB", engine="pyogrio")
I am not fully sure how to check independently whether it has actually written the correct data type to the OpenFileGDB, but ogrinfo indicates that it has:
$ ogrinfo test_gdb.gdb test_gdb
INFO: Open of `test_gdb.gdb'
using driver `OpenFileGDB' successful.
Layer name: test_gdb
Geometry: Point
Feature Count: 3
Extent: (0.000000, 0.000000) - (2.000000, 2.000000)
Layer SRS WKT:
(unknown)
FID Column = OBJECTID
Geometry Column = SHAPE
col: Integer(Int16) (0.0)
OGRFeature(test_gdb):1
col (Integer(Int16)) = 1
POINT (0 0)
OGRFeature(test_gdb):2
col (Integer(Int16)) = 2
POINT (1 1)
OGRFeature(test_gdb):3
col (Integer(Int16)) = 3
POINT (2 2)
And if you want to control the exact OpenFileGDB types being used by GDAL, it seems to have a creation option COLUMN_TYPES that can be passed (see https://gdal.org/drivers/vector/openfilegdb.html#layer-creation-options, though I haven't tried this).
@sgillies OGR indeed only uses int32 or int64 data in its internal data model, but there is the concept of a "sub type" to annotate a type with additional information (I assume it doesn't change how the data is represented internally, still int32, but it is then used as a hint when writing): https://gdal.org/api/vector_c_api.html#_CPPv415OGRFieldSubType, and it has an OFSTInt16 value.
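The idea of annotating an OGR integer type with a subtype can be sketched as a small lookup. This is an illustration only: the function name ogr_int_type is hypothetical, the OGR enum names are returned as plain strings so that the example runs without GDAL, and it is not the code of pyogrio or Fiona.

```python
from typing import Tuple


def ogr_int_type(bit_width: int) -> Tuple[str, str]:
    """Return (field type, subtype) names for a signed integer width.

    Hypothetical helper: the strings mirror the OGR enum names
    OFTInteger, OFTInteger64, OFSTNone and OFSTInt16.
    """
    if bit_width <= 0:
        raise ValueError("bit_width must be positive")
    if bit_width <= 16:
        # Stored as a 32-bit integer internally, annotated as Int16.
        return ("OFTInteger", "OFSTInt16")
    if bit_width <= 32:
        return ("OFTInteger", "OFSTNone")
    if bit_width <= 64:
        return ("OFTInteger64", "OFSTNone")
    raise ValueError("no OGR integer type is wider than 64 bits")
```

The point of the sketch is that a 16-bit column and a 32-bit column share the same field type and differ only in the subtype, so a writer that ignores the subtype will produce a 32-bit field for both.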
Pyogrio uses this when the input data has a bit width < 32, and based on the example above, it seems to have an effect. Fiona could use this as well. It's already declared:
and it could be set the same way as is already done for the bool subtype:
With pyogrio, I successfully wrote short dtypes!
However, geopandas doesn't read the input type correctly, so I had to change the type of every column (which could be time consuming):
import geopandas as gpd
gdb_path = "my_gdb.gdb"
layer = "B1_observed_event_a"
# Read layer
observed_event = gpd.read_file(gdb_path , layer=layer)
# Set correct types
observed_event.event_type = observed_event.event_type.astype("int32")
observed_event.obj_desc = observed_event.obj_desc.astype("int16")
observed_event.notation = observed_event.notation.astype("str") # How can I set str:255 ?
observed_event.det_method = observed_event.det_method.astype("int16")
observed_event.dmg_src_id = observed_event.dmg_src_id.astype("int32")
# Write back in gdb
observed_event.to_file("my_gdb_copy.gdb", layer=layer, driver="OpenFileGDB", engine="pyogrio")
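The per-column casts above can also be written more compactly as a single astype call with a column-to-dtype mapping. A minimal sketch, assuming pandas is installed and using a plain DataFrame with made-up values as a stand-in for the real layer (GeoDataFrame.astype accepts the same kind of mapping):

```python
import pandas as pd

# Stand-in for the layer read from the GDB; the values are made up.
observed_event = pd.DataFrame(
    {
        "event_type": [1, 2],
        "obj_desc": [10, 20],
        "notation": ["a", "b"],
        "det_method": [3, 4],
        "dmg_src_id": [100, 200],
    }
)

# Target dtype per column, taken from the snippet above.
target_dtypes = {
    "event_type": "int32",
    "obj_desc": "int16",
    "det_method": "int16",
    "dmg_src_id": "int32",
}

# Cast all listed columns in one call; other columns are left untouched.
observed_event = observed_event.astype(target_dtypes)
```

Keeping the mapping in one dictionary makes it easier to reuse for several layers, but it does not remove the need to know the intended types in advance.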
However, with fiona's schema, I succeeded in setting str:255 as a field type, but not with pyogrio. How can I do that?
PS: the goal of all this is to allow the GDB domains to be recognized automatically, but I don't know if it will work even with the correct types
@remi-braun I've begun working on this and have 2 questions.
Is the 4 in int32:4 specific to Shapefiles? I'd love to not have to think about this anymore if we don't have to. It's not clear to me that OGR will coerce a 4 char wide OFTInt to OFTInt16 when saving.
@sgillies thanks for taking this seriously!
arcpy
...
My knowledge on this only comes from the answer I shared in the initial issue 😞
Hello,
I am currently struggling to write a column with a short type in a GDB. After some research, I found a way to write shapefiles with a difference between long and short integers with this solution (which works).
Expected behavior and actual behavior.
Both 'int32:4' and 'int32:10' types in schema give a long column in a GDB, instead of giving a short and a long column.
Steps to reproduce the problem.
Code
PS: I know I am using geopandas, but it seamlessly implements the Fiona schemas, which is why I am writing the issue here.
Output
Operating system
Windows
Fiona and GDAL version and provenance
Conda (conda-forge)