geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License

Is it possible to change the dtype when loading GML? #244

Open bartoszkoper opened 1 year ago

bartoszkoper commented 1 year ago

Hi,

I'm trying to change the dtype when reading a GML file, using engine="pyogrio" at this stage.

Currently there seems to be no support for this, so values that should be treated as text are read as integers. I've read the docs but couldn't find anything about it. My case:

I have a value like '02', but it is read as Int32. I want to force the dtype to string.
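To show why this matters, here is a minimal plain-Python sketch of the effect. It does not call pyogrio or GDAL; it only mimics what happens once a text code such as '02' has been stored as an integer. The variable names are illustrative.

```python
# Illustrative only: mimics a text code being inferred as an integer field.
raw_value = "02"             # value as written in the GML file
as_int = int(raw_value)      # what an integer-typed field ends up holding
round_tripped = str(as_int)  # converting back does not restore the leading zero

print(as_int)                      # 2
print(round_tripped == raw_value)  # False
```

Once the leading zero is gone, a plain cast back to string cannot recover the original value.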

Do you have any idea?

brendan-ward commented 1 year ago

Related #174

Do you have a small GML test file that demonstrates this that we can use for testing? We store it as an integer because that's how GDAL is reporting the data type of that field, but I'd like to double-check what GDAL reports for it.

There are 2 parts to how we handle dtypes, neither of which we've started to implement: 1) what dtype we use for the numpy arrays when we initially read the raw data using GDAL (this issue) 2) what dtype we use on the Pandas side, which is especially problematic for date types (#174)

bartoszkoper commented 1 year ago

Hey thanks,

Basically yes ->

            import geopandas
            gf = geopandas.read_file(file, engine="pyogrio")  # file: path to the GML file
            gf = gf.to_crs(4326)  # EPSG:4326 (WGS 84)

Regarding the file, here is a link to the Polish government site; I hope the voivodeship boundaries file (Granice wojewodztw) has been attached below.

http://mapy.geoportal.gov.pl/wss/service/ATOM/httpauth/download/?fileId=27df81c9da349dde5b7328554770bed3&name=00_jednostki_administracyjne_gml.zip

A01_Granice_wojewodztw.zip

PS. For now I've worked around the problem with the following GDAL config option:

            from pyogrio import set_gdal_config_options
            set_gdal_config_options({'GML_FIELDTYPES': 'ALWAYS_STRING'})

The problem was that the value should have been read as a string but was converted to an int, so later matches against it failed.
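For data that has already been loaded as integers, a possible fallback is to zero-pad the values back to text before matching. This is only a sketch: `restore_code`, the fixed width of 2, and the lookup table are all assumptions made for illustration, not anything taken from the file or from pyogrio.

```python
def restore_code(value, width=2):
    """Zero-pad an integer-parsed code back to fixed-width text (width is assumed)."""
    return str(value).zfill(width)

# Hypothetical lookup keyed by the original text codes.
lookup = {"02": "region A", "14": "region B"}

parsed_codes = [2, 14]  # what an Int32 column would contain
matched = [lookup.get(restore_code(code)) for code in parsed_codes]
print(matched)  # ['region A', 'region B']
```

This only works when the original width is known and constant, which is why forcing string types at read time is the safer route.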

Thanks, BK