ENH: Add index_as_fid option to write_dataframe?

geopandas / pyogrio

Vectorized vector I/O using OGR

https://pyogrio.readthedocs.io

MIT License

ENH: Add index_as_fid option to write_dataframe? #287

Open theroggy opened 10 months ago

theroggy commented 10 months ago

There is an fid_as_index option in read_dataframe to be able to read the fid. However, write_dataframe has no explicit counterpart, e.g. index_as_fid, to write it again.

For shapefiles, for example, writing fids is not relevant at all, as the format doesn't store them.

For GeoPackages there is a workaround: if you add the index to the dataframe as a column named "fid", it is written as the fid.
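
The GeoPackage workaround could be sketched roughly as follows. This is only an illustration, not part of pyogrio: `with_fid_column` and `demo` are hypothetical helper names, `example.gpkg` is a made-up path, and the sketch assumes the index holds unique integers. That a column named "fid" ends up as the fid is the behaviour reported in this comment; the helper itself only copies the index into a column.

```python
def with_fid_column(gdf, fid_column="fid"):
    """Return a copy of gdf with its index copied into a column named fid_column.

    Hypothetical helper; it only relies on pandas-style .columns, .copy() and .index.
    """
    if fid_column in gdf.columns:
        raise ValueError(f"column {fid_column!r} already exists")
    out = gdf.copy()
    out[fid_column] = out.index
    return out


def demo(path="example.gpkg"):
    """Write a two-row GeoDataFrame whose index values (10, 20) go into "fid"."""
    # Imported here so the helper above stays usable without these packages.
    import geopandas as gpd
    from pyogrio import write_dataframe
    from shapely.geometry import Point

    gdf = gpd.GeoDataFrame(
        {"name": ["a", "b"]},
        geometry=[Point(0, 0), Point(1, 1)],
        index=[10, 20],
        crs="EPSG:4326",
    )
    write_dataframe(with_fid_column(gdf), path, driver="GPKG")
```

Reading the file back with read_dataframe(path, fid_as_index=True) should then show whether the values were kept.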

Not sure if an explicit option is needed... an alternative could be to just document the "workaround" or use this issue as documentation :-).

jorisvandenbossche commented 9 months ago

Do you know if GDAL exposes some information or metadata about this? For example so that we can raise an error if the file format actually does not support it?

theroggy commented 9 months ago

> Do you know if GDAL exposes some information or metadata about this? For example so that we can raise an error if the file format actually does not support it?

Yes: if OGR_L_GetFIDColumn() returns "", the file format cannot store fids. If it returns any other string, that is the name of the column used to store the fid.

Examples:

- For shapefile, "" is returned.
- For GeoPackage, "fid" is returned: that's the default name of the column the fid is stored in.
- For SQLite files I have never tested or used it, but according to the GDAL documentation the default column name for the fid is "OGC_FID". I suppose OGR_L_GetFIDColumn() returns that, and that the column in the GeoDataFrame has to carry that name for the fids to be retained.
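
The check suggested above could look roughly like this. It is a sketch in plain Python, not pyogrio code: `resolve_fid_column` is a hypothetical function, `index_as_fid` is the option proposed in this issue (it does not exist yet), and `fid_column` stands for whatever string OGR_L_GetFIDColumn() reports for the layer.

```python
from typing import Optional


def resolve_fid_column(fid_column: str, index_as_fid: bool) -> Optional[str]:
    """Decide under which column name the index should be written.

    fid_column is the string the driver reports for the layer;
    "" means the format cannot store fids.
    """
    if not index_as_fid:
        return None  # caller did not ask for the index to be written
    if fid_column == "":
        raise ValueError("this file format cannot store fids")
    return fid_column


# Values reported in this thread
assert resolve_fid_column("fid", index_as_fid=True) == "fid"  # GeoPackage
assert resolve_fid_column("", index_as_fid=False) is None     # shapefile, option off
```

Raising only when the option is explicitly requested keeps plain shapefile writes working while still flagging a request the format cannot honour.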

jorisvandenbossche commented 9 months ago

Nice, in that case this sounds like a good option to me.

theroggy commented 4 months ago

I just encountered another way to save the index of the GeoDataFrame as "fid" (primary key) in a GPKG: if the name of the index is "fid" it will also be saved as "fid" in the file.

PS: Fiona treats this the same way.

xref: https://github.com/geopandas/geopandas/issues/3217
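
A minimal sketch of the index-name route described in this comment, assuming the file is written through GeoDataFrame.to_file, where a named index is written as a column by default. `write_with_index_as_fid` is a hypothetical helper name, and the index is assumed to hold unique integers.

```python
def write_with_index_as_fid(gdf, path, fid_name="fid"):
    """Name the index fid_name and write gdf to a GeoPackage at path."""
    # rename_axis returns a new frame; the caller's gdf is left unchanged.
    named = gdf.rename_axis(fid_name)
    # index is left at its default so the named index is written as a column.
    named.to_file(path, driver="GPKG")
    return named
```

Reading the result back with read_dataframe(path, fid_as_index=True) is one way to check that the index values survived.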

jorisvandenbossche commented 2 months ago

(moved this to the next milestone)