mbauer288 opened 1 year ago
Regarding issue A)
Certainly a discussed topic in geopandas: https://github.com/geopandas/geopandas/issues/1490
GeoPandas' `to_file()` supports writing to shapefiles and GeoJSON, neither of which allows multiple geometry columns. Therefore, `to_file()` will (for now) not write GeoDataFrames with more than one geometry column. This is unfortunate, since GeoDataFrames, GeoPackages, SpatiaLite, and PostGIS all very much do support multiple geometry columns. For now, we need custom writing/reading functions. Here is how I did it for PostGIS. I will add this to STAREPandas ASAP. Once we are confident enough, we should also suggest it to the folks at geopandas.
```python
import geopandas
import pandas
import shapely.wkb
import shapely.geos
import geoalchemy2
import sqlalchemy
import psycopg2.extensions
import numpy


def load_geom_text(x):
    """Load a geometry from its hex-encoded WKB text representation."""
    return shapely.wkb.loads(str(x), hex=True)


def read(table_name, con):
    gdf = pandas.read_sql(f'SELECT * FROM {table_name}', con=con)
    # Look up the geometry columns registered for this table
    query = f"SELECT * FROM geometry_columns WHERE f_table_name = '{table_name}'"
    geom_columns = pandas.read_sql(query, con=con)
    # Find array-typed columns so they can be converted back to numpy arrays
    query = f"""SELECT column_name
                FROM information_schema.columns
                WHERE table_name='{table_name}'
                AND data_type='ARRAY'"""
    arrays = pandas.read_sql(query, con=con)['column_name'].tolist()
    for array in arrays:
        gdf[array] = gdf[array].apply(numpy.array)
    for column in geom_columns.f_geometry_column:
        geoms = gdf[column].apply(load_geom_text)
        crs = shapely.geos.lgeos.GEOSGetSRID(geoms[0]._geom)
        gdf[column] = geopandas.GeoSeries(geoms, crs=crs)
    return gdf


def get_geom_type(gdf, column):
    """Return the (upper-case) geometry type of a column, or GEOMETRY if mixed."""
    geom_types = list(gdf[column].geom_type.unique())
    if len(geom_types) == 1:
        target_geom_type = geom_types[0].upper()
    else:
        target_geom_type = "GEOMETRY"
    return target_geom_type


def addapt_numpy_float64(numpy_float64):
    return psycopg2.extensions.AsIs(numpy_float64)


def addapt_numpy_int64(numpy_int64):
    return psycopg2.extensions.AsIs(numpy_int64)


typemap = {'int16': sqlalchemy.types.Integer,
           'int64': sqlalchemy.types.BigInteger,
           'object': sqlalchemy.types.ARRAY(sqlalchemy.types.BigInteger),
           'float64': sqlalchemy.types.Float}


def write(gdf, engine):
    gdf = geopandas.GeoDataFrame(gdf.copy())
    g_dtypes = {}
    for column, dtype in gdf.dtypes.items():
        if dtype == 'geometry':
            dtype = geoalchemy2.Geometry(get_geom_type(gdf, column))
        elif dtype.name in typemap.keys():
            dtype = typemap[dtype.name]
        g_dtypes[column] = dtype
    srid = 4326
    for column, dtype in gdf.dtypes.items():
        if dtype == 'geometry':
            # Serialize geometries to hex-encoded EWKB so to_sql can write them
            gdf[column] = [shapely.wkb.dumps(geom, srid=srid, hex=True)
                           for geom in gdf[column]]
    psycopg2.extensions.register_adapter(numpy.float64, addapt_numpy_float64)
    psycopg2.extensions.register_adapter(numpy.int64, addapt_numpy_int64)
    gdf.to_sql(name='cells', con=engine, if_exists='replace', dtype=g_dtypes, index=False)
```
Regarding issue B)
GeoPackages are based on SQLite, and SQLite does not support arrays. Therefore arrays have to be serialized one way or another. Converting to a string is one way; another would be to convert to some binary form, e.g. via pickle or io.

Probably nicer would be to register a datatype on the SQLite connection: https://stackoverflow.com/questions/18621513/python-insert-numpy-array-into-sqlite3-database
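A minimal sketch of the registered-datatype approach from that Stack Overflow answer, using only the stdlib `sqlite3` module and numpy. The column type name `INT64ARRAY` and the helper names are my own choices, not anything STARELite defines:

```python
import io
import sqlite3

import numpy


def adapt_array(arr):
    """Serialize a numpy array to bytes for storage in SQLite."""
    out = io.BytesIO()
    numpy.save(out, arr)
    return sqlite3.Binary(out.getvalue())


def convert_array(blob):
    """Deserialize bytes stored in SQLite back into a numpy array."""
    return numpy.load(io.BytesIO(blob))


# Register the adapter/converter pair under a custom column type name
sqlite3.register_adapter(numpy.ndarray, adapt_array)
sqlite3.register_converter('INT64ARRAY', convert_array)

# PARSE_DECLTYPES makes the converter fire on columns declared INT64ARRAY
con = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
con.execute('CREATE TABLE cells (sids INT64ARRAY)')
con.execute('INSERT INTO cells (sids) VALUES (?)',
            (numpy.array([1, 2, 3], dtype=numpy.int64),))
row = con.execute('SELECT sids FROM cells').fetchone()  # row[0] is a numpy array again
```

With this in place, numpy arrays round-trip through the connection without any per-column conversion code in the reader/writer.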
One way or another, we need to make reading and writing transparent for both issue A) and issue B), e.g. by overloading geopandas' `to_file()`, or with a STAREPandas `to_starepackage()` (or `to_starelite()`) method. The latter would be nice since STARELite already has an int64array datatype.
Make a notebook example.
I added the functionality. You can now do `starepandas.to_postgis(sdf, engine, table_name)`. Eventually, `sdf.to_postgis()` would probably be nice.
An FYI: there are two issues to be aware of when saving STAREPandas DataFrames to GeoPackage (GPKG) files.
Issue-A: The GeoPackage API does not allow for multiple geometry-type columns in a DataFrame.
This happens when your DataFrame already has a geometry column (POINT, POLYGON, etc.) and you then add another geometry column, such as a trixel cover (POLYGON). This will not save as-is to a GeoPackage file. Note that GeoPackage is fine with additional non-geometry columns like SIDs.
Issue-B: The GeoPackage API does not allow for array-type columns in a DataFrame.
This is similar to Issue-A, but affects attempts to even save just SIDs when the underlying geometry is POLYGON (or similar), because the SIDs are returned as numpy arrays rather than a single int64. The solution here is to convert the SID array in each row to a string. This will allow you to save as a GeoPackage file. However, you will have to reverse this when reading these files (converting each SID row from a string back to a numpy array of int64). The following is a simple example of this.
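A minimal sketch of the string round-trip described above. The helper names and the space-separated encoding are my own choices for illustration, not part of STAREPandas:

```python
import numpy


def sids_to_string(sids):
    """Flatten an int64 SID array into a space-separated string for GPKG storage."""
    return ' '.join(str(sid) for sid in sids)


def string_to_sids(text):
    """Recover the int64 SID array from its string representation."""
    return numpy.array(text.split(), dtype=numpy.int64)


sids = numpy.array([4611686018427387903, 4611686018427387904], dtype=numpy.int64)
encoded = sids_to_string(sids)     # apply per row, e.g. df['sids'].apply(sids_to_string)
decoded = string_to_sids(encoded)  # reverse this when reading the file back
```

Applied column-wise before `to_file()` and after `read_file()`, this gets SID columns through the GeoPackage restriction.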
I guess something similar would work for Issue-A (converting the POLYGON geometry columns to strings and back again).
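For the geometry case, the natural string representation would be WKT, which shapely can emit and parse directly. A sketch of that round-trip (assuming shapely is available, as it is wherever geopandas is):

```python
import shapely.wkt
from shapely.geometry import Polygon

poly = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
text = poly.wkt                      # serialize the geometry to a WKT string
restored = shapely.wkt.loads(text)   # parse it back into a shapely geometry on read
```

As with the SID columns, this would be applied per extra geometry column before writing and reversed after reading.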