Open codeananda opened 5 months ago
This is EWKT, the Extended Well-Known Text representation of geometries, with the exception of the empty geometries, which are invalid here. I can't find a reference to it in the GDAL docs, so I suppose it supports only standard WKT, which is everything after the `;`.
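To illustrate the structure, here is a minimal sketch in plain Python that separates the SRID prefix from the WKT part. The helper name `split_ewkt` is made up for this example and is not part of GDAL, pyogrio or shapely.

```python
from typing import Optional, Tuple


def split_ewkt(ewkt: str) -> Tuple[Optional[int], str]:
    """Split 'SRID=<code>;<WKT>' into (srid, wkt); srid is None without a prefix."""
    head, sep, tail = ewkt.partition(";")
    head = head.strip()
    if sep and head.upper().startswith("SRID="):
        return int(head[5:]), tail.strip()
    # No SRID prefix: treat the whole string as plain WKT
    return None, ewkt.strip()


srid, wkt = split_ewkt("SRID=27700;POLYGON((10 10, 20 20, 30 10, 10 10))")
print(srid)  # 27700
print(wkt)   # POLYGON((10 10, 20 20, 30 10, 10 10))
```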
We could potentially try to detect it, parse it and create a geometry column, assuming all SRIDs are the same, though that is not supported at the moment. You would also need to ensure that `POLYGON()` is not present, as that is incorrect WKT.
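The two preconditions mentioned above could be checked roughly like this before building a geometry column. This is only a sketch in plain Python, not something pyogrio does today. `common_srid` and `fix_empty` are hypothetical helper names, and rewriting `POLYGON()` to `POLYGON EMPTY` is an assumption about what the data producer meant.

```python
import re

# Matches a bare geometry type followed by empty parentheses, e.g. "POLYGON()"
_EMPTY_PARENS = re.compile(r"^\s*([A-Za-z]+)\s*\(\s*\)\s*$")


def fix_empty(wkt: str) -> str:
    """Rewrite invalid 'TYPE()' as 'TYPE EMPTY'; leave other WKT unchanged."""
    match = _EMPTY_PARENS.match(wkt)
    return f"{match.group(1).upper()} EMPTY" if match else wkt


def common_srid(ewkt_values) -> int:
    """Return the single SRID shared by all values, or raise ValueError."""
    srids = set()
    for value in ewkt_values:
        head, sep, _ = value.partition(";")
        if not sep or not head.upper().startswith("SRID="):
            raise ValueError(f"not EWKT: {value!r}")
        srids.add(int(head[5:]))
    if len(srids) != 1:
        raise ValueError(f"expected exactly one SRID, found {sorted(srids)}")
    return srids.pop()


values = [
    "SRID=27700;POLYGON()",
    "SRID=27700;POLYGON((10 10, 20 20, 30 10, 10 10))",
]
print(common_srid(values))
print([fix_empty(v.partition(";")[2]) for v in values])
```

If the SRIDs differ, `common_srid` raises instead of silently picking one, which seems the safer default when a single CRS has to be assigned to the whole column.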
My recommendation for now is to use what you have, maybe trying the vectorized `shapely.from_wkt` rather than `wkt.loads`, but that is a minor difference.
For information, as mentioned by @martinfleis, if you have to process larger datasets it would probably be more efficient to use only vectorized functions.
E.g. like this:
```python
import pandas as pd
import geopandas as gpd
import shapely
from io import StringIO

# Toy example
csv_content = """fid,geom
1,"SRID=27700;POLYGON()"
2,"SRID=27700;POLYGON((10 10, 20 20, 30 10, 10 10))"
3,"SRID=27700;MULTIPOLYGON()"
"""
shlaa_df = pd.read_csv(StringIO(csv_content))

# Split the EWKT into the SRID prefix and the plain WKT
shlaa_df[["srid", "wkt"]] = shlaa_df["geom"].str.split(";", expand=True)

# "POLYGON()" and "MULTIPOLYGON()" are not valid WKT; use the EMPTY keyword instead
shlaa_df.loc[shlaa_df["wkt"] == "POLYGON()", "wkt"] = "POLYGON EMPTY"
shlaa_df.loc[shlaa_df["wkt"] == "MULTIPOLYGON()", "wkt"] = "MULTIPOLYGON EMPTY"

# Assumes every row has the same SRID as the first row
crs = shlaa_df.iloc[0]["srid"].split("=")[1]
shlaa_gdf = gpd.GeoDataFrame(geometry=shapely.from_wkt(shlaa_df["wkt"]), crs=crs)
print(shlaa_gdf)
print(shlaa_gdf.crs)
```
Amazing, thank you!
@theroggy could you write 'python' after the triple backticks? It's so much nicer to read.
I've got some csv files where the geometry column is named 'geom' and looks like this
To load it, I manually check and modify the columns, but I wonder if there is a way for pyogrio to handle this itself?
Some things I've tried
- `GEOM_POSSIBLE_NAMES=['geom']` in `pyogrio.read_dataframe`, but think it got confused with SRID info.

My working code