geopandas / pyogrio

Vectorized vector I/O using OGR
https://pyogrio.readthedocs.io
MIT License

Release plans for GDAL 3.8.1 #330

Closed jbylina closed 4 months ago

jbylina commented 9 months ago

Hey! Thanks for all the great work! pyogrio is super useful as a companion to geopandas and GPKG.

The latest GDAL release - 3.8.1 includes spatialite support, which was requested as a result of PR: https://github.com/geopandas/pyogrio/pull/70#issuecomment-1109033319

As a pyogrio user I can't wait to start using read_dataframe(file, sql="select ... where st_intersects(..., ...);"), which offloads computations to the underlying spatial index. Currently, achieving this requires compiling from source, which can be tricky on older systems. A precompiled pyogrio that includes spatialite support would be neat. Any release plans?

Tests look promising: https://github.com/geopandas/pyogrio/pull/327 🙌

theroggy commented 9 months ago

Hello!

GDAL has supported the use of spatialite functions for many years. The PR you reference just added spatialite to the "ubuntu-small" docker container (in May 2022), so that people using that specific container can use spatialite without having to recompile GDAL.

So, normally you should be able to use the above with just about any version of GDAL and at least pyogrio 0.4.0.

Maybe you just wrote the SQL query like that for brevity, but to avoid misunderstandings: "select ... where st_intersects(..., ...);" won't use the spatial index of the geopackage. In contrast to e.g. PostGIS, you need to explicitly add spatial index filtering to your query to make use of it. If you need more info on how to do that, feel free to ask.

If you are specifically looking for a way to only read the records that intersect with a geometry (using the spatial index), pyogrio 0.7 even has a specific new feature for this:

import shapely
from pyogrio import read_dataframe
mask = shapely.Polygon(([-80, 8], [-80, 10], [-85, 10], [-85, 8], [-80, 8]))
read_dataframe(file, mask=mask)

jbylina commented 8 months ago

Hello! Thanks for the explanation and example :) That's true; I didn't know that detail. For future readers, here is an example of how to use the spatial index manually:

bash-5.2$ /opt/homebrew/opt/sqlite/bin/sqlite3 geoms.gpkg
SQLite version 3.44.2 2023-11-24 11:41:44
Enter ".help" for usage hints.
sqlite> PRAGMA application_id;
1196444487
sqlite> PRAGMA compile_options; -- ENABLE_RTREE flag required!
ATOMIC_INTRINSICS=1
COMPILER=clang-15.0.0
DEFAULT_AUTOVACUUM
DEFAULT_CACHE_SIZE=-2000
DEFAULT_FILE_FORMAT=4
DEFAULT_JOURNAL_SIZE_LIMIT=-1
DEFAULT_MMAP_SIZE=0
DEFAULT_PAGE_SIZE=4096
DEFAULT_PCACHE_INITSZ=20
DEFAULT_RECURSIVE_TRIGGERS
DEFAULT_SECTOR_SIZE=4096
DEFAULT_SYNCHRONOUS=2
DEFAULT_WAL_AUTOCHECKPOINT=1000
DEFAULT_WAL_SYNCHRONOUS=2
DEFAULT_WORKER_THREADS=0
DQS=0
ENABLE_API_ARMOR
ENABLE_COLUMN_METADATA
ENABLE_DBPAGE_VTAB
ENABLE_DBSTAT_VTAB
ENABLE_EXPLAIN_COMMENTS
ENABLE_FTS3
ENABLE_FTS3_PARENTHESIS
ENABLE_FTS4
ENABLE_FTS5
ENABLE_GEOPOLY
ENABLE_MATH_FUNCTIONS
ENABLE_MEMORY_MANAGEMENT
ENABLE_PREUPDATE_HOOK
ENABLE_RTREE
ENABLE_SESSION
ENABLE_STAT4
ENABLE_STMTVTAB
ENABLE_UNLOCK_NOTIFY
MALLOC_SOFT_LIMIT=1024
MAX_ATTACHED=10
MAX_COLUMN=2000
MAX_COMPOUND_SELECT=500
MAX_DEFAULT_PAGE_SIZE=8192
MAX_EXPR_DEPTH=1000
MAX_FUNCTION_ARG=127
MAX_LENGTH=1000000000
MAX_LIKE_PATTERN_LENGTH=50000
MAX_MMAP_SIZE=0x7fff0000
MAX_PAGE_COUNT=1073741823
MAX_PAGE_SIZE=65536
MAX_SQL_LENGTH=1000000000
MAX_TRIGGER_DEPTH=1000
MAX_VARIABLE_NUMBER=250000
MAX_VDBE_OP=250000000
MAX_WORKER_THREADS=8
MUTEX_PTHREADS
SYSTEM_MALLOC
TEMP_STORE=1
THREADSAFE=1
USE_URI
sqlite> .load /opt/homebrew/opt/libspatialite/lib/mod_spatialite.dylib
sqlite> select enablegpkgmode();
sqlite> .tables
geoms                    gpkg_ogr_contents        rtree_geoms_geom
gpkg_contents            gpkg_spatial_ref_sys     rtree_geoms_geom_node
gpkg_extensions          gpkg_tile_matrix         rtree_geoms_geom_parent
gpkg_geometry_columns    gpkg_tile_matrix_set     rtree_geoms_geom_rowid
sqlite> .timer on
sqlite> SELECT count(*) FROM geoms where fid > 0; -- always true condition to trigger simple per row computation
11895
Run Time: real 0.009 user 0.001894 sys 0.006067
sqlite> -- without spatial index
WITH input as (
 SELECT geomfromgeojson('{"coordinates":[[[-4.83,5.79],[-4.83,5.07],[-3.96,5.07],[-3.96,5.79],[-4.83,5.79]]],"type":"Polygon"}') as geom
)
SELECT count(1)
FROM geoms
WHERE st_intersects(geom, (SELECT geom FROM input)) = 1;
20
Run Time: real 0.155 user 0.095722 sys 0.058361
sqlite> -- spatial index use
WITH input as (
 SELECT geomfromgeojson('{"coordinates":[[[-4.83,5.79],[-4.83,5.07],[-3.96,5.07],[-3.96,5.79],[-4.83,5.79]]],"type":"Polygon"}') as geom
), nearby_geoms as (
    SELECT *
    FROM geoms t JOIN rtree_geoms_geom r ON t.fid = r.id
    WHERE r.minx <= (SELECT ST_MaxX(geom) FROM input) AND r.maxx >= (SELECT ST_MinX(geom) FROM input) AND
          r.miny <= (SELECT ST_MaxY(geom) FROM input) AND r.maxy >= (SELECT ST_MinY(geom) FROM input)
)
SELECT count(1)
FROM nearby_geoms
WHERE st_intersects(geom, (SELECT geom FROM input)) = 1;
20
Run Time: real 0.002 user 0.000652 sys 0.000649
sqlite> 
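
For readers without SpatiaLite at hand, the bounding-box prefilter from the session above can be tried with Python's standard library alone. This is only a minimal sketch: the in-memory tables, the feature names and the bounding boxes are made up for illustration, the R-tree table merely mimics the rtree_<table>_<column> naming seen in the .tables output, and it assumes the SQLite bundled with Python was built with R-tree support. It covers only the index lookup; an exact test such as st_intersects would still be applied to the candidates afterwards, as in the session.

```python
import sqlite3

# Hypothetical stand-in for a GeoPackage layer: a feature table plus an
# R-tree table named after the rtree_<table>_<column> pattern.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE geoms (fid INTEGER PRIMARY KEY, name TEXT);
    CREATE VIRTUAL TABLE rtree_geoms_geom USING rtree(id, minx, maxx, miny, maxy);
    """
)

# (fid, name, minx, miny, maxx, maxy): made-up bounding boxes.
features = [
    (1, "inside", -4.5, 5.2, -4.2, 5.5),
    (2, "overlapping", -5.0, 5.0, -4.8, 5.1),
    (3, "far away", 10.0, 10.0, 11.0, 11.0),
]
for fid, name, minx, miny, maxx, maxy in features:
    conn.execute("INSERT INTO geoms VALUES (?, ?)", (fid, name))
    conn.execute(
        "INSERT INTO rtree_geoms_geom VALUES (?, ?, ?, ?, ?)",
        (fid, minx, maxx, miny, maxy),
    )


def candidates_in_bbox(conn, minx, miny, maxx, maxy):
    """Return names of features whose bounding box overlaps the query box."""
    rows = conn.execute(
        """
        SELECT t.name
        FROM geoms t JOIN rtree_geoms_geom r ON t.fid = r.id
        WHERE r.minx <= ? AND r.maxx >= ?
          AND r.miny <= ? AND r.maxy >= ?
        ORDER BY t.fid
        """,
        (maxx, minx, maxy, miny),
    ).fetchall()
    return [name for (name,) in rows]


# Same query window as the polygon used in the session above.
hits = candidates_in_bbox(conn, -4.83, 5.07, -3.96, 5.79)
print(hits)
```

Against a real GeoPackage the same WHERE clause would be joined to the layer's own rtree table, as in the "spatial index use" query above.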

But the mentioned example works best!

mask = shapely.Polygon(([-80,8], [-80, 10], [-85,10], [-85,8], [-80,8]))
read_dataframe(file, mask=mask)

My original post was prompted by some weird behavior I ran into:

root@d3bcdad40336:/# python
Python 3.11.7 (main, Dec  5 2023, 18:55:16) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyogrio
>>> "st_intersects" in pyogrio.read_dataframe("/tmp/geoms.gpkg", sql="SELECT name from pragma_function_list", sql_dialect="sqlite")["name"].tolist()
False
>>> "st_intersects" in pyogrio.read_dataframe("/tmp/db.sqlite", sql="SELECT name from pragma_function_list", sql_dialect="sqlite")["name"].tolist()
True
>>> 
>>> "intersects" in pyogrio.read_dataframe("/tmp/geoms.gpkg", sql="SELECT name from pragma_function_list", sql_dialect="sqlite")["name"].tolist()
True
>>> "intersects" in pyogrio.read_dataframe("/tmp/db.sqlite", sql="SELECT name from pragma_function_list", sql_dialect="sqlite")["name"].tolist()
True
>>>

Looks like when querying the GPKG file, some functions/aliases are missing. Not sure if this is a bug or expected behaviour. The complete list of missing functions/aliases in GPKG context is:

{'sridfromauthcrs', 'st_envelopesintersects', 'st_maxy', 'st_miny', 'gpkg_isassignable', 'setsrid', 'hasspatialindex', 'ogr_gpkg_geometrytypeaggregate_internal', 'gdal_get_layer_pixel_value', 'transform', 'importfromepsg', 'createspatialindex', 'st_minx', 'st_transform', 'st_geometrytype', 'registergeometryextension', 'st_maxx', 'disablespatialindex', 'st_envintersects'}
Full log
```
jacek@ubuntu-playground-vm:~$ docker run -it python:3.11 bash
root@d3bcdad40336:/# pip install pyogrio geopandas
Collecting pyogrio
  Obtaining dependency information for pyogrio from https://files.pythonhosted.org/packages/8e/47/b0c8f44e1e1faf06216648748400ac634ef249a248a43a4a2ba5ddf7f54f/pyogrio-0.7.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading pyogrio-0.7.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Collecting geopandas
  Obtaining dependency information for geopandas from https://files.pythonhosted.org/packages/e3/23/00912e3236306ec52a06f95a08918cbb54f900419951726a20a8783a8507/geopandas-0.14.2-py3-none-any.whl.metadata
  Downloading geopandas-0.14.2-py3-none-any.whl.metadata (1.5 kB)
Collecting certifi (from pyogrio)
  Obtaining dependency information for certifi from https://files.pythonhosted.org/packages/64/62/428ef076be88fa93716b576e4a01f919d25968913e817077a386fcbe4f42/certifi-2023.11.17-py3-none-any.whl.metadata
  Downloading certifi-2023.11.17-py3-none-any.whl.metadata (2.2 kB)
Collecting numpy (from pyogrio)
  Obtaining dependency information for numpy from https://files.pythonhosted.org/packages/5a/62/007b63f916aca1d27f5fede933fda3315d931ff9b2c28b9c2cf388cd8edb/numpy-1.26.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading numpy-1.26.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.2/61.2 kB 1.3 MB/s eta 0:00:00
Collecting packaging (from pyogrio)
  Obtaining dependency information for packaging from https://files.pythonhosted.org/packages/ec/1a/610693ac4ee14fcdf2d9bf3c493370e4f2ef7ae2e19217d7a237ff42367d/packaging-23.2-py3-none-any.whl.metadata
  Downloading packaging-23.2-py3-none-any.whl.metadata (3.2 kB)
Collecting fiona>=1.8.21 (from geopandas)
  Obtaining dependency information for fiona>=1.8.21 from https://files.pythonhosted.org/packages/07/ea/6674320c62a688bc1dc14201dfb7d4aeaea0939a1e733b85bae39e177325/fiona-1.9.5-cp311-cp311-manylinux2014_x86_64.whl.metadata
  Downloading fiona-1.9.5-cp311-cp311-manylinux2014_x86_64.whl.metadata (49 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.7/49.7 kB 4.4 MB/s eta 0:00:00
Collecting pandas>=1.4.0 (from geopandas)
  Obtaining dependency information for pandas>=1.4.0 from https://files.pythonhosted.org/packages/f8/8c/9ad173c5cd2c7178c84075c02ec37b5d1d53fb1d015f51ea3e623ea9c31c/pandas-2.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading pandas-2.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Collecting pyproj>=3.3.0 (from geopandas)
  Obtaining dependency information for pyproj>=3.3.0 from https://files.pythonhosted.org/packages/64/90/dfe5c00de1ca4dbb82606e79790659d4ed7f0ed8d372bccb3baca2a5abe0/pyproj-3.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading pyproj-3.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (31 kB)
Collecting shapely>=1.8.0 (from geopandas)
  Obtaining dependency information for shapely>=1.8.0 from https://files.pythonhosted.org/packages/8c/47/05c8bb8322861113e72b903aebaaa4678ae6e44c886c189ad8fe297f2008/shapely-2.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading shapely-2.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.0 kB)
Collecting attrs>=19.2.0 (from fiona>=1.8.21->geopandas)
  Obtaining dependency information for attrs>=19.2.0 from https://files.pythonhosted.org/packages/e0/44/827b2a91a5816512fcaf3cc4ebc465ccd5d598c45cefa6703fcf4a79018f/attrs-23.2.0-py3-none-any.whl.metadata
  Downloading attrs-23.2.0-py3-none-any.whl.metadata (9.5 kB)
Collecting click~=8.0 (from fiona>=1.8.21->geopandas)
  Obtaining dependency information for click~=8.0 from https://files.pythonhosted.org/packages/00/2e/d53fa4befbf2cfa713304affc7ca780ce4fc1fd8710527771b58311a3229/click-8.1.7-py3-none-any.whl.metadata
  Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting click-plugins>=1.0 (from fiona>=1.8.21->geopandas)
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Collecting cligj>=0.5 (from fiona>=1.8.21->geopandas)
  Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Collecting six (from fiona>=1.8.21->geopandas)
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Requirement already satisfied: setuptools in /usr/local/lib/python3.11/site-packages (from fiona>=1.8.21->geopandas) (65.5.1)
Collecting python-dateutil>=2.8.2 (from pandas>=1.4.0->geopandas)
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 kB 5.1 MB/s eta 0:00:00
Collecting pytz>=2020.1 (from pandas>=1.4.0->geopandas)
  Obtaining dependency information for pytz>=2020.1 from https://files.pythonhosted.org/packages/32/4d/aaf7eff5deb402fd9a24a1449a8119f00d74ae9c2efa79f8ef9994261fc2/pytz-2023.3.post1-py2.py3-none-any.whl.metadata
  Downloading pytz-2023.3.post1-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.1 (from pandas>=1.4.0->geopandas)
  Obtaining dependency information for tzdata>=2022.1 from https://files.pythonhosted.org/packages/a3/fb/52b62131e21b24ee297e4e95ed41eba29647dad0e0051a92bb66b43c70ff/tzdata-2023.4-py2.py3-none-any.whl.metadata
  Downloading tzdata-2023.4-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pyogrio-0.7.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (22.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 22.1/22.1 MB 7.6 MB/s eta 0:00:00
Downloading geopandas-0.14.2-py3-none-any.whl (1.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 13.3 MB/s eta 0:00:00
Downloading fiona-1.9.5-cp311-cp311-manylinux2014_x86_64.whl (15.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.7/15.7 MB 11.8 MB/s eta 0:00:00
Downloading pandas-2.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.2/12.2 MB 13.6 MB/s eta 0:00:00
Downloading numpy-1.26.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 12.9 MB/s eta 0:00:00
Downloading pyproj-3.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.6/8.6 MB 16.1 MB/s eta 0:00:00
Downloading shapely-2.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 14.8 MB/s eta 0:00:00
Downloading certifi-2023.11.17-py3-none-any.whl (162 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 162.5/162.5 kB 13.8 MB/s eta 0:00:00
Downloading packaging-23.2-py3-none-any.whl (53 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.0/53.0 kB 5.0 MB/s eta 0:00:00
Downloading attrs-23.2.0-py3-none-any.whl (60 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.8/60.8 kB 6.3 MB/s eta 0:00:00
Downloading click-8.1.7-py3-none-any.whl (97 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 kB 8.9 MB/s eta 0:00:00
Downloading pytz-2023.3.post1-py2.py3-none-any.whl (502 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 502.5/502.5 kB 17.9 MB/s eta 0:00:00
Downloading tzdata-2023.4-py2.py3-none-any.whl (346 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 346.6/346.6 kB 16.0 MB/s eta 0:00:00
Installing collected packages: pytz, tzdata, six, packaging, numpy, click, certifi, attrs, shapely, python-dateutil, pyproj, pyogrio, cligj, click-plugins, pandas, fiona, geopandas
Successfully installed attrs-23.2.0 certifi-2023.11.17 click-8.1.7 click-plugins-1.1.1 cligj-0.7.2 fiona-1.9.5 geopandas-0.14.2 numpy-1.26.3 packaging-23.2 pandas-2.1.4 pyogrio-0.7.2 pyproj-3.6.1 python-dateutil-2.8.2 pytz-2023.3.post1 shapely-2.0.2 six-1.16.0 tzdata-2023.4
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 23.2.1 -> 23.3.2
[notice] To update, run: pip install --upgrade pip
root@d3bcdad40336:/# python
Python 3.11.7 (main, Dec  5 2023, 18:55:16) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from shapely import Polygon
>>> from geopandas import GeoDataFrame
>>> GeoDataFrame({"geometry": [Polygon([[0, 0], [2, 0], [2, 2], [0, 2]]),]}).to_file("/tmp/geoms.gpkg", layer="geoms", driver="GPKG")
>>> import sqlite3
>>> sqlite3.connect("/tmp/db.sqlite").execute("CREATE TABLE IF NOT EXISTS dummy(id int)").close()
>>>
root@d3bcdad40336:/# python
Python 3.11.7 (main, Dec  5 2023, 18:55:16) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyogrio
>>> "st_intersects" in pyogrio.read_dataframe("/tmp/geoms.gpkg", sql="SELECT name from pragma_function_list", sql_dialect="sqlite")["name"].tolist()
False
>>> "st_intersects" in pyogrio.read_dataframe("/tmp/db.sqlite", sql="SELECT name from pragma_function_list", sql_dialect="sqlite")["name"].tolist()
True
>>> "intersects" in pyogrio.read_dataframe("/tmp/geoms.gpkg", sql="SELECT name from pragma_function_list", sql_dialect="sqlite")["name"].tolist()
True
>>> "intersects" in pyogrio.read_dataframe("/tmp/db.sqlite", sql="SELECT name from pragma_function_list", sql_dialect="sqlite")["name"].tolist()
True
>>> set(pyogrio.read_dataframe("/tmp/geoms.gpkg", sql="SELECT name from pragma_function_list", sql_dialect="sqlite")["name"].tolist()) - set(pyogrio.read_dataframe("/tmp/db.sqlite", sql="SELECT name from pragma_function_list", sql_dialect="sqlite")["name"].tolist())
{'sridfromauthcrs', 'st_envelopesintersects', 'st_maxy', 'st_miny', 'gpkg_isassignable', 'setsrid', 'hasspatialindex', 'ogr_gpkg_geometrytypeaggregate_internal', 'gdal_get_layer_pixel_value', 'transform', 'importfromepsg', 'createspatialindex', 'st_minx', 'st_transform', 'st_geometrytype', 'registergeometryextension', 'st_maxx', 'disablespatialindex', 'st_envintersects'}
>>> set(pyogrio.read_dataframe("/tmp/db.sqlite", sql="SELECT name from pragma_function_list", sql_dialect="sqlite")["name"].tolist()) - set(pyogrio.read_dataframe("/tmp/geoms.gpkg", sql="SELECT name from pragma_function_list", sql_dialect="sqlite")["name"].tolist())
set()
>>>
```
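
The comparison in the log boils down to two set differences over the names returned by pragma_function_list. A minimal stdlib-only sketch of that comparison follows. It assumes the SQLite bundled with Python exposes pragma_function_list, it uses two plain in-memory connections as stand-ins for the GPKG and SQLite datasets, and demo_only_on_a is a made-up function registered purely for illustration. Note that in Python `a - b` contains the names present in `a` and absent from `b`, so it is worth double-checking which side a "missing" label refers to.

```python
import sqlite3

SQL = "SELECT name FROM pragma_function_list"


def function_names(conn):
    """Return the set of SQL function names visible on this connection."""
    return {name for (name,) in conn.execute(SQL)}


# Two in-memory connections stand in for the two datasets being compared.
conn_a = sqlite3.connect(":memory:")
conn_b = sqlite3.connect(":memory:")

# Register a hypothetical extra function on A only, much like a driver or
# extension registers its own SQL functions on a single connection.
conn_a.create_function("demo_only_on_a", 0, lambda: 1)

names_a = function_names(conn_a)
names_b = function_names(conn_b)

# a - b: names that A has and B lacks, i.e. "missing for B".
only_in_a = names_a - names_b
# b - a: names that B has and A lacks, i.e. "missing for A".
only_in_b = names_b - names_a

print(f"Only in A (missing for B): {sorted(only_in_a)}")
print(f"Only in B (missing for A): {sorted(only_in_b)}")
```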
theroggy commented 8 months ago

> Looks like when querying the GPKG file, some functions/aliases are missing. Not sure if this is a bug or expected behaviour.

Interesting... and a bit weird.

I looked into it a bit further, especially because my experience with HasSpatialIndex is the other way around: it only works for GPKG files and not for SQLite files.

When I run your test above on my system (conda-forge on Windows), I get somewhat different results than you did:

Missing for gpkg: {'importfromepsg', 'hasspatialindex', 'registergeometryextension', 'gdal_get_layer_pixel_value'}
Missing for sqlite: {'ogr_layer_featurecount', 'ogr_geocode', 'ogr_geocode_reverse', 'ogr_layer_geometrytype', 'ogr_layer_srid', 'ogr_deflate', 'ogr_version', 'regexp', 'ogr_inflate', 'ogr_datasource_load_layers', 'transform3'}

However, HasSpatialIndex is reported the same way as for you: missing for GPKG. But, as I experienced before, when I actually run SELECT HasSpatialIndex(...), it works for GPKG but not for SQLite?

So... weird things all over...

My test script:

import geopandas as gpd
import pyogrio
import shapely

gdf = gpd.GeoDataFrame(geometry=[shapely.box(0, 0, 10, 10)], crs="epsg:4326")
sql = "SELECT name from pragma_function_list"

gpkg_path = "c:/temp/geoms.gpkg"
gdf.to_file(gpkg_path)
functions_gpkg = set(pyogrio.read_dataframe(gpkg_path, sql=sql)["name"].tolist())
hasspatialindex_gpkg = pyogrio.read_dataframe(
    gpkg_path, sql="SELECT hasspatialindex('geoms', 'geom')"
)["HasSpatialIndex"][0]
print(f"hasspatialindex on gpkg: {hasspatialindex_gpkg}")

sqlite_path = "c:/temp/db.sqlite"
gdf.to_file(sqlite_path, driver="SQLite")
functions_sqlite = set(pyogrio.read_dataframe(sqlite_path, sql=sql)["name"].tolist())
try:
    hasspatialindex_sqlite = pyogrio.read_dataframe(
        sqlite_path, sql="SELECT hasspatialindex('db', 'GEOMETRY')"
    )
    print(f"hasspatialindex on sqlite: {hasspatialindex_sqlite}")
except Exception as ex:
    print(f"hasspatialindex on sqlite failed with {ex}")

print(f"Missing for gpkg: {functions_gpkg - functions_sqlite}")
print(f"Missing for sqlite: {functions_sqlite - functions_gpkg}")
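
Since being listed by pragma_function_list and being callable did not line up in the test above, it can help to record both facts side by side for each function. The following is a minimal stdlib-only sketch of such a check. It runs against a plain in-memory SQLite connection rather than through pyogrio, the helper name check_function is made up, and hypothetical_missing_function is an invented name used to show the failing case.

```python
import sqlite3


def check_function(conn, name, call_sql):
    """Return (listed, works) for one SQL function.

    listed: the name appears in pragma_function_list.
    works:  the sample call in call_sql executes without an error.
    """
    listed = conn.execute(
        "SELECT count(*) FROM pragma_function_list WHERE name = ?", (name,)
    ).fetchone()[0] > 0
    try:
        conn.execute(call_sql).fetchall()
        works = True
    except sqlite3.OperationalError:
        # e.g. "no such function: ..." when the function is not registered
        works = False
    return listed, works


conn = sqlite3.connect(":memory:")

# abs() is a SQLite built-in; the second name is invented for illustration.
print(check_function(conn, "abs", "SELECT abs(-1)"))
print(
    check_function(
        conn,
        "hypothetical_missing_function",
        "SELECT hypothetical_missing_function(1)",
    )
)
```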
jorisvandenbossche commented 6 months ago

> The latest GDAL release - 3.8.1 includes spatialite support

FWIW, this is only about the GDAL small docker images, which will now contain a GDAL built with libspatialite. We are not using those docker images to build our wheels, so this shouldn't impact the wheel build, which at this point still doesn't include libspatialite support (i.e. the libspatialite feature is not enabled in the vcpkg GDAL build).

Looking at it again, it seems we could do that. @brendan-ward you commented about that in https://github.com/geopandas/pyogrio/issues/86#issuecomment-1109012589:

> Not sure if this matters, but noting for the record that our vcpkg build does not include libspatialite (we don't enable that as an add-on feature); it is LGPL licensed, which likely has license implications for the wheels we create. I believe Fiona also does not build against libspatialite.

In https://github.com/geopandas/pyogrio/pull/161 I first added the libspatialite feature, but then removed it again before the PR was merged, and I don't directly see any discussion that explains why I did that. But there is a bit of discussion later at https://github.com/geopandas/pyogrio/pull/161#issuecomment-1269044332, which again mentions that fiona also doesn't have this.

Now, I don't think licensing is a problem, given that GEOS also uses LGPL, and we do already include that in the wheels.

Given that vcpkg has the feature to build GDAL with libspatialite, we should probably just try that.

brendan-ward commented 6 months ago

Worth trying to include it since it is available. I don't recall why I flagged the license, but you're right that GEOS is the same way.

jorisvandenbossche commented 4 months ago

Going to close this. Pyogrio 0.8 was released with wheels containing GDAL 3.8.5, and we have https://github.com/geopandas/pyogrio/issues/378 about the issue of including spatialite in the wheels.