aborruso opened this issue 1 year ago
I don't think this is possible right now, but it's actually a pretty intriguing idea. DuckDB has some reusable code, used by the json/csv/parquet reader functions, for handling multiple file inputs, and with some work we could probably make ST_Read use it too.
However, you can get the available files matching a pattern from within DuckDB using the glob() table function: SELECT * FROM glob('...').
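For example, a minimal sketch of that listing step (the path pattern here is just an illustration):

-- glob() is a built-in DuckDB table function; it returns one row per
-- matching path in a column named "file".
SELECT file FROM glob('data/ne_10m_bathymetry_*.shp');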
Should I turn this into a feature request?
Thank you
It would be great to see glob support added to ST_READ. At the moment, I need one SQL statement per depth level to collect the bathymetry from Natural Earth; it would be nice to wrap that up into a single SQL statement. It would also bring ST_READ's behaviour in line with DuckDB's CSV and Parquet reading functions.
$ ls -lht ne_10m_bathymetry_*.shx
-rwxrwxrwx 1 mark staff 124B Aug 13 2017 ne_10m_bathymetry_A_10000.shx
-rwxrwxrwx 1 mark staff 276B Aug 13 2017 ne_10m_bathymetry_B_9000.shx
-rwxrwxrwx 1 mark staff 220B Aug 13 2017 ne_10m_bathymetry_C_8000.shx
-rwxrwxrwx 1 mark staff 572B Aug 13 2017 ne_10m_bathymetry_D_7000.shx
-rwxrwxrwx 1 mark staff 4.2K Aug 13 2017 ne_10m_bathymetry_E_6000.shx
-rwxrwxrwx 1 mark staff 15K Aug 13 2017 ne_10m_bathymetry_F_5000.shx
-rwxrwxrwx 1 mark staff 14K Aug 13 2017 ne_10m_bathymetry_G_4000.shx
-rwxrwxrwx 1 mark staff 5.3K Aug 13 2017 ne_10m_bathymetry_H_3000.shx
-rwxrwxrwx 1 mark staff 1.7K Aug 13 2017 ne_10m_bathymetry_I_2000.shx
-rwxrwxrwx 1 mark staff 1.3K Aug 13 2017 ne_10m_bathymetry_J_1000.shx
-rwxrwxrwx 1 mark staff 4.2K Aug 13 2017 ne_10m_bathymetry_K_200.shx
-rwxrwxrwx 1 mark staff 276B Aug 13 2017 ne_10m_bathymetry_L_0.shx
$ duckdb
SELECT COUNT(*) FROM ST_READ('ne_10m_bathymetry_*.shx');
Error: IO Error: GDAL Error (4): ne_10m_bathymetry_*.shx: No such file or directory
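Until glob support lands, one manual workaround is to spell out the union over the matching .shp files (assuming the .shp files sit alongside the .shx indexes shown above; the depth_m column is an assumption added for illustration, and UNION ALL BY NAME tolerates differing column order across files):

SELECT *, 10000 AS depth_m FROM ST_Read('ne_10m_bathymetry_A_10000.shp')
UNION ALL BY NAME
SELECT *, 9000 AS depth_m FROM ST_Read('ne_10m_bathymetry_B_9000.shp')
UNION ALL BY NAME
SELECT *, 8000 AS depth_m FROM ST_Read('ne_10m_bathymetry_C_8000.shp');
-- ...and so on for the remaining depth levels.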
+1, need this as well -- trying to combine shapefiles in this directory: https://www2.census.gov/geo/tiger/TIGER2020/PUMA20/ (to make this plot spatial: https://jaanli.github.io/american-community-survey/income)
SQL that does not work, because the spatial extension lacks support for glob syntax:
COPY(
SELECT * FROM '~/data/american_community_survey/TIGER2020/PUMA20/**/*.shp'
) TO '~/data/american_community_survey/shapefiles/TIGER2020/PUMA20/combined/TIGER2020_PUMA20.shp' WITH (FORMAT GDAL, DRIVER 'ESRI Shapefile');
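A manual version of the same COPY should work today, though it means listing each shapefile explicitly (the state-level file names below are hypothetical sketches, not verified against that directory):

COPY (
    SELECT * FROM ST_Read('~/data/american_community_survey/TIGER2020/PUMA20/tl_2020_01_puma20.shp')
    UNION ALL BY NAME
    SELECT * FROM ST_Read('~/data/american_community_survey/TIGER2020/PUMA20/tl_2020_02_puma20.shp')
    -- ...one SELECT per downloaded state shapefile
) TO '~/data/american_community_survey/shapefiles/TIGER2020/PUMA20/combined/TIGER2020_PUMA20.shp'
  WITH (FORMAT GDAL, DRIVER 'ESRI Shapefile');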
Otherwise, this extension is amazing and lets me do things I could not before -- thank you!!
Here's similar code with Geopandas:
from pathlib import Path

import geopandas as gpd
import pandas as pd

# Gather every shapefile under the directory and concatenate them.
dir = Path('/Users/me/data/american_community_survey/shapefiles/TIGER2020/PUMA20')
gdf_list = []
for file in dir.glob('**/*.shp'):
    print(file)
    gdf = gpd.read_file(file)
    gdf_list.append(gdf)

gdf = pd.concat(gdf_list)
gdf.to_file(dir / 'combined' / 'TIGER2020_PUMA20.shp')
Hi, I have a lot of .pbf files in a folder. I am able to read a single file using
SELECT * FROM ST_READ('11_1076_766.pbf');
It does not seem possible to run something like
SELECT * FROM ST_READ('/folder/*.pbf');
which gives me an error. Is there another way to do it? Is there another way to merge all the files?
Thank you
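For what it's worth, a sketch of merging the tiles by hand until glob support exists, assuming GDAL's driver can open these .pbf tiles and that they share the same layer schema (the extra tile names are hypothetical):

CREATE TABLE tiles AS
    SELECT * FROM ST_Read('/folder/11_1076_766.pbf');
-- Append the remaining tiles one by one; glob('/folder/*.pbf') lists their paths.
INSERT INTO tiles SELECT * FROM ST_Read('/folder/11_1076_767.pbf');
INSERT INTO tiles SELECT * FROM ST_Read('/folder/11_1077_766.pbf');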