darcy-r / geoparquet-python

API between Parquet files and GeoDataFrames for fast input/output of GIS data. // This project was a proof of concept. For current development please see https://github.com/geopandas/geo-arrow-spec
MIT License
24 stars 1 forks source link

GeoParquet

GeoParquet for Python is a GeoPandas API designed to facilitate fast input/output of GIS data in the open source Parquet file format.

The project is currently a proof of concept.

Why is this project needed?

The GIS community currently lacks a fast, efficient, open-source file format for persisting and sharing data with. The purpose of the GeoParquet project is to develop such a file format.

How does it work?

The GeoParquet file format is simple adaptation of the existing Parquet file format: geometries are stored as well-known binary (WKB), and the coordinate reference system is stored as a WKT string (specifically WKT2_2018) in the file's metadata.

Basic usage

import geopandas as gpd
import geoparquet as gpq

# read in file from shapefile or other format using geopandas
gdf = gpd.read_file('example file.shp')

# call .to_geoparquet() method on geopandas GeoDataFrame to write to file
gdf.to_geoparquet('example file.geoparquet')

# read from file by calling gpq.read_geoparquet() function
gdf = gpq.read_geoparquet('example file.geoparquet')