frictionlessdata / frictionless-py

Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data
https://framework.frictionlessdata.io
MIT License
700 stars 148 forks

Feature Request: read ESRI shape files #1583

Open jze opened 1 year ago

jze commented 1 year ago

Overview

A common format for geo-related public sector data is the ESRI shapefile (.shp). Shapefiles are usually distributed as ZIP archives that contain the geometry together with an attribute table. Often little is known about the content of the attribute table, and this is where Frictionless could help. For statistical evaluations you often do not need a full GIS program and could treat shapefiles like tables.
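Strictly speaking, a shapefile is a set of component files sharing a base name (.shp geometry, .shx index, .dbf attribute table, often .prj for the coordinate system), and the ZIP is just how they are usually shipped. A minimal stdlib sketch, assuming the components sit in the archive under a common base name, that groups the archive members per layer. `shapefile_layers` and `COMPONENT_ROLES` are hypothetical helpers, not part of Frictionless or GeoPandas, and the member names in the demo are made up.

```python
import io
import zipfile
from collections import defaultdict

# Usual roles of shapefile component files, keyed by extension
COMPONENT_ROLES = {
    ".shp": "geometry",
    ".shx": "geometry index",
    ".dbf": "attribute table (dBASE)",
    ".prj": "coordinate reference system",
    ".cpg": "text encoding of the attribute table",
}


def shapefile_layers(archive: zipfile.ZipFile) -> dict[str, dict[str, str]]:
    """Group archive members by base name; every base name with a .shp is a layer."""
    groups = defaultdict(dict)
    for member in archive.namelist():
        stem, dot, ext = member.rpartition(".")
        if not dot:
            continue  # directory entries and files without an extension
        groups[stem]["." + ext.lower()] = member
    return {stem: files for stem, files in groups.items() if ".shp" in files}


# Demo with an in-memory archive; the member names are hypothetical
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in ["signals.shp", "signals.shx", "signals.dbf", "signals.prj", "readme.txt"]:
        zf.writestr(name, b"")

with zipfile.ZipFile(buffer) as zf:
    for layer, files in shapefile_layers(zf).items():
        print(layer)
        for ext, member in sorted(files.items()):
            print(f"  {member}: {COMPONENT_ROLES.get(ext, 'other')}")
```

Grouping by base name rather than by folder means an archive with several layers side by side yields one entry per layer, while stray files such as a readme are dropped because they have no .shp.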

With GeoPandas it is simple to access the data contained in a shapefile. Here is an example that uses a file with data on traffic lights:

import geopandas as gpd
shapefile_path = "Lichtsignalanlagen.zip"
gdf = gpd.read_file(f"zip://{shapefile_path}")
print(gdf)

This is the output:

      FID  NR                                         BEZEICHNUN                        geometry
0   34016  50              Feldstraße (Kath. Kirche) Beselerstr.  POINT (543146.869 5956685.986)
1   33986  33  Westerstraße (B431), Reichenstraße / Vormstege...  POINT (543236.506 5955852.612)
2   34030  64                               Steindamm / Gooskamp  POINT (543789.474 5955973.802)
3   33948  12                     Köllner Chaussee / Krückaupark  POINT (544497.901 5956416.905)
4   34008  46       Friedensallee / Friedenstraße / Amandastraße  POINT (543764.121 5956905.022)
..    ...  ..                                                ...                             ...
62  33980  30                 Hamburger Straße / Hainholzer Damm  POINT (544255.692 5955584.940)
63  33926   1  Berliner Straße / Schauenburgerstraße / Probst...  POINT (543382.916 5956266.351)
64  34004  43      Hainholzer Damm / Wasserstraße / Fröbelstraße  POINT (544190.755 5955020.948)
65  73942  78  Hamburger Straße / Feuerwache Süd (Ausfahrt Fe...  POINT (543747.192 5955752.459)
66  78293  91                      Gärtnerstraße Höhe Hs.-Nr. 31  POINT (542949.420 5956757.481)

[67 rows x 4 columns]

Frictionless already understands the WKT in the geometry column.

jze commented 1 year ago

I did a little research. Here is code that lists the included layers and the data types of the attribute table:

import fiona
import geopandas

shapefile_path = "Lichtsignalanlagen.zip"
# GDAL/Fiona address files inside a ZIP archive via the "zip://" prefix
zip_path = f"zip://{shapefile_path}"

# An archive can hold several layers; list them first
for layer_name in fiona.listlayers(zip_path):
    print(layer_name)

    # Read one layer (attribute table plus geometry column) into a GeoDataFrame
    gdf = geopandas.read_file(zip_path, layer=layer_name)

    # Series.items() replaces iteritems(), which pandas 2.0 removed
    for name, data_type in gdf.dtypes.items():
        print(f"  name: {name}, type: {data_type}")
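To get from these dtypes to something Frictionless can describe, one option is to translate them into a Table Schema descriptor (a dict with a `fields` list). A minimal sketch: `to_table_schema` and the dtype-to-type table are hypothetical, mapping the geometry column to a `string` field holding WKT text is an assumption, and the example dtypes are assumed for illustration rather than read from the file.

```python
# Hypothetical mapping from pandas/GeoPandas dtype names to Table Schema field types
DTYPE_TO_FIELD_TYPE = {
    "int32": "integer",
    "int64": "integer",
    "float32": "number",
    "float64": "number",
    "bool": "boolean",
    "object": "string",
    "geometry": "string",  # assumption: geometry is kept as WKT text
}


def to_table_schema(dtypes: dict[str, str]) -> dict:
    """Build a Table Schema descriptor from column name -> dtype name pairs."""
    fields = []
    for name, dtype in dtypes.items():
        if dtype.startswith("datetime64"):
            field_type = "datetime"  # covers e.g. "datetime64[ns]"
        else:
            field_type = DTYPE_TO_FIELD_TYPE.get(dtype, "any")  # unknown -> "any"
        fields.append({"name": name, "type": field_type})
    return {"fields": fields}


# With a real layer: dtypes = {name: str(t) for name, t in gdf.dtypes.items()}
# The dtypes below are assumed for illustration, not read from the file
dtypes = {"FID": "int64", "NR": "int64", "BEZEICHNUN": "object", "geometry": "geometry"}
print(to_table_schema(dtypes))
```

Falling back to `any` for unmapped dtypes keeps the descriptor valid while flagging columns that still need a decision.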

Here is an example of a shape file that contains more layers and data types: nag_fach_100.zip
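GeoPandas reports pandas dtypes after conversion, so for files like this it can help to look at the types the attribute table itself declares. Each layer's attribute table is a dBASE (.dbf) file whose header lists, per field, a name, a one-letter type (C character, N numeric, F float, D date, L logical), a length and a decimal count. Below is a stdlib-only sketch that assumes a plain dBASE III-style header (32-byte field descriptors terminated by 0x0D). `read_dbf_fields`, `build_dbf_header` and the type mapping are hypothetical, and the demo header is built in memory because the actual file contents are not shown here.

```python
import struct

# Assumed mapping from dBASE type letters to Table Schema field types
DBF_TO_FIELD_TYPE = {"C": "string", "D": "date", "F": "number", "L": "boolean", "N": "number"}


def read_dbf_fields(data: bytes) -> list[dict]:
    """Read the field descriptors from a dBASE III-style .dbf header."""
    # Bytes 4-11: record count (uint32), header length (uint16), record length (uint16)
    _records, header_len, _record_len = struct.unpack("<IHH", data[4:12])
    fields = []
    offset = 32  # descriptors follow the 32-byte file header, 32 bytes each
    while offset < header_len and data[offset] != 0x0D:  # 0x0D ends the list
        descriptor = data[offset : offset + 32]
        name = descriptor[:11].split(b"\x00", 1)[0].decode("ascii", errors="replace")
        type_code = chr(descriptor[11])
        length, decimals = descriptor[16], descriptor[17]
        field_type = DBF_TO_FIELD_TYPE.get(type_code, "any")
        if type_code == "N" and decimals == 0:
            field_type = "integer"  # assumption: N without decimals holds integers
        fields.append({"name": name, "type": field_type, "dbf": f"{type_code}({length},{decimals})"})
        offset += 32
    return fields


def build_dbf_header(fields: list[tuple[str, str, int, int]]) -> bytes:
    """Demo helper: a header with zero records, since no real .dbf is available here."""
    header_len = 32 + 32 * len(fields) + 1
    record_len = 1 + sum(length for _, _, length, _ in fields)  # 1 byte deletion flag
    out = struct.pack("<B3BIHH20x", 0x03, 124, 1, 1, 0, header_len, record_len)
    for name, type_code, length, decimals in fields:
        out += struct.pack("<11sc4xBB14x", name.encode("ascii"), type_code.encode("ascii"), length, decimals)
    return out + b"\x0d"


# With a real archive (the member name is a placeholder):
#   with zipfile.ZipFile("nag_fach_100.zip") as zf: read_dbf_fields(zf.read("<layer>.dbf"))
header = build_dbf_header([("NR", "N", 10, 0), ("BEZEICHNUN", "C", 254, 0)])
for field in read_dbf_fields(header):
    print(field)
```

Keeping the raw dBASE declaration (`"dbf": "N(10,0)"`) next to the proposed type makes it easy to spot where pandas widened or changed a type, for example numeric codes read as floats.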