njmattes opened this issue 8 years ago
@njmattes there is a `shp2pgsql` tool (analogous to `raster2pgsql`) described in a tutorial. I saw that it is also possible to get a GeoJSON-compliant polygon and ingest that. Would the shapefiles to ingest be the GADM regions?
Yes, I think we can start with just the GADM0 regions, unless it's easier to ingest all of them (GADM1, GADM2) at once. Joshua's mentioned FPU regions, but I don't have outlines for those.
@njmattes we could load them in batches, especially if something goes wrong and we need to roll back. Do we have all those shapefiles downloaded already, or do we still need to get them from the GADM website?
I don't have the GADM files anymore (they're huge). You can download the entire world here: http://www.gadm.org/version2.
@njmattes I've created a folder at `/var/www/gadm` and downloaded the entire world there (by the way, super fast: 334 MB in 14 s). It is still zipped, and I would like to hear from @legendOfZelda whether he tried to ingest shapefiles in his tests, and whether we need to change anything prior to ingesting.
i ingested the `.shp` file into a table called `regions` using `ogr2ogr`. the table with its columns was created by `ogr2ogr` itself, but we don't really want all these columns (`varname_1`, `nl_name_1`, `engtype_1`, etc.). all we need is a primary key, geom, and a JSON with the metadata (that can contain the geometry again, not super efficient in space but convenient).
besides, a new `.shp` file might have other attributes than those, forcing us to let `ogr2ogr` create a new table for that `.shp` file. however, it is possible to append a `.shp` file to an existing PostGIS table as explained here: http://spatialmounty.blogspot.com/2015/05/ogr2ogr-append-new-shapefile-to.html. that requires mapping the attributes in the `.shp` file to attributes that already exist in the table, but it will lead to cumbersome NULLs anyway because shapefiles are not uniform, i.e. they don't always have the same metadata fields.
so because of that, again, i suggest we use a table `regions(uid=primary key, geom=geometry, meta_data=JSON)` that is able to store heterogeneous shapefiles (i guess that goes with your question @ricardobarroslourenco of whether we need to 'change something prior to ingest'...?). to achieve that i am planning to use `ogr2ogr` to first convert a `.shp` to a `.geojson` and then ingest document by document, also making use of PostGIS' `ST_GeomFromGeoJSON`.
let me know if you see a better approach.
As said on Monday, for this prototype it is OK to use `ogr2ogr`. But in a further iteration it would be interesting to replace it with a more customized function, to avoid the generation of blank columns.
how about this schema here?
```sql
CREATE TABLE regions_meta (
    uid bigserial primary key,
    name text,         -- e.g. GADM, EEZ
    version text,      -- e.g. for GADM: 1, 2.0, 2.7, 2.8;
                       -- e.g. for EEZ: 1, 2, 3, ..., 6, 6.1, 7, 8
    attributes text[]  -- e.g. for GADM v2.8: ID_0, NAME_0, VARNAME_0, TYPE_0, ...
                       -- e.g. for EEZ v8: ID, OBJECTID_1, EEZ, Country, Sovereign, Remarks, ...
);

CREATE TABLE regions (
    uid bigserial primary key,
    geom geometry,
    meta_data jsonb,   -- key-value pairs with keys = attributes from regions_meta
    meta_id bigint references regions_meta(uid)
);
```
to support `GADM`, `EEZ` (Exclusive Economic Zones boundaries), and others, e.g. the ones mentioned in http://www.gadm.org/links
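for illustration, filling the two tables for one collection might go like this (a sketch: dbname and attribute values are made up, psycopg2 is assumed as the driver):

```python
# hypothetical example of how regions_meta and regions link up
import json

import psycopg2

conn = psycopg2.connect("dbname=ede")  # hypothetical connection string
with conn, conn.cursor() as cur:
    # one regions_meta row per shapefile collection, listing its attribute names
    cur.execute(
        "INSERT INTO regions_meta (name, version, attributes) "
        "VALUES (%s, %s, %s) RETURNING uid",
        ("GADM", "2.8", ["ID_0", "NAME_0", "VARNAME_0", "TYPE_0"]),
    )
    meta_id = cur.fetchone()[0]

    # each polygon row keeps its attributes as JSON and points back via meta_id
    geom = {"type": "Polygon",
            "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}
    cur.execute(
        "INSERT INTO regions (geom, meta_data, meta_id) "
        "VALUES (ST_GeomFromGeoJSON(%s), %s, %s)",
        (json.dumps(geom), json.dumps({"ID_0": 4, "NAME_0": "Afghanistan"}), meta_id),
    )
```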
i am almost done with writing the ingestion script for shapefiles such as GADM, EEZ (exclusive economic zones), etc. once we have these tables filled up we can specify polygons by ID, which again is convenient when specifying the flask URL.
ok, ingestion into the above 2 tables works, just ingested `EEZ` (`GADM` works too but it just takes too long for now). now i can continue with the flask routes, and within the urls i can conveniently use polygon IDs.
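a polygon-by-ID route could look roughly like this (sketch only: the route path, app object, and connection string are assumptions, not the actual code in `views.py`):

```python
# minimal sketch of a polygon-by-ID flask route; names are hypothetical
import json

import psycopg2
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/regions/<int:region_id>")
def get_region(region_id):
    conn = psycopg2.connect("dbname=ede")  # hypothetical connection string
    with conn, conn.cursor() as cur:
        # ST_AsGeoJSON turns the stored geometry back into GeoJSON text
        cur.execute(
            "SELECT ST_AsGeoJSON(geom), meta_data FROM regions WHERE uid = %s",
            (region_id,),
        )
        row = cur.fetchone()
    if row is None:
        return jsonify(error="no such region"), 404
    geometry, meta_data = row
    return jsonify(geometry=json.loads(geometry), meta_data=meta_data)
```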
Nice. How long did it take to load? I just arrived at the CI and I'll be reviewing stuff.
took roughly 5 minutes to ingest `EEZ`, but i'm still ingesting one polygon at a time, so there's lots of potential to speed it up. it's all still in the `severin` branch, not yet merged into `develop`. the new tables are, as usual, in `models.py` and the shapefile ingestion script is `ingest/ingest_shapes.py`.
today i will resume the work on the flask routes in `views.py`.
Ok. I'll look into the `severin` branch. I'll also be reviewing the architecture we are using, as @njmattes asked on Monday.
@legendOfZelda about `/ede/cache_builder.py`: is it a cache that gets records in space, but caches all time frames that are present?
hm, haven't looked into `cache_builder.py` yet, but we can discuss it today.
@ricardobarroslourenco Once we have a finalized procedure for returning a raster of values, we should finish at least one procedure for aggregating those to polygons. But we don't yet have polygons, I just realized. We need to sort out a way to store these ASAP (this is also mentioned in issue #9). @legendOfZelda Do you know if plenario has an ingestor for shapefiles already in place?
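For the aggregation step, one possible shape of that procedure in PostGIS could be zonal statistics over the clipped raster (a sketch: it assumes a raster table named `rasters` with a `rast` column, which we haven't actually defined yet):

```python
# sketch of zonal aggregation: mean raster value over one polygon, using
# PostGIS raster functions; the 'rasters' table and dbname are hypothetical
import psycopg2

ZONAL_MEAN = """
SELECT r.uid, (ST_SummaryStats(ST_Clip(t.rast, r.geom))).mean
FROM rasters t
JOIN regions r ON ST_Intersects(t.rast, r.geom)
WHERE r.uid = %s
"""

conn = psycopg2.connect("dbname=ede")  # hypothetical connection string
with conn, conn.cursor() as cur:
    cur.execute(ZONAL_MEAN, (42,))     # 42 = some polygon uid
    for uid, mean_value in cur.fetchall():
        print(uid, mean_value)
```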