njmattes opened this issue 8 years ago
@njmattes there is a `shp2pgsql` tool (analogous to `raster2pgsql`) described in a tutorial. I saw that it is also possible to get a GeoJSON-compliant polygon and ingest that. Would the shapefiles to ingest be the GADM regions?
Yes, I think we can start with just the GADM0 regions, unless it's easier to ingest all of them (GADM1, GADM2) at once. Joshua's mentioned FPU regions, but I don't have outlines for those.
@njmattes we could load them in batches, especially if something goes wrong and we need to roll back. Do we have all those shapefiles downloaded already, or do we still need to get them from the GADM website?
I don't have the GADM files anymore (they're huge). You can download the entire world here: http://www.gadm.org/version2.
@njmattes I've created a folder at `/var/www/gadm` and downloaded the entire world there (by the way, super fast: 334 MB in 14 s). It is still zipped, and I would like to hear from @legendOfZelda whether he tried to ingest shapefiles in his tests, and whether we need to change anything prior to ingesting.
i ingested the `.shp` file into a table called `regions` using `ogr2ogr`. the table with its columns was created by `ogr2ogr` itself, but we don't really want all these columns (`varname_1`, `nl_name_1`, `engtype_1`, etc.). all we need is a primary key, geom, and a JSON with the metadata (that can contain the geometry again, not super efficient in space but convenient).
besides, a new `.shp` file might have other attributes than those, forcing us to let `ogr2ogr` create a new table for that `.shp` file. however, it is possible to append a `.shp` file to an existing PostGIS table as explained here: http://spatialmounty.blogspot.com/2015/05/ogr2ogr-append-new-shapefile-to.html. that requires mapping the attributes in the `.shp` file to attributes that already exist in the table, but it will lead to cumbersome NULLs anyway because shapefiles are not uniform, i.e. they don't always have the same metadata fields.
so because of that, again, i suggest we use a table `regions(uid=primary key, geom=geometry, meta_data=JSON)` that is able to store heterogeneous shapefiles (i guess that goes with your question @ricardobarroslourenco of whether we need to 'change something prior to ingest'...?). to achieve that i am planning to use `ogr2ogr` to first convert a `.shp` to a `.geojson` and then ingest document by document, also making use of PostGIS' `ST_GeomFromGeoJSON`.
let me know if you see a better approach.
As said on Monday, for this prototype it is OK to use `ogr2ogr`. But in a further iteration it would be interesting to replace it with a more customized function, to avoid the generation of blank columns.
how about this schema here?
```sql
CREATE TABLE regions_meta (
    uid bigserial primary key,
    name text,         -- e.g. GADM, EEZ
    version text,      -- e.g. for GADM: 1, 2.0, 2.7, 2.8;
                       -- e.g. for EEZ: 1, 2, 3, ..., 6, 6.1, 7, 8
    attributes text[]  -- e.g. for GADM v2.8: ID_0, NAME_0, VARNAME_0, TYPE_0, ...
                       -- e.g. for EEZ v8: ID, OBJECTID_1, EEZ, Country, Sovereign, Remarks, ...
);

CREATE TABLE regions (
    uid bigserial primary key,
    geom geometry,
    meta_data jsonb,   -- key-value pairs with keys = attributes from regions_meta
    meta_id bigint references regions_meta(uid)
);
```
to support `GADM`, `EEZ` (Exclusive Economic Zones boundaries), and others, e.g. the ones mentioned in http://www.gadm.org/links
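for illustration, filling the two tables for one collection might go like this (a sketch: dbname and attribute values are made up, psycopg2 is assumed as the driver):

```python
# hypothetical example of how regions_meta and regions link up
import json

import psycopg2

conn = psycopg2.connect("dbname=ede")  # hypothetical connection string
with conn, conn.cursor() as cur:
    # one regions_meta row per shapefile collection, listing its attribute names
    cur.execute(
        "INSERT INTO regions_meta (name, version, attributes) "
        "VALUES (%s, %s, %s) RETURNING uid",
        ("GADM", "2.8", ["ID_0", "NAME_0", "VARNAME_0", "TYPE_0"]),
    )
    meta_id = cur.fetchone()[0]

    # each polygon row keeps its attributes as JSON and points back via meta_id
    geom = {"type": "Polygon",
            "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}
    cur.execute(
        "INSERT INTO regions (geom, meta_data, meta_id) "
        "VALUES (ST_GeomFromGeoJSON(%s), %s, %s)",
        (json.dumps(geom), json.dumps({"ID_0": 4, "NAME_0": "Afghanistan"}), meta_id),
    )
```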
i am almost done with writing the ingestion script for shapefiles such as GADM, EEZ (exclusive economic zones), etc. once we have these tables filled up we can specify polygons by ID, which again is convenient when specifying the flask URL.
ok, ingestion into the above 2 tables works, just ingested `EEZ` (`GADM` works too but it just takes too long for now). now i can continue with the flask routes, and within the urls i can conveniently use polygon IDs.
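a polygon-by-ID route could look roughly like this (sketch only: the route path, app object, and connection string are assumptions, not the actual code in `views.py`):

```python
# minimal sketch of a polygon-by-ID flask route; names are hypothetical
import json

import psycopg2
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/regions/<int:region_id>")
def get_region(region_id):
    conn = psycopg2.connect("dbname=ede")  # hypothetical connection string
    with conn, conn.cursor() as cur:
        # ST_AsGeoJSON turns the stored geometry back into GeoJSON text
        cur.execute(
            "SELECT ST_AsGeoJSON(geom), meta_data FROM regions WHERE uid = %s",
            (region_id,),
        )
        row = cur.fetchone()
    if row is None:
        return jsonify(error="no such region"), 404
    geometry, meta_data = row
    return jsonify(geometry=json.loads(geometry), meta_data=meta_data)
```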
Nice. How long did it take to load? I just arrived at the CI and I'll be reviewing stuff.
took roughly 5 minutes to ingest `EEZ`, but i'm still ingesting one polygon at a time, so there's lots of potential to speed it up. it's all still in the `severin` branch, not yet merged into `develop`. the new tables are, as usual, in `models.py` and the shapefile ingestion script is `ingest/ingest_shapes.py`.
today i will resume the work on the flask routes in `views.py`.
Ok. I'll look into the `severin` branch. I'll also be reviewing the architecture we are using, as @njmattes asked on Monday.
@legendOfZelda about `/ede/cache_builder.py`: is it a cache that gets records in space, but caches all time frames that are present?
hm, haven't looked into `cache_builder.py` yet, but we can discuss it today.
@ricardobarroslourenco Once we have a finalized procedure for returning a raster of values, we should finish at least one procedure for aggregating those to polygons. But we don't yet have polygons, I just realized. We need to sort out a way to store these ASAP (this is also mentioned in issue #9). @legendOfZelda Do you know if plenario has an ingestor for shapefiles already in place?
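For the aggregation step, one possible shape of that procedure in PostGIS could be zonal statistics over the clipped raster (a sketch: it assumes a raster table named `rasters` with a `rast` column, which we haven't actually defined yet):

```python
# sketch of zonal aggregation: mean raster value over one polygon, using
# PostGIS raster functions; the 'rasters' table and dbname are hypothetical
import psycopg2

ZONAL_MEAN = """
SELECT r.uid, (ST_SummaryStats(ST_Clip(t.rast, r.geom))).mean
FROM rasters t
JOIN regions r ON ST_Intersects(t.rast, r.geom)
WHERE r.uid = %s
"""

conn = psycopg2.connect("dbname=ede")  # hypothetical connection string
with conn, conn.cursor() as cur:
    cur.execute(ZONAL_MEAN, (42,))     # 42 = some polygon uid
    for uid, mean_value in cur.fetchall():
        print(uid, mean_value)
```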