njmattes closed this issue 8 years ago.
Need to standardize URLs for making requests to the API. Since we're trying to make this RESTful, these need to work over `GET`, not `POST`, so they can be cached. Hence the parameters in the URLs rather than in the body.
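As a rough sketch of what one of these cacheable GET endpoints could look like in Flask (the route name, parameter name, and the `query_dataset_metadata` helper are placeholders, not actual project code):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def query_dataset_metadata(dataset_id):
    # Stand-in for the real db query; returns canned metadata for now.
    return {'dataset': dataset_id, 'variables': ['yield', 'area']}

@app.route('/api/v0/datasets', methods=['GET'])
def dataset_metadata():
    # Everything arrives in the query string, so identical URLs produce
    # identical responses and can be cached by browsers and proxies.
    dataset_id = request.args.get('dataset', type=int)
    return jsonify(query_dataset_metadata(dataset_id))
```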
Coordinates for the bounding boxes are clockwise from the north, because that's how CSS works and it's what I'm used to. But @ricardobarroslourenco, maybe there's a more typical convention in GIS?
@legendOfZelda In the third URL there is an identifier for the set of polygons we'll be aggregating to. I assume this is possible given the current data schema, but perhaps not? I suppose this is also a question for @ricardobarroslourenco since he's working on aggregation ops.
I can sort out the `flask` bits for this if you guys can give me the python bits that do the actual querying.
Again, leaving the first post clean so we can edit.
These three URLs are merely the minimum needed for prototype v0.0. We'll need more shortly. We'll also need to work out headers (e.g. do we use the `cache-control`, `expires`, etc. headers), but this can clearly wait until some future time.
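When we do get to it, Flask makes those headers easy to bolt on. A minimal sketch (the blanket one-hour policy is only an example, not a decision):

```python
from flask import Flask

app = Flask(__name__)

@app.after_request
def add_cache_headers(response):
    # Example policy only: let shared caches keep API responses for an hour.
    response.headers['Cache-Control'] = 'public, max-age=3600'
    return response
```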
@njmattes made some changes to URLs above.
The user needs to specify not only the dataset (`dataset=integer`) but also which variable within that dataset (`variable=string`). I thought about using the integer id to specify the variable instead, i.e. `variable=integer`. However, after the user issues the first URL he can't know the variable ids, only their names (which appear in the metadata JSON returned for the first URL request). For the time parameter: `time=date`.
The `regions` table doesn't yet have a `GADM level` column, so we can't yet specify a GADM level in the request and then fetch all polygons with that level from the db. It's a small thing to get done, and on Saturday I'd also like to actually improve the ingestion of the GADM regions so that we have a `GADM level` column and the rest of the shapefile metadata in a single JSON column. Then we can allow the user to specify the GADM level in the URL and we can fish out the regions with that level from the `regions` table.

For the third, I'd rather limit the polygons to only those that intersect the bounding box, in order to keep the size of the request small (there's no point in sending the entire globe in the response if the user can only see SE Asia).
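A sketch of the kind of query that would do that, assuming a `regions` table with a geometry column `poly` (as discussed below) and psycopg2 for db access; the connection settings, SRID, and the roughly-SE-Asia bounding box are invented:

```python
import psycopg2

# Hypothetical connection settings; the real ones live in ede/credentials.py.
conn = psycopg2.connect(dbname='ede', user='ede', host='localhost')

# Only return polygons that intersect the viewport's bounding box,
# so the response never carries the whole globe.
SQL = """
    SELECT uid, ST_AsGeoJSON(poly)
    FROM regions
    WHERE ST_Intersects(poly, ST_MakeEnvelope(%s, %s, %s, %s, 4326))
"""

with conn.cursor() as cur:
    cur.execute(SQL, (90.0, -10.0, 140.0, 25.0))  # xmin, ymin, xmax, ymax
    rows = cur.fetchall()
```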
We'll need a way to specify polygons based on something other than GADM level, since eventually we want other polygon sets. It may be that eventually we need a `poly_meta` table.
Also, for `time` I think we'll want `start_time` and `end_time`. And perhaps we should pass 0-indexed time steps rather than timestamp strings. Strings will have to be URL encoded, making them ugly, and they take up unnecessary space. The client should be able to determine the index from the metadata.
If we're retrieving only a single polygon, do the polygons have an `id` attribute?
If we use a single table

```sql
CREATE TABLE regions (
  uid integer PRIMARY KEY,
  poly geometry,
  meta_data json
);
```

for the regions we should be fine, because into the JSON field we can put whether it's a GADM region and, if so, which level, etc. All kinds of regions should be well representable with this table, so no need for a `regions`/`poly_meta` table IMO. Hmm, but maybe you're right, because we need to make sure the user can get to the `uid` at some point, and if we stuff everything into a JSON blob we're not really using SQL's features (e.g. structure enforcement) but rather using it MongoDB-style.
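For what it's worth, the GADM level would still be reachable in the single-table layout by digging into the JSON column, e.g. (a sketch; the JSON keys `source` and `gadm_level` are hypothetical):

```python
import psycopg2

conn = psycopg2.connect(dbname='ede')  # hypothetical connection settings

# Filtering on GADM level means reaching into the JSON blob with ->>
# instead of a typed, indexable column.
SQL = """
    SELECT uid, ST_AsGeoJSON(poly)
    FROM regions
    WHERE meta_data->>'source' = 'GADM'
      AND (meta_data->>'gadm_level')::int = %s
"""

with conn.cursor() as cur:
    cur.execute(SQL, (1,))  # GADM level 1
    rows = cur.fetchall()
```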
How about two tables instead:

```sql
CREATE TABLE regions_meta (
  uid integer PRIMARY KEY,
  geom geometry
);

CREATE TABLE regions_gadm (
  uid integer PRIMARY KEY,
  meta_id integer references regions_meta(uid) unique,
  gadm_code integer,
  gadm_level integer
);
```

and similarly for other kinds of regions not coming from GADM. However, we should still have a `meta_data` JSON field somewhere, because I'm not sure how much we can abstract out into columns...
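Under that two-table layout, fetching all polygons at a given GADM level is a plain join, roughly (sketch only; connection settings hypothetical):

```python
import psycopg2

conn = psycopg2.connect(dbname='ede')  # hypothetical connection settings

# Join the GADM-specific attributes back onto the shared geometry table.
SQL = """
    SELECT m.uid, ST_AsGeoJSON(m.geom), g.gadm_code
    FROM regions_meta AS m
    JOIN regions_gadm AS g ON g.meta_id = m.uid
    WHERE g.gadm_level = %s
"""

with conn.cursor() as cur:
    cur.execute(SQL, (1,))  # GADM level 1
    rows = cur.fetchall()
```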
To handle the mapping from user-friendly timestamps to logical times, i.e. a 0-based index, I want to use:

```sql
CREATE TABLE time_vars (
  meta_id bigint references grid_meta(uid),
  time timestamp with time zone,
  frame integer,
  unique (meta_id, frame)
);
```

which also handles non-uniform timesteps, which we should, because we don't actually know whether the timesteps are uniform until we've actually checked the array of the time dimension in the NetCDF...
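With that table, translating between the two is a single lookup in either direction, e.g. (sketch; the dataset id and timestamp are invented):

```python
import psycopg2

conn = psycopg2.connect(dbname='ede')  # hypothetical connection settings

with conn.cursor() as cur:
    # Human-readable timestamp -> 0-based frame index for one dataset.
    cur.execute(
        "SELECT frame FROM time_vars WHERE meta_id = %s AND time = %s",
        (1, '2000-01-01 00:00:00+00'))
    frame = cur.fetchone()

    # And back again: frame index -> timestamp.
    cur.execute(
        "SELECT time FROM time_vars WHERE meta_id = %s AND frame = %s",
        (1, 0))
    timestamp = cur.fetchone()
```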
I think the regions meta is a bit backwards. I think `regions_meta` should really be something like `regions_geom`, and `regions_gadm` should be `regions_meta`. `regions_gadm` is a bit too specific, because we'll also have non-GADM regions (someone might prefer regions from Natural Earth, or food producing units, etc.).
You're probably right that we'll need a table to map timesteps, but I think we can ignore it for now, and just return timestep 0 while we get the prototype working.
@njmattes About the orientation of polygon building: the OGC, in their OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture, specifies that for closed polygons the inner rings are oriented clockwise and the outer ring counterclockwise.
Just to clarify: when you say bounding box here, do you mean the browser viewport boundaries? When I read it I thought of three elements:
Considering those, it would be interesting to apply map algebra operations to the polygons, in the sense of producing a 'unified' crop polygon, and only then apply it to the dataset. Usually polygons have fewer points than the raster sets that would be cropped.
About the timesteps being uniform or not... I think it would be interesting to return someday to that discussion on data regularization, in time and in space...
(Sorry about the huge delay. Final weeks, you know...)
Hate to bug anyone during spring break, but how are the python methods going for these operations?
Hey @njmattes! Just came back yesterday from NYC, and today I was reading the spatiotemporal paper. It seems fairly close in terms of the operations we need to implement. My idea is to take the SQL code from Fig. 3, adapt it to our data model, and see if it runs accordingly. For the API calls I'll probably need some help from you and especially @legendOfZelda, so as not to mess up the system organization.
Hope you had a good time in NY @ricardobarroslourenco! If you and @legendOfZelda can sort out the SQL and get the python started that turns the results into JSON output, I can wire up the API endpoints in flask. We just need something to wire them to ;-)
Interesting that the d2 (tiled & unstacked) method from the paper seems to be the one we're using now. Hopefully that's a good sign.
great, I look forward to seeing it.
A perhaps dumb question, but I saw a demo yesterday of OpenSeaDragon. Is this something you have ever considered for the map viewing?
OpenSeaDragon looks pretty cool. It looks though like it's designed for raster graphics. And while we're using raster data, it's actually being drawn as vector graphics in the browser—so that we can retain access to the stats for each pixel. So each pixel is its own interactive object that the user can access.
Ok, I wondered about that. Thanks.
OK, I'm writing up a python script with one method per query we've discussed so far, each producing the JSON. Once I have that I'll know the exact input arguments I need, and then I can pass that on to you @njmattes so we can finish the specification of the JSON request format. Will let you know when I'm done with it tonight.
Fantastic. Thanks!
@njmattes it was a nice and refreshing trip :)
I'm working on the SQL part today. I hope that I'll be done with this by today/tomorrow. Then I'll probably move to the integration part.
From Severin: aren't we now working on similar things, me and Ricardo?
I don't know. You guys tell me. I thought @ricardobarroslourenco was working on aggregating to the polygons and stats.
Yes, this is what I'm working on (aggregations/stats).
OK. I was planning to do all the queries though, i.e. the ones in #23 and in this post, i.e. #25. How about we sort this out over Skype, @ricardobarroslourenco? Then we can also discuss some common issues we might run into...
Or we can simply have you take care of some queries and me of others, i.e. I can take care of the selection queries and you of the aggregation/stats.
Because we're at a point where we need any queries at all: @legendOfZelda, why don't you work on selecting rasters within bounding boxes and timesteps between start and end points, and @ricardobarroslourenco, you can work on aggregating to polygons within a bounding box. That's all we absolutely need for version 0. But we really need to get those things done.
@njmattes @ricardobarroslourenco I wrote some extraction + JSON construction methods for each of the queries we've discussed so far; see `extract/extract.py`.
@legendOfZelda It seems to be OK (I've just replaced `/ede/credentials.py` and ran it through PyCharm). Right now, does any other module call `/ede/extract/extract.py`?
No other module uses `ede/extract/extract.py` yet.
Cool. I started a new branch called `proto_api`. We shouldn't really code directly on the `develop` branch.
Added a method to construct the base JSON response, just to avoid repetition. Perhaps we should use stored procedures in Postgres rather than issuing queries on the fly?
Also added a simple try/except to the first request as an example: we should send 500 status codes if a query fails, and eventually probably 400-level codes if a record doesn't exist (e.g. a user requests metadata for a non-existent `id`).
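Roughly along these lines (the route and the `query_gridmeta` helper are placeholders standing in for the real extraction code):

```python
from flask import Flask, jsonify

app = Flask(__name__)

def query_gridmeta(meta_id):
    # Placeholder for the real code in ede/extract/extract.py.
    if meta_id != 1:
        raise LookupError('no such dataset')
    return {'uid': 1, 'variables': ['yield']}

@app.route('/api/v0/gridmeta/<int:meta_id>')
def gridmeta(meta_id):
    try:
        meta = query_gridmeta(meta_id)
    except LookupError:
        # The record doesn't exist: a client-error (400-level) response.
        return jsonify({'error': 'no metadata for that id'}), 404
    except Exception:
        # The query itself failed: report a server error, not a bare traceback.
        return jsonify({'error': 'query failed'}), 500
    return jsonify(meta)
```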
Stored procedures should be an interesting optimization. I'm reading through this tutorial. Is it OK for this, @njmattes? A procedures/methods list and a return-codes list would be good to have, for exception control and logging.
Yes, that looks good. You might also look at the `stored_procedure_stubs` branch of the repo, in the `sql` folder. I started some aborted procedures a few weeks ago. They likely no longer conform to the current data schema, but they might be useful as some 'starter' code.

The difficulty stored procedures may introduce is that it could prove slightly more tedious to parse the results into JSON. But only tedious, rather than difficult.
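On the parsing point, it shouldn't be much worse than a plain query from the Python side, since a set-returning procedure can be selected from like a table. A rough sketch with a made-up procedure name, `get_regions_by_gadm_level`:

```python
import json
import psycopg2

conn = psycopg2.connect(dbname='ede')  # hypothetical connection settings

with conn.cursor() as cur:
    # get_regions_by_gadm_level is a hypothetical stored procedure returning
    # (uid, geojson) rows for a given GADM level.
    cur.execute("SELECT * FROM get_regions_by_gadm_level(%s)", (1,))
    features = [
        {'type': 'Feature',
         'properties': {'uid': uid},
         'geometry': json.loads(geojson)}
        for uid, geojson in cur.fetchall()
    ]

print(json.dumps({'type': 'FeatureCollection', 'features': features}))
```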
OK, I'll look into it. Should we merge `stored_procedure_stubs` into `proto_api`? And then, after the work is done, clean up prior to merging into `develop`? There are a lot of branches right now...
Sure, merging those two is a good idea.
Alright. I've done that. I'll be writing my tests in pgAdmin prior to putting them in the repo.
Well, I think we have an aggregation by mean in space. I just need to check whether it's proper. @njmattes is coming by at the beginning of the afternoon, and we will review it. It looks like this (but passing arguments):
```sql
SELECT DISTINCT
    (ST_PixelAsCentroids(rast)).x,
    (ST_PixelAsCentroids(rast)).y,
    AVG((ST_PixelAsCentroids(rast)).val)
        OVER (PARTITION BY (ST_PixelAsCentroids(rast)).x, (ST_PixelAsCentroids(rast)).y)
        AS "Avg_Val"
FROM public.grid_data
WHERE meta_id = 1 AND var_id = 1 AND
    ST_Intersects(rast, (ST_Polygon(ST_GeomFromText(
        'LINESTRING(74 22, 80 22, 80 25, 74 25, 74 22)'), 4326)))
GROUP BY
    (ST_PixelAsCentroids(rast)).x,
    (ST_PixelAsCentroids(rast)).y,
    ST_PixelAsCentroids(rast)
```
I just pushed the code into the repo.
I updated the Flask API above. I don't want to use actual polygons / dates in the URL but instead only work with IDs. That's why for the polygons I'm planning to create the tables mentioned in #22, and for the times:
```sql
CREATE TABLE times (
  uid bigserial primary key,
  meta_id bigint references grid_meta(uid),
  date timestamp with time zone
);
```
So it turns out we still need a translation between human-readable dates and machine-readable IDs, because dealing with actual dates in the URL is cumbersome, especially if they contain `/`. I am now working with the additional `times` table above and have adjusted the ingestion script accordingly. Of course I will also adjust the queries, but those should be much more convenient in the longer term now, because we don't need date strings in the request URL anymore.
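Resolving a user's date range to the IDs that go in the request URL would then be one query against that table, e.g. (sketch; the dataset id and dates are invented):

```python
import psycopg2

conn = psycopg2.connect(dbname='ede')  # hypothetical connection settings

# Turn a human-readable date range into the time IDs used in the URLs.
SQL = """
    SELECT uid
    FROM times
    WHERE meta_id = %s AND date BETWEEN %s AND %s
    ORDER BY date
"""

with conn.cursor() as cur:
    cur.execute(SQL, (1, '2000-01-01', '2000-12-31'))
    time_ids = [row[0] for row in cur.fetchall()]
```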
The specification is written up on apiary.io.

URL prefix: `/api/v0/`

IDs:

- `gridmeta?ids=[list of IDs]`
- `griddata/meta_id/var_id/select?polygons=[list of poly IDs]&times=[list of time IDs]`
- `griddata/meta_id/var_id/aggregate/spatial?polygons=[list of poly IDs]&times=[list of time IDs]`
- `griddata/meta_id/var_id/aggregate/temporal?polygons=[list of poly IDs]&times=[list of time IDs]`
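For example, a select request under this scheme might look like the following (host and IDs are placeholders):

```python
import requests

# meta_id=1, var_id=1; the polygon and time IDs are made up for illustration.
url = 'http://localhost:5000/api/v0/griddata/1/1/select'
resp = requests.get(url, params={'polygons': '[3,7]', 'times': '[0,1,2]'})
print(resp.status_code)
print(resp.json())
```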