njmattes closed this issue 8 years ago.
Need to standardize URLs for making requests to the API. Since we're trying to make this RESTful, these need to work over `GET`, not `POST`, so they can be cached. Hence the parameters in the URLs rather than in the body.
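As a rough sketch of what one of these cacheable GET endpoints could look like in Flask (the route name, parameter name, and the `query_dataset_metadata` helper are placeholders, not actual project code):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def query_dataset_metadata(dataset_id):
    # Stand-in for the real db query; returns canned metadata for now.
    return {'dataset': dataset_id, 'variables': ['yield', 'area']}

@app.route('/api/v0/datasets', methods=['GET'])
def dataset_metadata():
    # Everything arrives in the query string, so identical URLs produce
    # identical responses and can be cached by browsers and proxies.
    dataset_id = request.args.get('dataset', type=int)
    return jsonify(query_dataset_metadata(dataset_id))
```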
Coordinates for the bounding boxes are clockwise from the north, because that's how CSS works and it's what I'm used to. But @ricardobarroslourenco, maybe there's a more typical convention in GIS?
@legendOfZelda In the third URL there is an identifier for the set of polygons we'll be aggregating to. I assume this is possible given the current data schema, but perhaps not? I suppose this is also a question for @ricardobarroslourenco since he's working on aggregation ops.
I can sort out the `flask` bits for this if you guys can give me the python bits that do the actual querying.
Again, leaving the first post clean so we can edit.
These three URLs are merely the minimum needed for prototype v0.0. We'll need more shortly. We'll also need to work out headers (e.g. do we use the `cache-control`, `expires`, etc. headers), but this can clearly wait until some future time.
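When we do get to it, Flask makes those headers easy to bolt on. A minimal sketch (the blanket one-hour policy is only an example, not a decision):

```python
from flask import Flask

app = Flask(__name__)

@app.after_request
def add_cache_headers(response):
    # Example policy only: let shared caches keep API responses for an hour.
    response.headers['Cache-Control'] = 'public, max-age=3600'
    return response
```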
@njmattes made some changes to URLs above.
The user needs to specify not only the dataset (`dataset=integer`) but also which variable within that dataset (`variable=string`). I thought about using the integer id to specify the variable instead, i.e. `variable=integer`. However, after the user issues the first URL he can't know the variable ids, only their names (which appear in the metadata JSON returned for the first URL request). For the time parameter: `time=date`.
The `regions` table doesn't yet have a `GADM level` column, so we can't yet specify a GADM level in the request and then fetch all polygons with that level from the db. It's a small thing to get done, and on Saturday I'd also like to actually improve the ingestion of the GADM regions so that we have a `GADM level` column and the rest of the shapefile metadata in a single JSON column. Then we can allow the user to specify the GADM level in the URL and we can fish out the regions with that level from the `regions` table.

For the third, I'd rather limit the polygons to only those that intersect the bounding box, in order to keep the size of the request small (there's no point in sending the entire globe in the response if the user can only see SE Asia).
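A sketch of the kind of query that would do that, assuming a `regions` table with a geometry column `poly` (as discussed below) and psycopg2 for db access; the connection settings, SRID, and the roughly-SE-Asia bounding box are invented:

```python
import psycopg2

# Hypothetical connection settings; the real ones live in ede/credentials.py.
conn = psycopg2.connect(dbname='ede', user='ede', host='localhost')

# Only return polygons that intersect the viewport's bounding box,
# so the response never carries the whole globe.
SQL = """
    SELECT uid, ST_AsGeoJSON(poly)
    FROM regions
    WHERE ST_Intersects(poly, ST_MakeEnvelope(%s, %s, %s, %s, 4326))
"""

with conn.cursor() as cur:
    cur.execute(SQL, (90.0, -10.0, 140.0, 25.0))  # xmin, ymin, xmax, ymax
    rows = cur.fetchall()
```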
We'll need a way to specify polygons based on something other than GADM level, since eventually we want other polygon sets. It may be that eventually we need a `poly_meta` table.
Also, for `time` I think we'll want `start_time` and `end_time`. And perhaps we should pass 0-indexed time steps rather than timestamp strings. Strings will have to be URL encoded, making them ugly, and they take up unnecessary space. The client should be able to determine the index from the metadata.
If we're retrieving only a single polygon, do the polygons have an `id` attribute?
If we use a single table

```sql
CREATE TABLE regions (
  uid integer PRIMARY KEY,
  poly geometry,
  meta_data json
);
```

for the regions we should be fine, because into the JSON field we can put whether it's a GADM region and, if so, which level, etc. All kinds of regions should be well representable with this table, so no need for a `regions`/`poly_meta` table IMO. Hmm, but maybe you're right, because we need to make sure the user can get to the `uid` at some point, and if we stuff everything into a JSON blob we're not really using SQL's features (e.g. structure enforcement) but rather using it MongoDB-style.
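For what it's worth, the GADM level would still be reachable in the single-table layout by digging into the JSON column, e.g. (a sketch; the JSON keys `source` and `gadm_level` are hypothetical):

```python
import psycopg2

conn = psycopg2.connect(dbname='ede')  # hypothetical connection settings

# Filtering on GADM level means reaching into the JSON blob with ->>
# instead of a typed, indexable column.
SQL = """
    SELECT uid, ST_AsGeoJSON(poly)
    FROM regions
    WHERE meta_data->>'source' = 'GADM'
      AND (meta_data->>'gadm_level')::int = %s
"""

with conn.cursor() as cur:
    cur.execute(SQL, (1,))  # GADM level 1
    rows = cur.fetchall()
```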
How about two tables instead:

```sql
CREATE TABLE regions_meta (
  uid integer PRIMARY KEY,
  geom geometry
);

CREATE TABLE regions_gadm (
  uid integer PRIMARY KEY,
  meta_id integer references regions_meta(uid) unique,
  gadm_code integer,
  gadm_level integer
);
```

and similarly for other kinds of regions not coming from GADM. However, we should still have a `meta_data` JSON field somewhere, because I'm not sure how much we can abstract out into columns...
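Under that two-table layout, fetching all polygons at a given GADM level is a plain join, roughly (sketch only; connection settings hypothetical):

```python
import psycopg2

conn = psycopg2.connect(dbname='ede')  # hypothetical connection settings

# Join the GADM-specific attributes back onto the shared geometry table.
SQL = """
    SELECT m.uid, ST_AsGeoJSON(m.geom), g.gadm_code
    FROM regions_meta AS m
    JOIN regions_gadm AS g ON g.meta_id = m.uid
    WHERE g.gadm_level = %s
"""

with conn.cursor() as cur:
    cur.execute(SQL, (1,))  # GADM level 1
    rows = cur.fetchall()
```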
To handle the mapping from user-friendly timestamps to logical times, i.e. a 0-based index, I want to use:

```sql
CREATE TABLE time_vars (
  meta_id bigint references grid_meta(uid),
  time timestamp with time zone,
  frame integer,
  unique (meta_id, frame)
);
```

which also handles non-uniform timesteps, which we should, because we don't actually know whether the timesteps are uniform until we've actually checked the array of the time dimension in the NetCDF...
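With that table, translating between the two is a single lookup in either direction, e.g. (sketch; the dataset id and timestamp are invented):

```python
import psycopg2

conn = psycopg2.connect(dbname='ede')  # hypothetical connection settings

with conn.cursor() as cur:
    # Human-readable timestamp -> 0-based frame index for one dataset.
    cur.execute(
        "SELECT frame FROM time_vars WHERE meta_id = %s AND time = %s",
        (1, '2000-01-01 00:00:00+00'))
    frame = cur.fetchone()

    # And back again: frame index -> timestamp.
    cur.execute(
        "SELECT time FROM time_vars WHERE meta_id = %s AND frame = %s",
        (1, 0))
    timestamp = cur.fetchone()
```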
I think the regions meta is a bit backwards. I think `regions_meta` should really be something like `regions_geom`, and `regions_gadm` should be `regions_meta`. `regions_gadm` is a bit too specific, because we'll also have non-GADM regions (someone might prefer regions from Natural Earth, or food producing units, etc.).
You're probably right that we'll need a table to map timesteps, but I think we can ignore it for now, and just return timestep 0 while we get the prototype working.
@njmattes About the orientation of polygon building: the OGC, in their OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture, specifies that for closed polygons the inner rings are oriented clockwise and the outer ring counterclockwise.
Just to clarify: when you say bounding box here, do you mean the browser viewport boundaries? When I read it I thought of three elements:
Considering those, it would be interesting to apply map algebra operations to the polygons, in the sense of producing a 'unified' crop polygon, and only then apply it to the dataset. Usually polygons have fewer points than the raster sets that would be cropped.
About the timesteps being uniform or not... I think it would be interesting to return someday to that discussion on data regularization, in time and in space...
(Sorry about the huge delay. Final weeks, you know...)
Hate to bug anyone during spring break, but how are the python methods going for these operations?
Hey @njmattes! Just came back yesterday from NYC, and today I was reading the spatiotemporal paper. It seems fairly close in terms of the operations we need to implement. My idea is to take the SQL code from Fig. 3, adapt it to our data model, and see if it runs accordingly. For the API calls I'll probably need some help from you and especially @legendOfZelda, so as not to mess up the system organization.
Hope you had a good time in NY @ricardobarroslourenco! If you and @legendOfZelda can sort out the SQL and get the python started that turns the results into JSON output, I can wire up the API endpoints in flask. We just need something to wire them to ;-)
Interesting that the d2 (tiled & unstacked) method from the paper seems to be the one we're using now. Hopefully that's a good sign.
great, I look forward to seeing it.
A perhaps dumb question, but I saw a demo yesterday of OpenSeaDragon. Is this something you have ever considered for the map viewing?
OpenSeaDragon looks pretty cool. It looks though like it's designed for raster graphics. And while we're using raster data, it's actually being drawn as vector graphics in the browser—so that we can retain access to the stats for each pixel. So each pixel is its own interactive object that the user can access.
Ok, I wondered about that. Thanks.
OK, I'm writing up a python script with one method per query we've discussed so far, each producing the JSON. Once I have that I'll know the exact input arguments I need, and then I can pass that on to you @njmattes so we can finish the specification of the JSON request format. Will let you know when I'm done with it tonight.
Fantastic. Thanks!
@njmattes it was a nice and refreshing trip :)
I'm working on the SQL part today. I hope that I'll be done with this by today/tomorrow. Then I'll probably move to the integration part.
From Severin: aren't we now working on similar things, me and Ricardo?
I don't know. You guys tell me. I thought @ricardobarroslourenco was working on aggregating to the polygons and stats.
Yes, this is what I'm working on (aggregations/stats).
OK. I was planning to do all the queries though, i.e. the ones in #23 and in this post, i.e. #25. How about we sort this out over Skype, @ricardobarroslourenco? Then we can also discuss some common issues we might run into...
Or we can simply have you take care of some queries and me of others, i.e. I can take care of the selection queries and you of the aggregation/stats.
Because we're at a point where we need any queries at all: @legendOfZelda, why don't you work on selecting rasters within bounding boxes and timesteps between start and end points, and @ricardobarroslourenco, you can work on aggregating to polygons within a bounding box. That's all we absolutely need for version 0. But we really need to get those things done.
@njmattes @ricardobarroslourenco I wrote some extraction + JSON construction methods for each of the queries we've discussed so far; see `extract/extract.py`.
@legendOfZelda It seems to be OK (I've just replaced `/ede/credentials.py` and ran it through PyCharm). Right now, does any other module call `/ede/extract/extract.py`?
No other module uses `ede/extract/extract.py` yet.
Cool. I started a new branch called `proto_api`. We shouldn't really code directly on the `develop` branch.
Added a method to construct the base JSON response, just to avoid repetition. Perhaps we should use stored procedures in Postgres rather than issuing queries on the fly?
Also added a simple try/except to the first request as an example: we should send 500 status codes if a query fails, and eventually probably 400-level codes if a record doesn't exist (e.g. a user requests metadata for a non-existent `id`).
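Roughly along these lines (the route and the `query_gridmeta` helper are placeholders standing in for the real extraction code):

```python
from flask import Flask, jsonify

app = Flask(__name__)

def query_gridmeta(meta_id):
    # Placeholder for the real code in ede/extract/extract.py.
    if meta_id != 1:
        raise LookupError('no such dataset')
    return {'uid': 1, 'variables': ['yield']}

@app.route('/api/v0/gridmeta/<int:meta_id>')
def gridmeta(meta_id):
    try:
        meta = query_gridmeta(meta_id)
    except LookupError:
        # The record doesn't exist: a client-error (400-level) response.
        return jsonify({'error': 'no metadata for that id'}), 404
    except Exception:
        # The query itself failed: report a server error, not a bare traceback.
        return jsonify({'error': 'query failed'}), 500
    return jsonify(meta)
```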
Stored procedures should be an interesting optimization. I'm reading through this tutorial. Is it OK for this, @njmattes? A procedures/methods list and a return-codes list would be good to have, for exception control and logging.
Yes, that looks good. You might also look at the `stored_procedure_stubs` branch of the repo, in the `sql` folder. I started some aborted procedures a few weeks ago. They likely no longer conform to the current data schema, but they might be useful as some 'starter' code.

The difficulty stored procedures may introduce is that it could prove slightly more tedious to parse the results into JSON. But only tedious, rather than difficult.
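On the parsing point, it shouldn't be much worse than a plain query from the Python side, since a set-returning procedure can be selected from like a table. A rough sketch with a made-up procedure name, `get_regions_by_gadm_level`:

```python
import json
import psycopg2

conn = psycopg2.connect(dbname='ede')  # hypothetical connection settings

with conn.cursor() as cur:
    # get_regions_by_gadm_level is a hypothetical stored procedure returning
    # (uid, geojson) rows for a given GADM level.
    cur.execute("SELECT * FROM get_regions_by_gadm_level(%s)", (1,))
    features = [
        {'type': 'Feature',
         'properties': {'uid': uid},
         'geometry': json.loads(geojson)}
        for uid, geojson in cur.fetchall()
    ]

print(json.dumps({'type': 'FeatureCollection', 'features': features}))
```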
OK, I'll look into it. Should we merge `stored_procedure_stubs` into `proto_api`? And then, after the work is done, clean up prior to merging into `develop`? There are a lot of branches right now...
Sure, merging those two is a good idea.
Alright. I've done that. I'll be writing my tests in pgAdmin prior to putting them in the repo.
Well, I think we have an aggregation by mean in space. I just need to check whether it's proper. @njmattes is coming by at the beginning of the afternoon, and we will review it. It looks like this (but passing arguments):
```sql
SELECT DISTINCT
    (ST_PixelAsCentroids(rast)).x,
    (ST_PixelAsCentroids(rast)).y,
    AVG((ST_PixelAsCentroids(rast)).val)
        OVER (PARTITION BY (ST_PixelAsCentroids(rast)).x, (ST_PixelAsCentroids(rast)).y)
        AS "Avg_Val"
FROM public.grid_data
WHERE meta_id = 1 AND var_id = 1 AND
    ST_Intersects(rast, (ST_Polygon(ST_GeomFromText(
        'LINESTRING(74 22, 80 22, 80 25, 74 25, 74 22)'), 4326)))
GROUP BY
    (ST_PixelAsCentroids(rast)).x,
    (ST_PixelAsCentroids(rast)).y,
    ST_PixelAsCentroids(rast)
```
I just pushed the code into the repo.
I updated the Flask API above. I don't want to use actual polygons / dates in the URL but instead only work with IDs. That's why for the polygons I'm planning to create the tables mentioned in #22, and for the times:
```sql
CREATE TABLE times (
  uid bigserial primary key,
  meta_id bigint references grid_meta(uid),
  date timestamp with time zone
);
```
So it turns out we still need a translation between human-readable dates and machine-readable IDs, because dealing with actual dates in the URL is cumbersome, especially if they contain `/`. I am now working with the additional `times` table above and have adjusted the ingestion script accordingly. Of course I will also adjust the queries, but those should be much more convenient in the longer term now, because we don't need date strings in the request URL anymore.
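Resolving a user's date range to the IDs that go in the request URL would then be one query against that table, e.g. (sketch; the dataset id and dates are invented):

```python
import psycopg2

conn = psycopg2.connect(dbname='ede')  # hypothetical connection settings

# Turn a human-readable date range into the time IDs used in the URLs.
SQL = """
    SELECT uid
    FROM times
    WHERE meta_id = %s AND date BETWEEN %s AND %s
    ORDER BY date
"""

with conn.cursor() as cur:
    cur.execute(SQL, (1, '2000-01-01', '2000-12-31'))
    time_ids = [row[0] for row in cur.fetchall()]
```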
The specification is written up on apiary.io.

URL prefix: `/api/v0/`

IDs:

- `gridmeta?ids=[list of IDs]`
- `griddata/meta_id/var_id/select?polygons=[list of poly IDs]&times=[list of time IDs]`
- `griddata/meta_id/var_id/aggregate/spatial?polygons=[list of poly IDs]&times=[list of time IDs]`
- `griddata/meta_id/var_id/aggregate/temporal?polygons=[list of poly IDs]&times=[list of time IDs]`
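For example, a select request under this scheme might look like the following (host and IDs are placeholders):

```python
import requests

# meta_id=1, var_id=1; the polygon and time IDs are made up for illustration.
url = 'http://localhost:5000/api/v0/griddata/1/1/select'
resp = requests.get(url, params={'polygons': '[3,7]', 'times': '[0,1,2]'})
print(resp.status_code)
print(resp.json())
```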