BETY API continuing development: add v0 GET endpoints

max-zilla commented 8 years ago

Notes from 1/20/16 meeting with @dlebauer @gsrohde @robkooper around enhancing BETYdb API.

We plan to introduce a /controllers/api/ directory to the BETY project and will start developing in controllers/api/v0. This will help delineate API components from GUI/other components.

First step is to add the following end points:

[x] GET /species, /sites, /users - return a simple list of those entries
[ ] GET /traits - support filter by site name, geometric bounding box, cultivars.name, species.name, start & end time, limit + offset for paging support, other filters down the road

After finishing this task, we will review and make new issues for

additional GET end points
POST end points, starting with /traits - this will support a vector of multiple traits in one call and accept parameters similar to those used by the GET method on this URL, e.g. cultivar; trait; variables; time; date; citation; etc.

max-zilla commented 8 years ago

@gsrohde please let me know when you get the chance to commit a branch with your initial code for the GET users functionality - I've looked at the search controller before but this will be helpful as well.

robkooper commented 8 years ago

Some quick additions:

To get data from a REST api, use GET method. To add data use POST method, to change data use PUT method and to delete an entry use DELETE method.
All endpoints should be plurals, so /api/sites should return the sites. This should always return an ID as well as some minimal information. To get more information use GET /api/sites/{ID} which will return all information about a site.
To use optional filters use URL arguments, such as /api/sites?city=Urbana to return all sites in the city Urbana.
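
As a rough illustration of these conventions, the Python sketch below builds the three kinds of URL described above (plural collection, single entry by ID, and collection with optional filters). The base URL and the helper names are placeholders chosen for this example, not part of BETYdb.

```python
from urllib.parse import urlencode

# Hypothetical base URL, used only for illustration.
BASE = "https://example.org/api"

# HTTP method conventions from the list above.
METHOD_FOR_ACTION = {
    "read": "GET",
    "create": "POST",
    "update": "PUT",
    "delete": "DELETE",
}


def collection_url(resource, **filters):
    """URL for a plural collection, with optional filters as URL arguments."""
    url = f"{BASE}/{resource}"
    if filters:
        url += "?" + urlencode(filters)
    return url


def item_url(resource, item_id):
    """URL for the full record of a single entry."""
    return f"{BASE}/{resource}/{item_id}"


print(collection_url("sites"))                 # minimal info for all sites
print(collection_url("sites", city="Urbana"))  # filtered collection
print(item_url("sites", 1202))                 # all information about one site
```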

gsrohde commented 8 years ago

@max-zilla I have some code ready for you to look at. It's on branch draft_apis (see https://github.com/PecanProject/bety/tree/draft_apis). It's also deployed to pecandev/beta; here are some sample queries:

http://pecandev.igb.illinois.edu/beta/api/e1/species?genus=Acer

http://pecandev.igb.illinois.edu/beta/api/e1/sites/1202

http://pecandev.igb.illinois.edu/beta/api/e2/sites?limit=4 (this demonstrates a nice way to get associated data without too much repetition of information)

http://pecandev.igb.illinois.edu/beta/api/e2/species?limit=10&offset=1000

http://pecandev.igb.illinois.edu/beta/api/e2/species?limit=10&offset=1000&genus=Matelea

http://pecandev.igb.illinois.edu/beta/api/e3/sites?limit=4

http://pecandev.igb.illinois.edu/beta/api/e4/sites?limit=4

Note these are all queries—I haven't gotten to the PUT and POST APIs. I've mainly been experimenting with some new Gems and with separating the APIs out from the browser-support machinery.

I used e1, e2, etc. for different versions ("e" for "example"), though in production we will probably use v1, v2, etc.

A few notes (and feel free to ask questions):

e1, e2 use the ActiveModel::Serializers Gem and the code under app/serializers.

e3 uses the JBuilder Gem for JSON templating.

e4 uses Rabl for JSON templating.

You can of course do git diff master to get a quick picture of all the files I added and the few I changed.

ActiveModel::Serializers has some nice features, but the latest stable version (0.9) is significantly different from the release candidate (0.10), and using templates instead of serializers is arguably more "MVC". So I think I'm leaning toward Rabl but haven't really decided.

Some big to-dos:

Implement whatever authentication and authorization controls we decide we need.
Implement changing the database via APIs.
Decide exactly what form the JSON results should have (what attributes, what associations, what metadata, how it should be nested, whether to include root elements, etc.).
Implement error messaging, limits on result size, etc.
Perhaps allow for more nuanced filters (queries)—using <, >, LIKE, ~, IN, for example.

dlebauer commented 8 years ago

I asked @sckott for advice on the rOpenSci discussion forum. (rOpenSci develops lots of R packages to get public data from APIs, including the traits package that uses the BETYdb API ...)

It's worth reading the discussion, but here are a few key ideas to put on everyone's radar:

Fail well
use the appropriate HTTP status codes, e.g., when someone tries a POST request against a route that only allows GET, then a 405 - Method Not Allowed is appropriate
if you have error messages, put those in the JSON response body, not an HTML-ized stack trace thing
Use gzip compression to make data sent over the wire smaller (maybe there's better compression out there, not sure)
If you allow geometry searches, WKT strings can get long fast, and you can run up against 414 HTTP errors, so allowing a POST request is good in those cases
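
A minimal sketch of the "fail well" idea, in Python rather than the Rails code the API is actually written in: the error is returned as a small JSON body with the matching status code instead of an HTML page. The shape of the error object and the function names are assumptions made for this example.

```python
import json
from http import HTTPStatus


def error_response(status, detail):
    """Return (status_code, headers, body) with the error described in JSON."""
    body = json.dumps({
        "error": {
            "status": status.value,   # numeric code, e.g. 405
            "title": status.phrase,   # standard reason phrase
            "detail": detail,         # human-readable explanation
        }
    })
    return status.value, {"Content-Type": "application/json"}, body


def handle(method, allowed=("GET",)):
    """Toy request handler for a route that only allows the given methods."""
    if method not in allowed:
        code, headers, body = error_response(
            HTTPStatus.METHOD_NOT_ALLOWED,
            f"{method} is not supported on this route",
        )
        # A 405 response should tell the client which methods are allowed.
        headers["Allow"] = ", ".join(allowed)
        return code, headers, body
    return HTTPStatus.OK.value, {"Content-Type": "application/json"}, json.dumps({"data": []})


print(handle("POST"))
```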

dlebauer commented 8 years ago

@gsrohde

Implement whatever authentication and authorization controls we decide we need.

will the existing key=some/random/string/of/digits/and/letters suffice here? Or do we need something that takes authentication from Clowder as well?

Implement changing the database via APIs.

After we draft the GET endpoints we can start on the POST methods as a separate issue?

Decide exactly what form the JSON results should have (what attributes, what associations, what metadata, how it should be nested, whether to include root elements, etc.).

Implement error messaging, limits on result size, etc.

Yes. Please do this as you go ... let's start with a 5000 record limit by default.

Perhaps allow for more nuanced filters (queries)—using <, >, LIKE, ~, IN, for example.

This can wait and be put on the 'nice to have' shelf. First priority is 'between' for times and 'inside bounding box' for geometries.
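
To make the two priority filters concrete, here is a small Python sketch of "between" for times and "inside bounding box" for point coordinates. In BETYdb these would presumably be expressed as database queries; the function names, the bounding-box tuple order, and the sample records are all made up for illustration.

```python
from datetime import datetime


def between(value, start, end):
    """True when start <= value <= end (inclusive on both sides)."""
    return start <= value <= end


def inside_bbox(lon, lat, bbox):
    """bbox is (min_lon, min_lat, max_lon, max_lat); edges count as inside.

    This simple test does not handle boxes that cross the antimeridian.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# Made-up records, for illustration only.
records = [
    {"id": 1, "lon": -88.2, "lat": 40.1, "date": datetime(2015, 6, 1)},
    {"id": 2, "lon": -111.9, "lat": 33.1, "date": datetime(2015, 6, 1)},
    {"id": 3, "lon": -88.2, "lat": 40.1, "date": datetime(2012, 6, 1)},
]
bbox = (-89.0, 39.0, -87.0, 41.0)
start, end = datetime(2015, 1, 1), datetime(2015, 12, 31)

matches = [
    r["id"]
    for r in records
    if between(r["date"], start, end) and inside_bbox(r["lon"], r["lat"], bbox)
]
print(matches)  # only record 1 satisfies both filters
```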

dlebauer commented 8 years ago

@gsrohde if it makes sense, let's start with v0 to be consistent with our milestones (and to clearly communicate to users that this is still a draft awaiting feedback)

robkooper commented 8 years ago

Use either the key, or use basic auth for authentication.
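
From the client side, the two options could look roughly like the Python sketch below. The host, the helper names, and the credentials are placeholders; only the `key` query parameter comes from this thread. The requests are built but not sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://example.org/api/v0"  # placeholder host


def request_with_key(resource, key, **params):
    """Authenticate by passing the API key as a query parameter."""
    params["key"] = key
    return Request(f"{BASE}/{resource}?{urlencode(params)}")


def request_with_basic_auth(resource, user, password, **params):
    """Authenticate with an HTTP Basic Authorization header."""
    query = f"?{urlencode(params)}" if params else ""
    req = Request(f"{BASE}/{resource}{query}")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    return req


keyed = request_with_key("sites", "not-a-real-key", limit=4)
basic = request_with_basic_auth("sites", "user", "pass", limit=4)
print(keyed.full_url)
print(basic.full_url, basic.get_header("Authorization"))
```

Basic auth only encodes the credentials, it does not encrypt them, so either option would normally be used over HTTPS.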

dlebauer commented 8 years ago

Scott provided these samples for review (hosted on pecandev.igb.illinois.edu/beta/):

/api/v0/citations?id=19
/api/v0/citations/19
/api/v0/covariates?id=5156
/api/v0/covariates/5156
/api/v0/cultivars?id=55
/api/v0/cultivars/55
/api/v0/dbfiles?id=2
/api/v0/dbfiles/2
/api/v0/ensembles?id=263
/api/v0/ensembles/263
/api/v0/entities?id=1
/api/v0/entities/1
/api/v0/formats?id=19
/api/v0/formats/19
/api/v0/inputs?id=7
/api/v0/inputs/7 # DOESN'T WORK! Need ":foreign_key => 'parent_id'" option on "has_many :children" specification
/api/v0/machines?id=12
/api/v0/machines/12
/api/v0/managements?id=9
/api/v0/managements/9
/api/v0/methods?id=7 # DOESN'T WORK! Need ":foreign_key => 'method_id'" option on "has_many :traits" and "has_many :yields" specifications
/api/v0/methods/7 # DITTO
/api/v0/mimetypes?id=1090
/api/v0/mimetypes/1090
/api/v0/models?id=12
/api/v0/models/12
/api/v0/modeltypes?id=2
/api/v0/modeltypes/2
/api/v0/pfts?id=62
/api/v0/pfts/62
/api/v0/posteriors?id=523 # associated ensemble ids = true, not an array of ids!
/api/v0/posteriors/523 # DOESN'T WORK
/api/v0/priors?id=40
/api/v0/priors/40
/api/v0/runs?id=30332
/api/v0/runs/30332
/api/v0/search?id=111 # Note id isn't unique; ALSO, NO RAILS ASSOCIATIONS!
/api/v0/search/111 # DOESN'T WORK!
/api/v0/sites?id=1268
/api/v0/sites?id=57
/api/v0/sites/57
/api/v0/species?id=2480
/api/v0/species/2480
/api/v0/traits?id=10272 # NOTE TWO COVARIATES WITH SAME VARIABLE AND DIFFERENT LEVELS!
/api/v0/traits/10272
/api/v0/treatments?id=2529
/api/v0/treatments?id=2529
/api/v0/users?id=167
/api/v0/users/167
/api/v0/variables?id=551
/api/v0/variables/551
/api/v0/variables/298 # Nest formats in formats_variables?
/api/v0/yields?id=1
/api/v0/yields/1

dlebauer commented 8 years ago

@gsrohde It's great to have these samples, but the focus should be on returning a useful array from traits very similar to the traits_and_yields_view table. Any foreign keys in traits_and_yields_view should also be accessible, e.g. variables, sites, cultivars, treatments, managements, methods, and covariates.

Once those queries work, just getting the flat table out and not worrying about complex joins, please document them and move to the POST/PUT endpoints.

Although the search fails because ids are not unique, I am not sure I see a use case for searching by id.

sckott commented 8 years ago

@dlebauer did you mean to ping me for feedback/etc. ?

dlebauer commented 8 years ago

Not yet... I think it's b/c I mentioned you above but thanks for checking!

gsrohde commented 8 years ago

@dlebauer The pecandev.igb.illinois.edu/beta/api/v0/search?id=111 search does work—it's only pecandev.igb.illinois.edu/beta/api/v0/search/111 that doesn't.

I didn't show examples of querying by any column value other than id, but in the current implementation, you can query by any column. For example, you can do

http://pecandev.igb.illinois.edu/beta/api/v0/search?commonname=switchgrass&result_type=traits&citation_year=1996&city=DuQuoin

What else can go in the query string besides column names? Currently, both limit=nnn and offset=nnn are supported.
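
A client could page through a large result set by stepping `offset` in multiples of `limit`. The Python sketch below only generates the URLs; the host is a placeholder and the `page_urls` helper is hypothetical, while `limit`, `offset`, and the `commonname` column come from the examples in this thread.

```python
from urllib.parse import urlencode


def page_urls(base_url, filters, page_size, max_pages):
    """Yield one URL per page, advancing offset by page_size each time."""
    for page in range(max_pages):
        params = dict(filters, limit=page_size, offset=page * page_size)
        yield f"{base_url}?{urlencode(params)}"


for url in page_urls(
    "https://example.org/api/v0/search",   # placeholder host
    {"commonname": "switchgrass"},
    page_size=20,
    max_pages=3,
):
    print(url)
```

A real client would stop as soon as a page comes back with fewer than `page_size` rows rather than relying on a fixed `max_pages`.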

I've started working on the POST/PUT endpoints, but I could go back and document the GET endpoints more comprehensively. What I've just written is pretty much all you need to know though, and I had intended to wait on anything more formal until we're more sure we have things the way we want them.

gsrohde commented 8 years ago

I have some insertion API code working and deployed at http://pecandev.igb.illinois.edu/beta. I will post some how-to's for loading sample data.

gsrohde commented 8 years ago

Here are some more sample queries that reflect recent changes and extensions to the v0 API. These include authentication, fuzzy-matching (which is really PostgreSQL RegExp matching), the option to limit the number of results returned, and the option for XML responses.

Find sites whose sitename includes the string "Lab": http://pecandev.igb.illinois.edu/beta/api/v0/sites?sitename=~Lab&key=9999999999999999999999999999999999999999
Get the results in XML format: http://pecandev.igb.illinois.edu/beta/api/v0/sites.xml?sitename=~Lab&key=9999999999999999999999999999999999999999
These searches are case-sensitive (should they be?), so we may want to search on "lab" as well: http://pecandev.igb.illinois.edu/beta/api/v0/sites?sitename=~lab&key=9999999999999999999999999999999999999999
Since we can use any valid PostgreSQL regular expression, we could get the combined results of these queries with this: http://pecandev.igb.illinois.edu/beta/api/v0/sites?sitename=~[Ll]ab&key=9999999999999999999999999999999999999999 (But if there were sitenames that had LABORATORY in all caps, we'd have to use a regular expression like "[Ll][Aa][Bb]".) Note that some symbols that can be used in PostgreSQL regular expressions may need to be URL-encoded in order to be passed in the query string.
Get up to 20 rows of yield data from the traits_and_yields_view view from sites whose name includes the string "Lab" or "lab": http://pecandev.igb.illinois.edu/beta/api/v0/search?sitename=~[Ll]ab&key=9999999999999999999999999999999999999999&limit=20&result_type=yields
Get 20 more (different) rows using the same search criteria: http://pecandev.igb.illinois.edu/beta/api/v0/search?sitename=~[Ll]ab&key=9999999999999999999999999999999999999999&limit=20&offset=20&result_type=yields
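
On the point about URL-encoding: one way for a client to avoid hand-escaping regular-expression symbols is to let a library build the query string. This Python sketch uses the standard `urllib.parse` module; the pattern itself is an invented example, and the leading `~` is simply the first character of the value, as in the queries above.

```python
from urllib.parse import parse_qs, urlencode

# Invented pattern containing characters that are not safe in a query string.
params = {"sitename": "~^Lab (North|South)+", "limit": 20}

query = urlencode(params)   # percent-encodes reserved characters
print(query)

# Decoding on the receiving side recovers the original pattern unchanged.
decoded = parse_qs(query)
print(decoded["sitename"][0])
```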

Note that I'm using the string =~ as the fuzzy-match operator even though in PostgreSQL, regular expression matches are done with just ~. This is mainly to make it easier to parse the query string. Also note that if you are running these examples in the browser and are logged into BETYdb, then you don't need to include the key parameter in the query string.
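
To show why `=~` is easy to parse, here is a hedged Python sketch: after ordinary query-string parsing, any value that begins with `~` is treated as a regular expression and everything else as an exact match. This is not the BETYdb implementation; the function names are hypothetical, and Python's `re` module stands in for PostgreSQL regular expressions, which differ in some details. Matching is case-sensitive here, as described at this point in the thread.

```python
import re
from urllib.parse import parse_qsl


def parse_filters(query_string):
    """Split a query string into exact-match and regex filters.

    A value that starts with "~" is treated as a regular expression,
    so "sitename=~Lab" reads as "sitename =~ Lab".
    """
    exact, fuzzy = {}, {}
    for name, value in parse_qsl(query_string):
        if value.startswith("~"):
            fuzzy[name] = re.compile(value[1:])
        else:
            exact[name] = value
    return exact, fuzzy


def row_matches(row, exact, fuzzy):
    """True when the row satisfies every exact and every regex filter."""
    return all(str(row.get(k)) == v for k, v in exact.items()) and all(
        p.search(str(row.get(k, ""))) for k, p in fuzzy.items()
    )


exact, fuzzy = parse_filters("sitename=~[Ll]ab&city=Urbana")
print(row_matches({"sitename": "Soils Lab", "city": "Urbana"}, exact, fuzzy))
print(row_matches({"sitename": "LABORATORY", "city": "Urbana"}, exact, fuzzy))
```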

dlebauer commented 8 years ago

This looks great. One thing I notice is that the 'edit url' points to the same host as the API call. It should probably go to the instance that 'owns' the ids being edited (e.g. betydb.org where id<1 billion).

dlebauer commented 8 years ago

I don't think that the way the limit works is useful.

This is what I see:

If there are > 200 records and no limit is set, an error is returned
If there are > 200 records and a limit higher than 200 is set, the same error is returned
the message 'try a more restrictive search' isn't very useful

I'd prefer to see:

Either
- return all records if limit is not set
- for example this call: managements?mgmttype=~fertil would return all 1576 records
OR
- only return 200 records + metadata that states the total # of records and the # returned e.g. metadata: {"count": 1576, "returned": 200}, "warning": "not all records returned", data: {...}
- AND a special case like limit=all would be helpful.
If limit is set to nnn > 200, the response should have the first nnn records
- for example, this call: managements?mgmttype=~fertil&limit=5000 would return 1576 records

dlebauer commented 8 years ago

I would suggest that using the =~ for fuzzy matching should be case insensitive.

Are there any reasons not to make all URL based queries case insensitive?

gsrohde commented 8 years ago

I've made these changes:

An explicit limit now overrides the default 200-row limit. limit=all is now supported.
The result count is shown in the metadata.
When no explicit limit is set and there are more than 200 results, a warning is issued stating that only 200 results are being shown.
The "show" URL of each result is shown rather than the edit URL.
Fuzzy matching is now case-insensitive.
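
The limit rules in this list can be summarized in a short Python sketch. It is an illustration of the described behavior, not the actual code: the response field names (`metadata`, `count`, `returned`, `warning`, `data`) follow the suggestion earlier in the thread and are assumptions, as is the wording of the warning.

```python
DEFAULT_LIMIT = 200  # default row limit mentioned in the thread


def apply_limit(rows, limit=None):
    """Build a response dict; `limit` is None, an integer, or the string "all"."""
    total = len(rows)
    if limit == "all":
        shown = rows                     # limit=all returns everything
    elif limit is None:
        shown = rows[:DEFAULT_LIMIT]     # no explicit limit: apply the default
    else:
        shown = rows[:int(limit)]        # explicit limit overrides the default

    response = {
        "metadata": {"count": total, "returned": len(shown)},
        "data": shown,
    }
    # Warn only when the default limit silently truncated the results.
    if limit is None and total > DEFAULT_LIMIT:
        response["warning"] = f"Only the first {DEFAULT_LIMIT} of {total} results are shown."
    return response


rows = list(range(1576))  # stand-in for the 1576 matching management records
print(apply_limit(rows)["metadata"])
print(apply_limit(rows, limit=5000)["metadata"])
print(apply_limit(rows, limit="all")["metadata"])
```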

dlebauer commented 8 years ago

@gsrohde can this issue be closed? Are the features from the original post implemented, and do we need to make any new sub issues for additional GET and POST endpoints?

Has all of the new functionality been documented?

gsrohde commented 8 years ago

Remaining API work is in https://github.com/terraref/computing-pipeline/issues/124 so I'm closing this. The additional trait filters item is not implemented. If you still think it's important, add it to that issue. Or if you think the GET API piece really should be a PecanProject/bety issue, let me know and I'll make a new issue and move all the GET-related tasks in issue 124 there.

PecanProject / bety

BETY API continuing development: add v0 GET endpoints #381