PecanProject / bety

Web-interface to the Biofuel Ecophysiological Traits and Yields Database (used by PEcAn and TERRA REF)
https://www.betydb.org
BSD 3-Clause "New" or "Revised" License

BETY API continuing development: add v0 GET endpoints #381

Closed max-zilla closed 8 years ago

max-zilla commented 8 years ago

Notes from 1/20/16 meeting with @dlebauer @gsrohde @robkooper around enhancing BETYdb API.

We plan to introduce a /controllers/api/ directory to the BETY project and will start developing in controllers/api/v0. This will help delineate API components from GUI and other components.

The first step is to add the following endpoints:

After finishing this task, we will review and make new issues for

max-zilla commented 8 years ago

@gsrohde please let me know when you get the chance to commit a branch with your initial code for the GET users functionality. I've looked at the search controller before, but this will be helpful as well.

robkooper commented 8 years ago

Some quick additions:

gsrohde commented 8 years ago

@max-zilla I have some code ready for you to look at. It's on branch draft_apis (see https://github.com/PecanProject/bety/tree/draft_apis). It's also deployed to pecandev/beta; here are some sample queries:

http://pecandev.igb.illinois.edu/beta/api/e1/species?genus=Acer

http://pecandev.igb.illinois.edu/beta/api/e1/sites/1202

http://pecandev.igb.illinois.edu/beta/api/e2/sites?limit=4 (this demonstrates a nice way to get associated data without too much repetition of information)

http://pecandev.igb.illinois.edu/beta/api/e2/species?limit=10&offset=1000

http://pecandev.igb.illinois.edu/beta/api/e2/species?limit=10&offset=1000&genus=Matelea

http://pecandev.igb.illinois.edu/beta/api/e3/sites?limit=4

http://pecandev.igb.illinois.edu/beta/api/e4/sites?limit=4

Note these are all queries—I haven't gotten to the PUT and POST APIs. I've mainly been experimenting with some new Gems and with separating the APIs out from the browser-support machinery.

I used e1, e2, etc. for different versions ("e" for "example"), though in production we will probably use v1, v2, etc.
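The limit and offset parameters in the sample queries above can be combined to page through a large table. Here is a minimal sketch in Python, assuming the e2 species endpoint shown above. The helper name `page_urls` and the page size are made up for illustration, and the code only builds URLs without contacting the server.

```python
from urllib.parse import urlencode

# Base URL taken from the sample queries above.
BASE = "http://pecandev.igb.illinois.edu/beta/api/e2"


def page_urls(table, page_size, pages, **filters):
    """Yield one URL per page, advancing offset by page_size each time.

    `page_urls` is a hypothetical helper, not part of BETYdb.
    """
    for page in range(pages):
        params = dict(filters, limit=page_size, offset=page * page_size)
        yield f"{BASE}/{table}?{urlencode(params)}"


for url in page_urls("species", page_size=10, pages=3, genus="Matelea"):
    print(url)
```

Each URL differs only in its offset (0, 10, 20), so a client can stop as soon as a page comes back with fewer than `page_size` records.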

A few notes (and feel free to ask questions):

e1, e2 use the ActiveModel::Serializers Gem and the code under app/serializers.

e3 uses the JBuilder Gem for JSON templating.

e4 uses Rabl for JSON templating.

You can of course do git diff master to get a quick picture of all the files I added and the few I changed.

ActiveModel::Serializers has some nice features, but the latest stable version (0.9) is significantly different from the release candidate (0.10), and using templates instead of serializers is arguably more "MVC". So I think I'm leaning toward Rabl but haven't really decided.

Some big to-dos:

  1. Implement whatever authentication and authorization controls we decide we need.
  2. Implement changing the database via APIs.
  3. Decide exactly what form the JSON results should have (what attributes, what associations, what metadata, how it should be nested, whether to include root elements, etc.).
  4. Implement error messaging, limits on result size, etc.
  5. Perhaps allow for more nuanced filters (queries), using <, >, LIKE, ~, IN, for example.

dlebauer commented 8 years ago

I asked @sckott for advice on the rOpenSci discussion forum. (rOpenSci develops lots of R packages to get public data from APIs, including the traits package that uses the BETYdb API ...)

It's worth reading the discussion, but here are a few key ideas to put on everyone's radar:

dlebauer commented 8 years ago

@gsrohde

> Implement whatever authentication and authorization controls we decide we need.

Will the existing key=some/random/string/of/digits/and/letters suffice here? Or do we need something that takes authentication from Clowder as well?

> Implement changing the database via APIs.

After we draft the GET endpoints, we can start on the POST methods as a separate issue?

> Decide exactly what form the JSON results should have (what attributes, what associations, what metadata, how it should be nested, whether to include root elements, etc.).

> Implement error messaging, limits on result size, etc.

Yes. Please do this as you go ... let's start with a 5000 record limit by default.

> Perhaps allow for more nuanced filters (queries), using <, >, LIKE, ~, IN, for example.

This can wait and be put on the 'nice to have' shelf. First priority is 'between' for times and 'inside bounding box' for geometries.
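To make the two priority filters and the default limit concrete, here is a rough sketch in Python of what 'between' and 'inside bounding box' mean when applied to records. It is not the BETYdb implementation: the function names, the inclusive bounds, the record fields, and the sample coordinates are all assumptions made for illustration.

```python
from datetime import date
from itertools import islice

DEFAULT_LIMIT = 5000  # default record limit suggested above


def between(when, start, end):
    """True if start <= when <= end; inclusive bounds are an assumption."""
    return start <= when <= end


def inside_bounding_box(lat, lon, south, west, north, east):
    """True if the point is inside the box.

    Boxes that cross 180 degrees longitude are not handled.
    """
    return south <= lat <= north and west <= lon <= east


def select(records, predicate, limit=DEFAULT_LIMIT):
    """Return at most `limit` records for which predicate(record) is true."""
    return list(islice(filter(predicate, records), limit))


# Made-up records; the field names are hypothetical, not BETYdb columns.
records = [
    {"id": 1, "date": date(2015, 6, 1), "lat": 40.1, "lon": -88.2},
    {"id": 2, "date": date(2012, 6, 1), "lat": 40.1, "lon": -88.2},
    {"id": 3, "date": date(2015, 7, 1), "lat": 10.0, "lon": 10.0},
]

hits = select(
    records,
    lambda r: between(r["date"], date(2015, 1, 1), date(2015, 12, 31))
    and inside_bounding_box(r["lat"], r["lon"],
                            south=37.0, west=-91.5, north=42.5, east=-87.5),
)
print([r["id"] for r in hits])  # only record 1 passes both filters
```

In a real endpoint both tests would presumably be pushed down into the database query rather than applied in application code.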

dlebauer commented 8 years ago

@gsrohde if it makes sense, let's start with v0 to be consistent with our milestones (and to clearly communicate to users that this is still a draft awaiting feedback).

robkooper commented 8 years ago

Use either the key, or use basic auth for authentication.
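From the client side, the two options could look like the following Python sketch. It only constructs request objects and sends nothing. The helper names, the login, the password, and the key value are placeholders, and whether the server accepts Basic auth is exactly the open question here.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Sample query URL taken from earlier in this thread.
URL = "http://pecandev.igb.illinois.edu/beta/api/e2/sites?limit=4"


def with_key(url, key):
    """Append the API key as a `key` query-string parameter."""
    separator = "&" if "?" in url else "?"
    return Request(f"{url}{separator}{urlencode({'key': key})}")


def with_basic_auth(url, login, password):
    """Attach an HTTP Basic Authorization header (base64 of 'login:password')."""
    token = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})


keyed = with_key(URL, "PLACEHOLDER_KEY")
basic = with_basic_auth(URL, "user", "secret")
print(keyed.full_url)
print(basic.get_header("Authorization"))
```

The key travels in the URL, so it can end up in server logs and browser history; Basic auth keeps the credentials in a header but still needs HTTPS to be safe.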

dlebauer commented 8 years ago

Scott provided these samples for review (hosted on pecandev.igb.illinois.edu/beta/):

/api/v0/citations?id=19
/api/v0/citations/19
/api/v0/covariates?id=5156
/api/v0/covariates/5156
/api/v0/cultivars?id=55
/api/v0/cultivars/55
/api/v0/dbfiles?id=2
/api/v0/dbfiles/2
/api/v0/ensembles?id=263
/api/v0/ensembles/263
/api/v0/entities?id=1
/api/v0/entities/1
/api/v0/formats?id=19
/api/v0/formats/19
/api/v0/inputs?id=7
/api/v0/inputs/7 # DOESN'T WORK! Need ":foreign_key => 'parent_id'" option on "has_many :children" specification
/api/v0/machines?id=12
/api/v0/machines/12
/api/v0/managements?id=9
/api/v0/managements/9
/api/v0/methods?id=7 # DOESN'T WORK! Need ":foreign_key => 'method_id'" option on "has_many :traits" and "has_many :yields" specifications
/api/v0/methods/7 # DITTO
/api/v0/mimetypes?id=1090
/api/v0/mimetypes/1090
/api/v0/models?id=12
/api/v0/models/12
/api/v0/modeltypes?id=2
/api/v0/modeltypes/2
/api/v0/pfts?id=62
/api/v0/pfts/62
/api/v0/posteriors?id=523 # associated ensemble ids = true, not an array of ids!
/api/v0/posteriors/523 # DOESN'T WORK
/api/v0/priors?id=40
/api/v0/priors/40
/api/v0/runs?id=30332
/api/v0/runs/30332
/api/v0/search?id=111 # Note id isn't unique; ALSO, NO RAILS ASSOCIATIONS!
/api/v0/search/111 # DOESN'T WORK!
/api/v0/sites?id=1268
/api/v0/sites?id=57
/api/v0/sites/57
/api/v0/species?id=2480
/api/v0/species/2480
/api/v0/traits?id=10272 # NOTE TWO COVARIATES WITH SAME VARIABLE AND DIFFERENT LEVELS!
/api/v0/traits/10272
/api/v0/treatments?id=2529
/api/v0/treatments?id=2529
/api/v0/users?id=167
/api/v0/users/167
/api/v0/variables?id=551
/api/v0/variables/551
/api/v0/variables/298 # Nest formats in formats_variables?
/api/v0/yields?id=1
/api/v0/yields/1

dlebauer commented 8 years ago

@gsrohde It's great to have these samples, but the focus should be on returning a useful array from traits, very similar to the traits_and_yields_view table. Any foreign keys in traits_and_yields_view should also be accessible, e.g. variables, sites, cultivars, treatments, managements, methods, and covariates.

Once those queries work (just getting the flat table out, without worrying about complex joins), please document them and move to the POST/PUT endpoints.

Although the search fails because ids are not unique, I am not sure I see a use case for searching by id.

sckott commented 8 years ago

@dlebauer did you mean to ping me for feedback/etc. ?

dlebauer commented 8 years ago

Not yet... I think it's b/c I mentioned you above but thanks for checking!

gsrohde commented 8 years ago

@dlebauer The pecandev.igb.illinois.edu/beta/api/v0/search?id=111 search does work—it's only pecandev.igb.illinois.edu/beta/api/v0/search/111 that doesn't.

I didn't show examples of querying by any column value other than id, but in the current implementation, you can query by any column. For example, you can do

http://pecandev.igb.illinois.edu/beta/api/v0/search?commonname=switchgrass&result_type=traits&citation_year=1996&city=DuQuoin

What else can go in the query string besides column names? Currently, both limit=nnn and offset=nnn are supported.
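To illustrate how such a query string divides into column filters and paging controls, here is a small Python sketch. The function name `split_query` is made up, and the `limit=50` on the end of the example URL is added purely for the demonstration.

```python
from urllib.parse import parse_qsl, urlsplit

# The two non-column parameters named above.
CONTROL_PARAMS = {"limit", "offset"}


def split_query(url):
    """Separate column filters from paging controls in a query URL.

    Hypothetical helper for illustration; not part of BETYdb.
    """
    filters, controls = {}, {}
    for name, value in parse_qsl(urlsplit(url).query):
        if name in CONTROL_PARAMS:
            controls[name] = int(value)
        else:
            filters[name] = value
    return filters, controls


url = (
    "http://pecandev.igb.illinois.edu/beta/api/v0/search"
    "?commonname=switchgrass&result_type=traits&citation_year=1996"
    "&city=DuQuoin&limit=50"
)
filters, controls = split_query(url)
print(filters)
print(controls)
```

Everything that is not a recognized control is treated as a column name, which matches the "query by any column" behavior described above.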

I've started working on the POST/PUT endpoints, but I could go back and document the GET endpoints more comprehensively. What I've just written is pretty much all you need to know though, and I had intended to wait on anything more formal until we're more sure we have things the way we want them.

gsrohde commented 8 years ago

I have some insertion API code working and deployed at http://pecandev.igb.illinois.edu/beta. I will post some how-to's for loading sample data.

gsrohde commented 8 years ago

Here are some more sample queries that reflect recent changes and extensions to the v0 API. These include authentication, fuzzy-matching (which is really PostgreSQL RegExp matching), the option to limit the number of results returned, and the option for XML responses.

Note that I'm using the string =~ as the fuzzy-match operator even though in PostgreSQL, regular expression matches are done with just ~. This is mainly to make it easier to parse the query string. Also note that if you are running these examples in the browser and are logged into BETYdb, then you don't need to include the key parameter in the query string.
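One way to see why =~ is convenient to parse: a standard query-string parser splits each pair at the first =, so the ~ simply becomes the first character of the value. The Python sketch below shows that idea. It is a guess at the approach, not the actual server code; the function name is hypothetical, and the column names are borrowed from the search example earlier in the thread.

```python
from urllib.parse import parse_qsl


def parse_conditions(query_string):
    """Return (column, operator, value) triples for a query string.

    Assumption: `column=~pattern` asks for a regular-expression match and
    `column=value` for an exact match.
    """
    conditions = []
    for column, value in parse_qsl(query_string):
        if value.startswith("~"):
            # "~" is the PostgreSQL regular-expression match operator.
            conditions.append((column, "~", value[1:]))
        else:
            conditions.append((column, "=", value))
    return conditions


print(parse_conditions("commonname=~grass&city=DuQuoin"))
```

A side effect of this scheme is that an exact-match value beginning with a literal ~ cannot be expressed without some escaping rule.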

dlebauer commented 8 years ago

This looks great. One thing I notice is that the 'edit url' points to the same host as the API call. It should probably go to the instance that 'owns' the ids being edited (e.g. betydb.org where id < 1 billion).
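A sketch of that rule in Python, under loud assumptions: only the betydb.org range (ids below 1 billion) comes from this comment. The idea that every instance owns a block of one billion ids, the `OWNERS` table, the function name, and the Rails-style `/table/id/edit` path are all hypothetical.

```python
ID_BLOCK_SIZE = 1_000_000_000

# Only block 0 (ids below 1 billion) is taken from the comment above;
# other instances would have to be filled in by whoever knows the ranges.
OWNERS = {0: "https://www.betydb.org"}


def edit_url(table, record_id, serving_host):
    """Build an edit URL on the instance that owns record_id.

    Falls back to the host serving the API call when the owner is unknown.
    """
    owner = OWNERS.get(record_id // ID_BLOCK_SIZE, serving_host)
    return f"{owner}/{table}/{record_id}/edit"


print(edit_url("sites", 57, "http://pecandev.igb.illinois.edu/beta"))
print(edit_url("sites", 2_000_000_001, "http://pecandev.igb.illinois.edu/beta"))
```

The first call resolves to betydb.org even though the API call was served by pecandev; the second has no known owner and stays on the serving host.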

dlebauer commented 8 years ago

I don't think that the way the limit works is useful.

This is what I see:

I'd prefer to see:

dlebauer commented 8 years ago

I would suggest that fuzzy matching with =~ be case-insensitive.

Are there any reasons not to make all URL-based queries case-insensitive?

gsrohde commented 8 years ago

I've made these changes:

dlebauer commented 8 years ago

@gsrohde can this issue be closed? Are the features from the original post implemented, and do we need to make any new sub issues for additional GET and POST endpoints?

Has all of the new functionality been documented?

gsrohde commented 8 years ago

Remaining API work is in https://github.com/terraref/computing-pipeline/issues/124 so I'm closing this. The additional trait filters item is not implemented. If you still think it's important, add it to that issue. Or if you think the GET API piece really should be a PecanProject/bety issue, let me know and I'll make a new issue and move all the GET-related tasks in issue 124 there.