REST API Design - Githubissues

Imran31 commented 8 years ago

This issue is to brainstorm the design of the API endpoints and responses. I'll start with a couple of points on shorter URLs, HATEOAS and the folder structure.

Shorter URLs

I propose maintaining numeric IDs for each author, corpus, text, etc. and using those to construct the REST endpoints.

So, for example, endpoint GET /lang/latin/corpus/perseus/author/tacitus/text/germania becomes GET /lang/latin/corpus/1/author/6/text/8.

This keeps the URLs short while allowing the actual names that the IDs map to to be as long as needed.

A problem with this (assuming an external API consumer) is figuring out the ID of a specific author/corpus/text.
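To make the trade-off concrete, here is a minimal sketch of the lookup an ID-based URL implies. The IDs 1, 6 and 8 come from the example above. The in-memory tables and the `expand_path` helper are hypothetical and are not part of the existing API.

```python
# Hypothetical lookup tables; a real API would back these with a database.
CORPORA = {1: "perseus"}
AUTHORS = {6: "tacitus"}
TEXTS = {8: "germania"}


def expand_path(lang, corpus_id, author_id, text_id):
    """Translate the short, ID-based path into the explicit, name-based one."""
    return "/lang/{}/corpus/{}/author/{}/text/{}".format(
        lang, CORPORA[corpus_id], AUTHORS[author_id], TEXTS[text_id]
    )


# Without access to the tables above, a consumer cannot tell what 1/6/8 mean.
print(expand_path("latin", 1, 6, 8))
```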

API Discoverability

The formal term for this is HATEOAS. This implies a user should be able to browse and discover all the endpoints of the REST API using the REST API itself.

Towards this, we should define endpoints like GET /lang/latin/corpus/ that return a response such as:

{"corpora": [ {"name": "perseus", "id": "1"}, ... ]}

This way, the user will be able to query for all the available corpora and figure out the ID.
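A rough, framework-agnostic sketch of that listing and of how a client might use it. Only the "perseus" entry with ID "1" comes from the example response. The `CORPORA_BY_LANG` table and both function names are assumptions made for illustration.

```python
import json

# Hypothetical registry of corpora per language.
CORPORA_BY_LANG = {"latin": [{"name": "perseus", "id": "1"}]}


def list_corpora(lang):
    """Build the response body for GET /lang/<lang>/corpus/."""
    return {"corpora": CORPORA_BY_LANG.get(lang, [])}


def find_corpus_id(lang, name):
    """What a client would do with the listing: look up an ID by name."""
    for corpus in list_corpora(lang)["corpora"]:
        if corpus["name"] == name:
            return corpus["id"]
    return None  # unknown corpus


print(json.dumps(list_corpora("latin")))
print(find_corpus_id("latin", "perseus"))
```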

Another example of this is from my POS tagger implementation. It is possible to view the list of languages and POS tagging methods they support via GET /core/pos, and perform the actual POS tagging for a string via POST /core/pos.

In general, adding a GET request handler to endpoints like /lang, /lang/<int:lang_id>/corpus, etc. should make the API discoverable.
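The GET/POST split on a single endpoint could look roughly like the sketch below. It is written as a plain function rather than against any web framework, and it is not the code from the pull request. The `POS_METHODS` table, the payload keys and the `fake_tag` stand-in are all assumptions.

```python
# Hypothetical table of languages and the tagging methods each supports.
POS_METHODS = {"latin": ["unigram", "tnt"], "greek": ["unigram"]}


def fake_tag(text):
    """Stand-in tagger that labels every token 'UNK'. Not a real CLTK call."""
    return [[token, "UNK"] for token in text.split()]


def handle_pos(http_method, payload=None):
    """Dispatch on the HTTP method for /core/pos. Returns (status, body)."""
    if http_method == "GET":
        # Discovery: list what can be asked for.
        return 200, {"languages": POS_METHODS}
    if http_method == "POST":
        payload = payload or {}
        lang = payload.get("lang")
        tagger = payload.get("method")
        if tagger not in POS_METHODS.get(lang, []):
            return 400, {"error": "unsupported language or method"}
        return 200, {"tagged": fake_tag(payload.get("string", ""))}
    return 405, {"error": "method not allowed"}


print(handle_pos("GET"))
print(handle_pos("POST", {"lang": "latin", "method": "tnt", "string": "arma virumque"}))
```

A client can call GET first to learn which language and method combinations are valid, then POST with one of them.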

Folder Structure

Right now all the resources are defined in a single file (api_json.py), and so are tests (tests.py). There is also no distinction between files containing utility functions and actual REST resources.

I briefly mentioned this in my earlier comment: https://github.com/cltk/cltk_api/issues/20#issuecomment-198688724.

An example of my proposed organisation is in https://github.com/cltk/cltk_api/pull/27/. Inside the folder for a specific function (/pos), the resources will be in views.py, the database stuff (if any) in models.py, utility functions in utils.py and parameters in constants.py.

(It may be better to keep constants.py at the root of the API folder structure, so that it is easy to find and change.)

kylepjohnson commented 8 years ago

@Imran31 Thanks for sharing your thoughts. Here are a few initial responses:

> So, for example, endpoint GET /lang/latin/corpus/perseus/author/tacitus/text/germania becomes GET /lang/latin/corpus/1/author/6/text/8.

The API is intentionally explicit. I prefer this because the URL is instantly recognizable. "Author 6, text 8" means nothing, but "Tacitus, Germania" is universally recognizable.

There is something to be said for keeping URLs short, but we are very, very far from what I would consider long.

> a user should be able to browse and discover all the endpoints of the REST API using the REST API itself.

We have this already, though I think it could be made more intuitive. For example:

I'm open to hearing other ways of doing this.

About your POS addition to API, I'll need to look into this further. I will probably want to see an API which accounts for all "core" processing, not just individual parts.

Imran31 commented 8 years ago

> There is something to be said for keeping URLs short, but we are very, very far from what I would consider long.

Yeah, IDs will not be more helpful then. I had thought the growing URL length was a problem. I too think the existing URLs are much easier to recognise.

> About your POS addition to API, I'll need to look into this further. I will probably want to see an API which accounts for all "core" processing, not just individual parts.

Sure! Does it make sense to list out all the /core/* endpoints and how they will respond to different HTTP methods?

I'll start with a list of endpoints and their associated classes:

/core/jvreplacer: JVReplacer
/core/stem: Stemmer
/core/lemmatize: LemmaReplacer
/core/syllabify: Syllabifier
/core/ner: ner
/core/tokenize: PunktLanguageVars, TokenizeSentence, word_tokenize
/core/distance: TextReuse, Levenshtein
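One way to keep this list discoverable through the API itself is to hold it as data and serve it from a listing endpoint. In the sketch below, the paths (written with a leading slash throughout) and the class names are copied from the list above. The `CORE_ENDPOINTS` table, the `describe_core` helper and the idea of a GET /core listing are assumptions.

```python
# Endpoint path -> CLTK class/function names, as listed above.
CORE_ENDPOINTS = {
    "/core/jvreplacer": ["JVReplacer"],
    "/core/stem": ["Stemmer"],
    "/core/lemmatize": ["LemmaReplacer"],
    "/core/syllabify": ["Syllabifier"],
    "/core/ner": ["ner"],
    "/core/tokenize": ["PunktLanguageVars", "TokenizeSentence", "word_tokenize"],
    "/core/distance": ["TextReuse", "Levenshtein"],
}


def describe_core():
    """Body for a hypothetical GET /core that lists every core endpoint."""
    return {
        "endpoints": [
            {"path": path, "backed_by": names}
            for path, names in sorted(CORE_ENDPOINTS.items())
        ]
    }


for entry in describe_core()["endpoints"]:
    print(entry["path"], "->", ", ".join(entry["backed_by"]))
```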

lukehollis commented 8 years ago

I think this (https://github.com/cltk/cltk_api/issues/28#issuecomment-198932765) looks good for the first iteration of the project, and we can revise it in the future as it makes sense for the more complex tasks.

kylepjohnson commented 8 years ago

@lukehollis If you're comfortable with this API, then let's go for it, so long as everyone knows that the specifics will be subject to revision at some point.

Thanks to all on this.

Imran31 commented 8 years ago

Thanks @lukehollis, I have extended this discussion into my proposal (I just shared it with the organisation via the GSoC website). I look forward to your comments there.

cltk / cltk_api

REST API Design #28