Open Imran31 opened 8 years ago
@Imran31 Thanks for sharing your thoughts. Here are a few initial responses:
So, for example, endpoint GET /lang/latin/corpus/perseus/author/tacitus/text/germania becomes GET /lang/latin/corpus/1/author/6/text/8.
The API is intentionally explicit. I prefer this because the URL is instantly recognizable. "Author 6, text 8" means nothing, but "Tacitus, Germania" is universally recognizable.
There is something to be said for keeping URLs short, but we are very, very far from what I would consider long.
a user should be able to browse and discover all the endpoints of the REST API using the REST API itself.
We have this already, though I think it could be made more intuitive. For example:
I'm open to hearing other ways of doing this.
About your POS addition to the API, I'll need to look into this further. I will probably want to see an API which accounts for all "core" processing, not just individual parts.
There is something to be said for keeping URLs short, but we are very, very far from what I would consider long.
Yeah, IDs will not be more helpful then. I thought that the growing URL length was a problem. I too think the existing URLs are much easier to recognise.
About your POS addition to the API, I'll need to look into this further. I will probably want to see an API which accounts for all "core" processing, not just individual parts.
Sure! Does it make sense to list out all the /core/* endpoints and how they will respond to different HTTP methods?
I'll start with a list of endpoints and their associated classes:
/core/jvreplacer: JVReplacer
/core/stem: Stemmer
/core/lemmatize: LemmaReplacer
/core/syllabify: Syllabifier
/core/ner: ner
/core/tokenize: PunktLanguageVars, TokenizeSentence, word_tokenize
/core/distance: TextReuse, Levenshtein
I think this (https://github.com/cltk/cltk_api/issues/28#issuecomment-198932765) looks good for the first iteration of the project, and we can revise it in the future as it makes sense for the more complex tasks.
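As a rough sketch, the endpoint list above could be kept as a single routing table that a discovery endpoint reads from. The dictionary name `CORE_ENDPOINTS`, the `describe_core` helper, and the idea of a `GET /core` listing are assumptions for illustration only; the class names are kept as strings so the snippet runs without CLTK installed.

```python
# Endpoint -> associated class/function names, copied from the list above.
# Stored as strings so this sketch does not need CLTK to be importable.
CORE_ENDPOINTS = {
    "/core/jvreplacer": ["JVReplacer"],
    "/core/stem": ["Stemmer"],
    "/core/lemmatize": ["LemmaReplacer"],
    "/core/syllabify": ["Syllabifier"],
    "/core/ner": ["ner"],
    "/core/tokenize": ["PunktLanguageVars", "TokenizeSentence", "word_tokenize"],
    "/core/distance": ["TextReuse", "Levenshtein"],
}


def describe_core():
    """Body that a hypothetical GET /core handler could return for discovery."""
    return [
        {"endpoint": path, "backed_by": names}
        for path, names in sorted(CORE_ENDPOINTS.items())
    ]


if __name__ == "__main__":
    for entry in describe_core():
        print(entry["endpoint"], "->", ", ".join(entry["backed_by"]))
```

Keeping the table in one place would mean that a later revision of the specifics only touches this mapping.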
@lukehollis If you're comfortable with this API, then let's go for it. Just so long as everyone knows that the specifics will be subject to a revision sometime.
Thanks to all on this.
Thanks @lukehollis, I have extended this discussion into my proposal (I just shared it with the organisation via the GSoC website). I look forward to your comments there.
This issue is to brainstorm the design of the API endpoints and responses. I'll start with a couple of points on shorter URLs, HATEOAS and the folder structure.
Shorter URLs
I propose maintaining numeric IDs for each author, corpus, text, etc. and using those to construct the REST endpoints.
So, for example, endpoint GET /lang/latin/corpus/perseus/author/tacitus/text/germania becomes GET /lang/latin/corpus/1/author/6/text/8.
This keeps the URLs short while allowing the actual names that the IDs map to to be as long as needed.
A problem with this (assuming an external API consumer) is figuring out the ID of a specific author/corpus/text.
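To make the trade-off concrete, here is a minimal sketch of the ID scheme. The IDs and names mirror the example URL above, but the registries (`CORPORA`, `AUTHORS`, `TEXTS`) and the helper functions are hypothetical and not part of the existing code.

```python
# Hypothetical registries: the IDs and names mirror the example URL above,
# but the data structure itself is an assumption for illustration.
CORPORA = {1: "perseus"}
AUTHORS = {6: "tacitus"}
TEXTS = {8: "germania"}


def short_url(lang, corpus_id, author_id, text_id):
    """Build the ID-based endpoint path."""
    return f"/lang/{lang}/corpus/{corpus_id}/author/{author_id}/text/{text_id}"


def explicit_url(lang, corpus_id, author_id, text_id):
    """Resolve the same IDs back to the name-based endpoint path."""
    return (
        f"/lang/{lang}/corpus/{CORPORA[corpus_id]}"
        f"/author/{AUTHORS[author_id]}/text/{TEXTS[text_id]}"
    )


def find_id(registry, name):
    """Reverse lookup from a name to its ID; returns None if unknown.

    An external consumer has no access to the registry, which is the
    problem described above.
    """
    for key, value in registry.items():
        if value == name:
            return key
    return None
```

The server can do the reverse lookup in `find_id` because it holds the registry; an external consumer would need some listing endpoint to get the same information.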
API Discoverability
The formal term for this is HATEOAS. This implies a user should be able to browse and discover all the endpoints of the REST API using the REST API itself.
Towards this, we should define endpoints like GET /lang/latin/corpus/ that returns a response:
This way, the user will be able to query for all the available corpora and figure out the ID.
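One possible shape for such a listing, purely as an assumption: each entry carries the ID, the readable name, and a link to follow. The field names (`id`, `name`, `href`) and the `list_corpora` helper are hypothetical; only the corpus name "perseus" and ID 1 come from the example earlier in this issue.

```python
import json

# Hypothetical data; only "perseus" with ID 1 comes from the example above.
CORPORA = {"latin": {1: "perseus"}}


def list_corpora(lang):
    """Build the body a GET /lang/<lang>/corpus/ handler might return."""
    base = f"/lang/{lang}/corpus"
    return {
        "language": lang,
        "corpora": [
            {"id": corpus_id, "name": name, "href": f"{base}/{corpus_id}"}
            for corpus_id, name in sorted(CORPORA.get(lang, {}).items())
        ],
    }


if __name__ == "__main__":
    print(json.dumps(list_corpora("latin"), indent=2))
```

Including an `href` in each entry lets a client move from the listing to the next level without constructing URLs by hand, which is the core idea behind HATEOAS.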
Another example of this is from my POS tagger implementation. It is possible to view the list of languages and POS tagging methods they support via GET /core/pos, and perform the actual POS tagging for a string via POST /core/pos.
In general, adding a GET request handler to endpoints like /lang, /lang/<int:lang_id>/corpus, etc. should make the API discoverable.
Folder Structure
Right now all the resources are defined in a single file (api_json.py), and so are tests (tests.py). There is also no distinction between files containing utility functions and actual REST resources.
I briefly mentioned this in my comment at https://github.com/cltk/cltk_api/issues/20#issuecomment-198688724.
An example of my proposed organisation is in https://github.com/cltk/cltk_api/pull/27/. Inside the folder for a specific function (/pos), the resources will be in views.py, the database stuff (if any) in models.py, utility functions in utils.py and parameters in constants.py.
(It may be better to keep constants.py at the root of the API folder structure, so that it is easy to find and change.)
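A rough single-file sketch of how that split could look for /pos, with comments marking which proposed module each part would live in. Everything here is an assumption for illustration: the method names in `POS_METHODS` are placeholders, `PosResource` is a plain class standing in for a framework resource, and the tagging itself is stubbed out. This is not the code in the pull request.

```python
# --- constants.py (hypothetical contents) ---
# Language -> supported tagging methods. Placeholder values only.
POS_METHODS = {"latin": ["unigram", "bigram"]}


# --- utils.py: plain helper functions, no request handling ---
def validate(lang, method):
    """Return an error message, or None when the combination is supported."""
    if lang not in POS_METHODS:
        return f"unsupported language: {lang}"
    if method not in POS_METHODS[lang]:
        return f"unsupported method for {lang}: {method}"
    return None


# --- views.py: REST resources only, delegating to utils ---
class PosResource:
    """Stand-in for a REST resource class; framework wiring is omitted."""

    def get(self):
        # GET /core/pos: list languages and the methods each supports.
        return {"languages": POS_METHODS}, 200

    def post(self, payload):
        # POST /core/pos: validate the request, then tag the string.
        error = validate(payload.get("lang"), payload.get("method"))
        if error:
            return {"error": error}, 400
        # Placeholder: a real view would call the actual tagger here.
        tokens = payload.get("string", "").split()
        return {"tags": [[token, None] for token in tokens]}, 200
```

With this separation, tests for `validate` would not need to go through a request at all, and the view stays limited to mapping HTTP methods onto helper calls.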