Queens-Hacks / qcumber-api

transforms qcumber data repo into something blah

Endpoints #2

Open uniphil opened 10 years ago

uniphil commented 10 years ago

A place to discuss the REST endpoints of the qcumber-api. Building off the discussion from #1.

uniphil commented 10 years ago

Continuing from #1....

I want [uri] consistency between front and backend. --Mystor in #1

I'm not sure I get this. The API endpoints are basically database queries, and serve the front end + anything that wants to use course data. The front-end serves webpages and ajax to web browsers. I think the concerns are independent. I think the API endpoints should focus on representing the data, and should think bigger than the catalogue website.


Simpler /courses/?filter= endpoint for listing courses

Something else I've seen is a structure like a lot of /courses/by-subject/ANAT endpoints. Not a huge fan of that, seems like a lot more work, and more assumptions.

Also kind of at-issue here is what the rest of the path looks like for individual course resources. Is a course id something like /courses/ANAT-100, or do we follow the old catalogue and do /courses/ANAT/100, which, if we're making good hack-able endpoints, implies that you can do /courses/ANAT/ and get a list of ANAT courses.... but don't lists of courses come from just /courses/?

I suppose it comes back to whether we're striving for front/back URI consistency. I think /catalogue/ANAT/100 and /catalogue/ANAT/ are both awesome user-friendly URIs for the front-end, but I don't like it for the API.

pR0Ps commented 10 years ago

I agree that what the frontend is going to do or how its URLs are going to be structured should have no impact on the API. Sure, we're using the API to display course listings in a nice manner, but the API can be used for other things, including things that won't use the catalogue/ANAT/100 URLs. APIs have to be built for the general case.

For an API I think the URLs should be as descriptive as possible. For example /subject/ANAT is very clear. For courses /course/ANAT-100 is the equivalent. The data structure already uses SBJT-num as the ID of a course object so it maps nicely. This also works for instructors (/instructor/*) and any other data we want to put in the database.

This works when talking about getting a single, known object, but what about the case where you want to discover multiple objects (like all courses in ANAT)? What courses are in a subject is information that the subject should have so I would expect that the list of courses in a subject would just be a property of the subject, for example /subject/ANAT/courses.

This structure also allows for greater granularity. For example, what if you only wanted to see the instructors for a course and didn't care about the time slots or anything else? /course/{id}/instructors. This same format could be used for everything else as well (/course/{id}/prereqs). I'm not sure how far down we want to go on this though (/course/{id}/section/{id}/room/?). Specifying /all could be how to get a data dump from the API on that object (/course/{id}/all). We would then have to define what data is returned by default, and what extra data the /all adds.

For just getting a list of all objects of a type (like all subjects), another top level "folder" (list) could be used. These queries should return a minimal amount of information (like ID and name only). The justification for this is that the IDs returned can just be plugged into another query (/subject/{id}) if information other than the name is needed.

This API structure also leaves room for searching with /search/* endpoints, either generally (/search/general, what Qcumber has now), or filtering by a specific type (/search/instructors). We would have to think about what a course search actually means (search the description? just the code? what about the subject? Why not just use the course/{id} endpoint if you know the subject and code?), but the structure is there.

For write requests (PUT/PATCH/DELETE), it makes sense to only allow them on an actual object, not on searches or other operations. This is basically limited to /subject/{id}, /course/{id}, and other similar queries. Related to the "how far down" question from before, do we really want to have the URL specify exactly what should be changed (/course/{id}/section/{id}/room)? Or should the URL only be the base (/course/{id}) and something in the payload specifies what contents get touched? Personally I'm leaning towards the full URL method, just because I can't imagine a non-hacky way of structuring the payload to accomplish the same thing.

Disclaimer: I've done some work on The Movie DB API (at http://docs.themoviedb.apiary.io/) so most of these suggestions come from its structure.

Examples:

| Endpoint | Data |
| --- | --- |
| /subject/{id} | Basic subject data |
| /course/{id} | Basic course data |
| /instructor/{id} | Basic instructor data |
| /subject/{id}/courses | All courses in the subject |
| /course/{id}/textbooks | All textbooks for the course |
| /course/{id}/instructors | All instructors for the course |
| /subject/{id}/all | All data on that subject |
| /course/{id}/all | All data on that course |
| /list/subject | List of all subjects, ID/name only |
| /list/course | List of all courses, ID/name only |
| /list/instructor | List of all instructors, ID/name only |
| /search/general | Free-form search |
| /search/subject | Search for subjects |
| /search/course | Search for courses |
| /search/instructor | Search for instructors |

mystor commented 10 years ago

I think something along these lines could work quite well. I think that we should have some discussion about what exactly will be visible at each endpoint, and I think that could lead to changes in this system, but it seems to make sense.

Graham42 commented 10 years ago

Just for consideration, this site was one my work used to reference as "best practices". https://blog.apigee.com/detail/restful_api_design_can_your_api_give_developers_just_the_information

Also, as a general intro to REST, this is a good webcast: http://apigee.com/about/api-best-practices/restful-api-design-second-edition

pR0Ps commented 10 years ago

Summing up the discussion we had in person:

Read operations

Self links and IDs: We need to provide a link to the URL that was accessed to return the data. This lets clients know where to post back to, among other things. The ID is needed so that clients can use it to get more information about the object. The self link and the ID of the object will always be returned, with the only exception being errors (a whole other thing).
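
A minimal sketch of the "always return the self link and the ID" rule. The key names `id` and `self`, the function name, and the example host are assumptions for illustration; the discussion does not fix any of them.

```python
def with_identity(request_url, obj_id, data):
    """Return a copy of `data` that always carries the object's ID and self link.

    `request_url` is the URL the client accessed to get this data.
    The "id" and "self" key names are assumptions, not part of the discussion.
    """
    body = dict(data)
    # Set these last so a stray "id" or "self" in `data` cannot override them.
    body["id"] = obj_id
    body["self"] = request_url
    return body


course = with_identity(
    "https://api.example.test/courses/ANAT-100?fields=name",  # hypothetical host
    "ANAT-100",
    {"name": "Anatomy of the Human Body"},
)
```

Even when the client only asks for `name`, the returned object still carries both identifying fields.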

Pluralization: We need to stick to either plural or not. Since getting back a single object from a plural endpoint makes more sense than getting back multiple objects from a singular endpoint, we decided that plural was best.

The /all endpoints are out. We replace them with a field specification system. For example, if you want the name, description, and prerequisites for a course, the query would be /courses/{id}?fields=name,description,prerequisites. This is more in line with what other RESTful APIs are doing and makes sure that we only return data that the client asked for. In an effort to include all the data the client will need without them having to make a ton of other API calls, we talked about supporting a level of expansion. The syntax for this would look like /subjects/{id}?fields=courses(name,description). This query would return a list of course objects from the specified subject with name and description fields in them.
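
One way the `fields` value could be parsed, as a rough sketch. The function names are hypothetical, and rejecting anything deeper than one level of expansion is an assumption based on "a level of expansion" above.

```python
def _parse_token(token):
    """Turn 'name' into {'name': None} and 'courses(a,b)' into {'courses': ['a', 'b']}."""
    if "(" not in token:
        return {token: None}
    name, _, rest = token.partition("(")
    if not rest.endswith(")"):
        raise ValueError(f"malformed expansion: {token!r}")
    subfields = [s.strip() for s in rest[:-1].split(",") if s.strip()]
    return {name.strip(): subfields}


def parse_fields(spec):
    """Parse a fields string such as 'name,courses(name,description)'.

    Plain fields map to None; expanded fields map to their list of sub-fields.
    Only one level of expansion is accepted (an assumption of this sketch).
    """
    fields = {}
    token, depth = "", 0
    for ch in spec + ",":  # trailing comma flushes the last token
        if ch == "(":
            depth += 1
            if depth > 1:
                raise ValueError("only one level of expansion is supported")
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
        if ch == "," and depth == 0:
            token = token.strip()
            if token:
                fields.update(_parse_token(token))
            token = ""
        else:
            token += ch
    if depth != 0:
        raise ValueError("unbalanced '('")
    return fields


parse_fields("name,description,prerequisites")
# {'name': None, 'description': None, 'prerequisites': None}
parse_fields("courses(name,description)")
# {'courses': ['name', 'description']}
```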

Bare endpoints: Bare endpoints are the endpoints that don't have a specific ID attached to them. Because of the lack of ID, they return all the objects of that type. Because this is typically a lot of data, we wouldn't allow expansion on bare endpoints, only regular fields. Fields that are objects will return the ID of the object to be used in future lookups.
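
A small sketch of the "fields that are objects return the ID" rule for bare endpoints. The in-memory record shape and the helper names are assumptions for illustration only.

```python
def to_reference(value):
    """Replace a nested object (or list of objects) with its ID(s)."""
    if isinstance(value, dict) and "id" in value:
        return value["id"]
    if isinstance(value, list):
        return [to_reference(item) for item in value]
    return value


def project_bare(objects, fields):
    """Return only the requested fields for each object, without expansion.

    The ID is always included; nested objects are reduced to their IDs.
    """
    return [
        {"id": obj["id"], **{f: to_reference(obj.get(f)) for f in fields}}
        for obj in objects
    ]


# Hypothetical data, shaped the way this sketch assumes.
subjects = [
    {
        "id": "ANAT",
        "name": "Anatomy",
        "courses": [{"id": "ANAT-100", "name": "Anatomy of the Human Body"}],
    }
]
project_bare(subjects, ["name", "courses"])
# [{'id': 'ANAT', 'name': 'Anatomy', 'courses': ['ANAT-100']}]
```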

Sections: Based on trawling through SOLUS, we came to the conclusion that sections are unique objects that don't have to be tied to a single course. Sections can also be combined with other sections. For this reason, we should include an endpoint specific to them (/sections), in the style of the other endpoints.

Batch queries: We have to include the case where you want some sort of data on 20 objects, but only have the IDs. Instead of doing 20 requests, we should include some way of getting a certain piece of data out of a specified subset. This has been added on to the "bare endpoint" by allowing the client to pass a list of ids into the query. Batch queries should be limited to a certain number of objects per lookup (100?).
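
A sketch of how the `ids` parameter might be validated, assuming the bracketed, comma-separated form used in the examples (`ids=[id1,id2,id3]`) and the tentative cap of 100. The function and constant names are hypothetical.

```python
MAX_BATCH_IDS = 100  # tentative limit floated in the discussion


def parse_ids(raw, max_ids=MAX_BATCH_IDS):
    """Parse '[id1,id2,id3]' (brackets optional) into a list of ID strings."""
    text = raw.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    ids = [part.strip() for part in text.split(",") if part.strip()]
    if not ids:
        raise ValueError("no IDs given")
    if len(ids) > max_ids:
        raise ValueError(f"at most {max_ids} IDs are allowed per batch query")
    return ids


parse_ids("[ANAT,BIOL,CISC]")
# ['ANAT', 'BIOL', 'CISC']
```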

Pagination: The API needs to be able to paginate data. For pagination, we decided to use the limit/offset approach, where the limit defines how many objects to return, while the offset is the object number to start at. Since all IDs will be strings, sorting alphabetically will keep the order constant. Limiting will only take effect on queries to bare endpoints and searches since specific objects will only return a single object.
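
The limit/offset scheme could look roughly like this. The default limit of 25 is an assumption borrowed from the example query in the table of bare endpoints, and the function name is hypothetical.

```python
def paginate(objects, limit=25, offset=0):
    """Return one page of objects, ordered alphabetically by their string ID."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    # Sorting by ID keeps the order constant between requests.
    ordered = sorted(objects, key=lambda obj: obj["id"])
    return ordered[offset:offset + limit]


subjects = [{"id": s} for s in ["CISC", "ANAT", "BIOL", "MATH", "ELEC"]]
paginate(subjects, limit=2, offset=2)
# [{'id': 'CISC'}, {'id': 'ELEC'}]
```

An offset past the end of the data simply yields an empty page in this sketch.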

Specific objects

| Endpoint | Return data |
| --- | --- |
| /subjects/{id} | Mostly useless |
| /subjects/{id}?fields=courses(name) | List of objects containing names of courses in the subject |
| /subjects/{id}?fields=courses | List of course IDs in the subject |
| /courses/{id}?fields=name,sections | Name of the course and a list of section IDs |
| /courses/{id}?fields=instructors | All instructors for the course |
| /courses/{id}?fields=textbooks | All textbooks for the course |
| /sections/{id}?fields=type,classes(day,start_time,end_time,room,instructor) | Type of the section, as well as information on the classes in it |
| /instructors/{id}?fields=name | Name of the instructor |

Bare endpoints

| Endpoint | Return data |
| --- | --- |
| /subjects?fields=name,courses | List of names, and a list of courses in that subject, for all subjects |
| /subjects?fields=name,courses&limit=25&offset=25 | Second page of the above query |
| /subjects?ids=[id1,id2,id3]&fields=name | The names of the specified IDs |
| /subjects?ids=[id1,id2,id3]&fields=name,courses | List of names and a list of courses in that subject for the specified IDs |
| /sections?fields=name | List of section names |

Write operations

No write operations on queries with expanded fields (what would they do?).

On bare endpoints (/{type}):

On individual endpoints (/{type}/{id}):

Everything else would be an error

Examples

| Endpoint | Type | Data | Result |
| --- | --- | --- | --- |
| /subjects | DELETE | None | Delete all subjects |
| /subjects | POST | [subject object] | Create a new subject |
| /subjects/{id} | PUT | "{'name': 'new name'}" | Replace the name of the subject with 'new name' |
| /subjects/{id} | PUT | "{'courses': [c1, c2, c3, c4]}" | Replace the courses with a new set |
| /subjects/{id} | PUT | "{'name': 'new name', 'courses': [c1, c2, c3, c4]}" | Aggregation of previous 2 commands |
| /subjects/{id} | DELETE | None | Delete the subject |
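
A rough sketch of how these write rules might be checked. The allowed methods per endpoint kind are inferred from the examples table only (POST and DELETE on bare endpoints, PUT and DELETE on individual ones), so treat that mapping, along with the function and variable names, as assumptions.

```python
# Inferred from the examples table; not a confirmed list.
WRITE_METHODS = {
    "bare": {"POST", "DELETE"},
    "individual": {"PUT", "DELETE"},
}


def check_write(method, path, fields=""):
    """Return 'bare' or 'individual' if the write is allowed, else raise ValueError."""
    if "(" in fields:
        raise ValueError("writes are not allowed on queries with expanded fields")
    segments = [s for s in path.split("/") if s]
    if len(segments) == 1:
        kind = "bare"  # /{type}
    elif len(segments) == 2:
        kind = "individual"  # /{type}/{id}
    else:
        raise ValueError("writes are only allowed on /{type} and /{type}/{id}")
    if method.upper() not in WRITE_METHODS[kind]:
        raise ValueError(f"{method} is not allowed on {kind} endpoints")
    return kind


check_write("POST", "/subjects")      # 'bare'
check_write("PUT", "/subjects/ANAT")  # 'individual'
```

Everything else, such as a PUT on /subjects or any write combined with `fields=courses(name)`, raises an error in this sketch.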