Police-Data-Accessibility-Project / data-sources-app

An API and UI for using and maintaining the Data Sources database
MIT License
2 stars 5 forks source link

Set up Swagger API Docs #356

Closed maxachis closed 3 weeks ago

maxachis commented 1 month ago

Flask_restx (which our app uses) is swagger-compatible, which automatically documents the API based on method definitions. It's pretty nifty, but we currently don't have the code properly set up to do it, or a means to migrate it into a document that can be viewed outside of the source code itself. This is an issue to work on that.

Tasks

josh-chamberlain commented 1 month ago

Currently, when I run the app locally, going to the base url shows me these docs. Could we deploy it that way, too?

maxachis commented 1 month ago

Implementation Todo

maxachis commented 1 month ago

Implementation Notes

Adding Readme to Resources directory.

These add a considerable amount of code to the resources files. I consider this acceptable, because this information would otherwise be included in the Gitbook, but it is worth noting. I have included a set of implementation notes in a README located within the resources directory, to aid in deciphering some components. Once understood, it is fairly intuitive, but "once understood" is the crucial part.

Catches outdated code

One thing immediately noticed in comparing the Gitbook with the documentation I'm adding is that the Gitbook is, in some areas, out of date. The data-sources-needs-identification endpoint, for example, is not mentioned in the Gitbook, and the data-sources GET endpoint mentions an optional parameter for getting data sources that need identification, which is no longer implemented within that endpoint. By sheer virtue of the Swagger documentation being tightly coupled with the code, and restx models enforcing the response content of endpoints, it will be much harder for documentation to be out of date.

Data Sources Documentation Needs Work

However, my current code did not complete the data sources documentation. Specifically, the inputs and outputs which define the data sources and various data source attributes have been left deliberately limited in their descriptions. This is for a few reasons:

  1. As stated in #359, some of the data sources columns may be extraneous and not needed in some endpoints.
  2. The data source columns returned by each is variable and not entirely consistent; some columns are returned by data-sources-map but not data-sources-by-id; others are returned by data-sources-by-id but not data-sources. The logic as to why some are included and others excluded in particular methods is not entirely clear.
  3. The number of columns returned is very substantial. This makes it a pain to write them all out, and additionally compounds on the problems in 1 and 2. Errors become more likely and also harder to detect just because there are so many columns.
  4. The lack of a single source of truth makes me hesitant about putting anything definite down. It is entirely possible that some of these endpoints, in their documentation, mention columns that no longer exist. To address that necessitates its own issue.

Thus, I think we need two issues created:

Marshalling Response

Marshalling Responses essentially means making sure that the content of responses conform to a given specification before returning them. This ensures the responses are properly formatted and standardized.

In the case of flask_restx, it can additionally serve to ensure that documentation conforms to specifications as they exist in the code -- a model created and defined in the documentation can additionally be used with the marshal_with decorator to ensure the response returned conforms to the expected value. If backend code changes what is being returned, marshal_with would throw an error.

However, the marshal_with decorator, despite my best attempts, currently only works with relatively simple models. If we are returning, for example, a dictionary with a key of "count" and an integer, and then another key of "data" which is a list of json objects, it doesn't work well.

An alternative, therefore, would be to use sometihng such as apispec, which could define more complex data types, and which would integrate with marshmallow.

~Server API Bug~

~There appears to be a common bug when loading the api page server-side, even if it builds fine locally. I'm still looking into how to handle it, but here are some possible resources:~

~Of these, the first option seems to provide the most viable potential solution.~

UPDATE: This has been successfully resolved.

"Try It Out"

The "Try It Out" option enables a user to test out API responses in the Swagger documentation itself. I've seen this work successfully in some cases, but it appears to struggle in cases where there is a large amount of data to return, as in the agencies /get response. This may speak to a larger issue where at times we may want the option to limit more substantially the number of results returned by some of our larger GET methods -- especially as the number of rows increases, we may want to be mindful of how many we allow a user to return at once.

Additionally, the "Try It Out" option is also revealing some other interesting issues -- for example, the /api_key endpoint can't be tried out in Swagger because Swagger doesn't support GET requests with content in the body. However, GET requests with content in the body are also generally not recommended by official HTTP specifications.

maxachis commented 1 month ago

Bug listed in "Server API Bug" has been addressed, and the (Draft) Swagger documentation on the API is now available at https://data-sources-v2.pdap.dev/api

josh-chamberlain commented 1 month ago

gitbook docs deprecation complete: https://app.gitbook.com/o/-MXypK5ySzExtEzQU6se/s/-MXyolqTg_voOhFyAcr-/~/changes/510/api/v2-api-docs-wip