Police-Data-Accessibility-Project / data-sources-app

An API and UI for using and maintaining the Data Sources database
MIT License

Create/modify endpoint methods that create entries in database to give all columns that can be created. #365

Open maxachis opened 2 months ago

maxachis commented 2 months ago

For methods that create entries in the database (currently only the POST method of /data-sources), it may be useful to do the following:

  1. Break up endpoints more explicitly into sub-endpoints -- one which retrieves entries (i.e., the GET method of /data-sources, which would become /data-sources/all) and one which adds entries (i.e., the POST method of /data-sources, which would become /data-sources/add).
  2. Add a GET method to those designed to add entries so that they retrieve all columns which can be created. For example, GET for /data-sources/add would retrieve all columns that can be added via the method, along with a description and their data type.
  3. For all columns which can be added, add comments to those columns within the database; these comments would be returned along with the other information and would provide the most up-to-date description of what each column does (a rough sketch of this follows the list).
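
As a rough sketch of step 3 (referenced above), a column comment could be applied as shown below. This assumes psycopg2 as the driver; the connection string, column name, and description text are purely illustrative.

import psycopg2

# Connection details are placeholders; the real app presumably has its own helper.
conn = psycopg2.connect("dbname=data_sources user=postgres")
with conn, conn.cursor() as cur:
    # Attach the documentation directly to the column so the database
    # itself holds the single source of truth for its description.
    cur.execute(
        "COMMENT ON COLUMN data_sources.record_type IS "
        "'Illustrative description of what this column stores'"
    )
conn.close()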

The rationale is that this would help keep documentation up-to-date and dynamic, and give us a single source of truth for it.

The SQL script involved could be, per ChatGPT:

SELECT 
    cols.table_schema,
    cols.table_name,
    cols.column_name,
    cols.data_type,
    pgd.description AS column_comment
FROM 
    information_schema.columns cols
LEFT JOIN 
    pg_catalog.pg_statio_all_tables AS st 
    ON cols.table_schema = st.schemaname AND cols.table_name = st.relname
LEFT JOIN 
    pg_catalog.pg_description pgd 
    ON pgd.objoid = st.relid AND pgd.objsubid = cols.ordinal_position
WHERE 
    cols.table_schema NOT IN ('information_schema', 'pg_catalog')
    AND cols.table_name = 'your_table_name';  -- Replace 'your_table_name' with the actual table name
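
To illustrate step 2, here is a minimal sketch of how a slightly trimmed version of that query could back a GET that lists every creatable column with its type and comment. psycopg2 and the helper name get_creatable_columns are assumptions, not existing code.

import psycopg2

COLUMN_INFO_SQL = """
SELECT cols.column_name, cols.data_type, pgd.description AS column_comment
FROM information_schema.columns cols
LEFT JOIN pg_catalog.pg_statio_all_tables AS st
    ON cols.table_schema = st.schemaname AND cols.table_name = st.relname
LEFT JOIN pg_catalog.pg_description pgd
    ON pgd.objoid = st.relid AND pgd.objsubid = cols.ordinal_position
WHERE cols.table_schema NOT IN ('information_schema', 'pg_catalog')
  AND cols.table_name = %s;
"""

def get_creatable_columns(conn, table_name="data_sources"):
    """Return name, data type, and comment for every column of the given table."""
    with conn.cursor() as cur:
        cur.execute(COLUMN_INFO_SQL, (table_name,))
        return [
            {"column_name": name, "data_type": data_type, "description": comment}
            for name, data_type, comment in cur.fetchall()
        ]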
maxachis commented 2 months ago

@josh-chamberlain This is one where I'm especially curious about your thinking on the matter. From my vantage point, it should help us keep documentation up-to-date and dynamic, and ensure we have a single source of truth for it.

josh-chamberlain commented 2 months ago

@maxachis I responded in Discord, but I don't know that we resolved it to your satisfaction.

I think best practice is for each CRUD function to be an HTTP request on the same endpoint, data-sources GET and data-sources POST. I don’t think /add or /all would be needed.

From what I understand, POST usually creates one record, and GET often fetches many; maybe we should have the GET fetch one by ID (like for the data source details page) and use the search endpoint for returning bulk sources. We're always going to want to attach params to a bulk search, so that endpoint might serve all our needs.

The priority is to make a fairly standard and nicely working API for our users/front end, and accept whatever documentation overhead Swagger leaves us with.

So, I think I disagree with your step 1, but would steps 2 and 3 still work?

maxachis commented 2 months ago

@josh-chamberlain We might need to come up with an alternative endpoint to retrieve detailed column information if we don't want to use the /add or /all approach. Maybe something like data_sources/fields or data_sources/data_dictionary, which could retrieve all the fields per steps 2 and 3. If we assume that everything in the data_sources namespace works with the same table and the same set of columns, that'd be doable. We'd just need to make sure that the fields returned by each endpoint share the same names, so we don't end up with the same underlying field called one thing in one endpoint and something else in another.

That way, the information currently conveyed in the data dictionary documentation would be served from an endpoint, driven directly by the data in the relevant columns.
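
One possible shape for such an endpoint, reusing the hypothetical get_creatable_columns helper sketched earlier in the thread; the route name and connection handling are illustrative rather than settled.

import psycopg2
from flask_restx import Namespace, Resource

# The namespace would be registered with api.add_namespace(ns) in the app setup.
ns = Namespace("data-sources", description="Data source operations")

def get_connection():
    # Placeholder; the real app presumably has its own connection handling.
    return psycopg2.connect("dbname=data_sources user=postgres")

@ns.route("/data_dictionary")
class DataDictionary(Resource):
    def get(self):
        """List every data_sources column with its type and description."""
        # get_creatable_columns() is the hypothetical helper from the earlier
        # sketch, wrapping the information_schema query.
        with get_connection() as conn:
            return get_creatable_columns(conn), 200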

josh-chamberlain commented 2 months ago

The idea of using an endpoint to let people get columns would mean that only people who can use the API have access to our schema, right?

If we're using swagger/openAPI with flask-restx, don't we get request and response bodies for free when we use annotations in the code? The schema for each endpoint would be in those samples, which can be auto-generated from the model.

I'm sorry to be stubborn—I think your goals for automation and clarity and source-of-truthness make sense, but they seem achievable without any special endpoints provided we annotate properly.

From our GPT friend

Define Your API and Models:

Use Flask-RESTX to define your API endpoints and data models with annotations. This ensures that your Swagger documentation is generated based on your code.

  1. Setting Up Flask-RESTX:
from flask import Flask
from flask_restx import Api, Resource, fields

app = Flask(__name__)
api = Api(app, version='1.0', title='Sample API',
          description='A sample API with automatic documentation')

ns = api.namespace('data-sources', description='Data source operations')

data_source = api.model('DataSource', {
    'id': fields.Integer(readOnly=True, description='The unique identifier of a data source'),
    'name': fields.String(required=True, description='The name of the data source'),
    'description': fields.String(description='A brief description of the data source'),
    'created_at': fields.DateTime(description='The creation timestamp')
})

@ns.route('/')
class DataSourceList(Resource):
    @ns.doc('list_data_sources')
    @ns.marshal_list_with(data_source)
    def get(self):
        '''List all data sources'''
        # Logic to list data sources
        return []

    @ns.doc('create_data_source')
    @ns.expect(data_source)
    @ns.marshal_with(data_source, code=201)
    def post(self):
        '''Create a new data source'''
        # Logic to create a data source
        return {}, 201

@ns.route('/<int:id>')
@ns.response(404, 'Data source not found')
@ns.param('id', 'The data source identifier')
class DataSource(Resource):
    @ns.doc('get_data_source')
    @ns.marshal_with(data_source)
    def get(self, id):
        '''Fetch a data source given its identifier'''
        # Logic to fetch a specific data source
        return {}

    @ns.doc('delete_data_source')
    @ns.response(204, 'Data source deleted')
    def delete(self, id):
        '''Delete a data source given its identifier'''
        # Logic to delete a specific data source
        return '', 204

    @ns.expect(data_source)
    @ns.marshal_with(data_source)
    def put(self, id):
        '''Update a data source given its identifier'''
        # Logic to update a specific data source
        return {}

if __name__ == '__main__':
    app.run(debug=True)

Key Points:

When you run your Flask application, Flask-RESTX automatically generates the Swagger UI, which you can access (by default) at http://localhost:5000/. This UI will display all your documented endpoints, their expected inputs, outputs, and other details.

Keeping Documentation Up-to-Date:

maxachis commented 2 months ago

> If we're using swagger/openAPI with flask-restx, don't we get request and response bodies for free when we use annotations in the code? The schema for each endpoint would be in those samples, which can be auto-generated from the model.
>
> I'm sorry to be stubborn—I think your goals for automation and clarity and source-of-truthness make sense, but they seem achievable without any special endpoints provided we annotate properly.

Please don't apologize for stubbornness! You are the primary restraint on me going crazy and trying to convert the entire codebase to Rust or something like that.

I have been using models quite liberally in the documentation -- the only issue is that those models have to be declared and described explicitly. If the fields we are returning change, whether because the underlying database is modified or we change how we present the fields, we have to remember to update those models as well. And when the models contain upwards of 30 (or 60) possible fields, that can be clunky. I have not, as of yet, found a way to auto-generate the models from the response content alone (although that would be a nifty idea).
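
For what it's worth, that "nifty idea" could in principle look something like the following: build the Flask-RESTX model programmatically from the column metadata returned by the information_schema query sketched earlier. This is a hypothetical approach, not something the codebase does today, and the type mapping is deliberately incomplete.

from flask_restx import fields

# Illustrative mapping from Postgres data types to Flask-RESTX field types;
# anything unmapped falls back to a plain string field.
PG_TO_RESTX = {
    "integer": fields.Integer,
    "bigint": fields.Integer,
    "boolean": fields.Boolean,
    "date": fields.Date,
    "timestamp with time zone": fields.DateTime,
}

def build_model_from_columns(api, name, columns):
    """Build an api.model() from rows shaped like the earlier column-info sketch:
    dicts with column_name, data_type, and description keys."""
    spec = {
        col["column_name"]: PG_TO_RESTX.get(col["data_type"], fields.String)(
            description=col["description"]
        )
        for col in columns
    }
    return api.model(name, spec)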

marshal_with can definitely help with simpler models, but it has some issues with more complex data models, such as dictionaries with other objects nested within them. There are other ways we can do marshalling to bypass those issues, but that may require additional third-party components such as Marshmallow.
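
For reference, flask-restx does provide fields.Nested and fields.List for dictionary- and list-valued fields; a minimal sketch with made-up field names is below, though it may still fall short for the messier cases mentioned above.

from flask import Flask
from flask_restx import Api, fields

app = Flask(__name__)
api = Api(app)

# Hypothetical nested object attached to a data source entry.
agency = api.model("Agency", {
    "name": fields.String(description="Agency name"),
    "state": fields.String(description="Two-letter state code"),
})

data_source_detail = api.model("DataSourceDetail", {
    "name": fields.String(required=True, description="Name of the data source"),
    "tags": fields.List(fields.String, description="Free-form tags"),
    # A dictionary-valued field is expressed as a nested model rather than a flat field.
    "agency": fields.Nested(agency, description="Agency the source belongs to"),
})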

Regardless, I think it's a minor issue on my part that I don't have a strong attachment to. If there's a lot of hesitation (for very good reasons, I hasten to add), I'm content to just drop it for now and see how things play out with documenting the models in the existing API documentation, rather than adding an additional endpoint.