Police-Data-Accessibility-Project / data-sources-app

An API and UI for using and maintaining the Data Sources database
MIT License
2 stars 5 forks source link

Enhance `data-sources` CRUD endpoints #429

Open maxachis opened 2 weeks ago

maxachis commented 2 weeks ago

Context

In keeping with the changes in #417 and #390, the /data-sources endpoints need to be updated to enforce consistency with new standards and enhancements.

Requirements

Related

Tests

Docs

Open questions

maxachis commented 2 weeks ago

@josh-chamberlain, as before I'll need your input on the data sources permissions:

column_name STANDARD FORM ADMIN NOTE
name READ NONE WRITE
submitted_name READ NONE WRITE
description READ NONE WRITE
record_type READ NONE WRITE
source_url READ NONE WRITE
agency_supplied READ NONE WRITE
supplying_entity READ NONE WRITE
agency_originated READ NONE WRITE
agency_aggregation READ NONE WRITE
coverage_start READ NONE WRITE
coverage_end READ NONE WRITE
source_last_updated READ NONE READ
detail_level READ NONE WRITE
number_of_records_available NONE NONE NONE Deprecate in v2
size NONE NONE NONE Deprecate in v2
access_type READ NONE WRITE
record_download_option_provided READ NONE WRITE
data_portal_type READ NONE WRITE
record_format READ NONE WRITE
update_method READ NONE WRITE
tags READ NONE WRITE
readme_url READ NONE WRITE
originating_entity READ NONE WRITE
retention_schedule READ NONE WRITE
airtable_uid READ NONE READ
scraper_url READ NONE WRITE
data_source_created READ NONE READ
airtable_source_last_modified READ NONE READ
url_broken NONE NONE NONE Should already be deprecated
submission_notes NONE NONE READ
rejection_note READ NONE WRITE
last_approval_editor READ NONE READ
submitter_contact_info NONE NONE READ
agency_described_submitted READ NONE WRITE
agency_described_not_in_database READ NONE WRITE
approved NONE NONE NONE Should already be deprecated
record_type_other READ NONE WRITE
data_portal_type_other READ NONE WRITE
private_access_instructions NONE NONE READ
records_not_online NONE NONE NONE Should already be deprecated
data_source_request READ NONE READ
url_button NONE NONE NONE Should not be synced from Airtable
tags_other READ NONE WRITE
broken_source_url_as_of READ NONE WRITE
access_notes READ NONE WRITE
url_status READ NONE WRITE
approval_status NONE NONE WRITE
record_type_id READ NONE WRITE
maxachis commented 2 weeks ago

@josh-chamberlain Additionally, I note a quirk in our current /data-sources-by-id endpoint which may need figuring out:

The GET method for this returns not just columns from data_sources, but columns from agencies and data_sources_archive_info as well. This adds a few interesting wrinkles:

  1. As is, this logic violates the standard I've set for /data-requests and /agencies, where the GET function returns columns only from one table, and doesn't add columns from other tables.
  2. This confuses the GET and POST/EDIT logic; some of the columns returned by GET will not be columns that can be edited (even if only theoretically, due to access permissionsinPOST/PUT`
  3. While I do see value in connecting these tables together (I would imagine many a person would want to see the names of agencies associated with data requests), these CRUD endpoints might not be the place for them.

My recommendation is to do away with the old method and maintain the one-table-per-endpoint standard I've set for /data-requests and /agencies; what do you think?

josh-chamberlain commented 2 weeks ago

@maxachis

Yes, it does—I would argue that agency-level properties are core to describing Data Sources, as they contain baseline information about location and jurisdiction, but that is mostly for searching, and we have an endpoint for that. You also need them on Data Source Details, and we can hit the Agencies endpoint for the info if needed. I am okay with taking your recommendation and keeping things uniform for now (i.e. only returning columns from the table).

maxachis commented 2 weeks ago
  • permission table updated! Again, many of these can come through on form submissions, so they are not STANDARD but neither or they admin-only. Do we need that information somewhere?

@josh-chamberlain Yes! I added a "FORM" column to indicate this, although I wonder if we should consider them to have similar logic to Data Requests owners instead and have the column be "OWNER" -- if it's the latter, that may allow me to copy more of the existing logic for data requests.

I suppose it relates to a larger series of questions to be had over data sources (and agencies, to which this also applies):

Essentially, in addition to having logic for users who are data requestors, do we want logic for users who are data providers? Maybe not for v2, but perhaps for post-v2?

joshuagraber commented 6 days ago

@josh-chamberlain @maxachis Some thoughts here:

As is, this logic violates the standard I've set for /data-requests and /agencies, where the GET function returns columns only from one table, and doesn't add columns from other tables.

Well, this may mean that the DB design needs rethinking. Can't we just associate the agency fields in the data source table with the associated agency? This would be the standard practice for dealing with such a problem, rather than having the client hit 2 endpoints and merge the responses in order to create 1 object.

This confuses the GET and POST/EDIT logic; some of the columns returned by GET will not be columns that can be edited (even if only theoretically, due to access permissionsinPOST/PUT`

I don't think this is a problem. I've worked with plenty of APIs that have different permission structures for different methods. We can bake these entitlements into the users table and encode them into the JWT.

While I do see value in connecting these tables together (I would imagine many a person would want to see the names of agencies associated with data requests), these CRUD endpoints might not be the place for them.

Exactly. Relating them in the DB is the most efficient and standard solution for this.

maxachis commented 6 days ago

@josh-chamberlain After chatting with @joshuagraber , I think he's right and it's worth pulling from a view for GET that includes agency-relevant info -- while clearly demarcating what are attributes from agencies and what are not. I'll think more about how that will work.

I'll think more about the implementation on this. I'll also make a note to include @joshuagraber earlier in these conversations, so there's less changing of horses mid-stream 🐎💦

maxachis commented 6 days ago

@josh-chamberlain @joshuagraber

Incidentally, this means the structure for GET responses will be a bit different. The outer wrapper will be the same, but the interior will change to include a nested agency which contains the agency-derived properties. Just to clearly demarcate what is derived from agencies:

{
    "id": 1
    "agency_id": 2
    "name": "cat"
    "agency": {
        "id": 2
        "name": "Cats Department"
    }
}

Note that agency_id remains at the first level because it is technically an attribute of data sources.

josh-chamberlain commented 6 days ago

thank you @maxachis and @joshuagraber for finding each other and resolving this harmoniously!

maxachis commented 6 days ago

My intent is to work on this once @EvilDrPurple finishes a round of her #303 work. The question of how to implement nested queries with JSON while also taking variable columns (and to ensure that can be extended to other endpoint which may need this in the future as well) is a nontrivial problem. From what I've seen, SQLAlchemy may make that easier.