Closed maxachis closed 3 days ago
@josh-chamberlain, as before I'll need your input on the data sources permissions:
column_name | STANDARD | FORM | ADMIN | NOTE |
---|---|---|---|---|
name | READ | NONE | WRITE | |
submitted_name | READ | NONE | WRITE | |
description | READ | NONE | WRITE | |
record_type | READ | NONE | WRITE | |
source_url | READ | NONE | WRITE | |
agency_supplied | READ | NONE | WRITE | |
supplying_entity | READ | NONE | WRITE | |
agency_originated | READ | NONE | WRITE | |
agency_aggregation | READ | NONE | WRITE | |
coverage_start | READ | NONE | WRITE | |
coverage_end | READ | NONE | WRITE | |
source_last_updated | READ | NONE | READ | |
detail_level | READ | NONE | WRITE | |
number_of_records_available | NONE | NONE | NONE | Deprecate in v2 |
size | NONE | NONE | NONE | Deprecate in v2 |
access_type | READ | NONE | WRITE | |
record_download_option_provided | READ | NONE | WRITE | |
data_portal_type | READ | NONE | WRITE | |
record_format | READ | NONE | WRITE | |
update_method | READ | NONE | WRITE | |
tags | READ | NONE | WRITE | |
readme_url | READ | NONE | WRITE | |
originating_entity | READ | NONE | WRITE | |
retention_schedule | READ | NONE | WRITE | |
airtable_uid | READ | NONE | READ | |
scraper_url | READ | NONE | WRITE | |
data_source_created | READ | NONE | READ | |
airtable_source_last_modified | READ | NONE | READ | |
url_broken | NONE | NONE | NONE | Should already be deprecated |
submission_notes | NONE | NONE | READ | |
rejection_note | READ | NONE | WRITE | |
last_approval_editor | READ | NONE | READ | |
submitter_contact_info | NONE | NONE | READ | |
agency_described_submitted | READ | NONE | WRITE | |
agency_described_not_in_database | READ | NONE | WRITE | |
approved | NONE | NONE | NONE | Should already be deprecated |
record_type_other | READ | NONE | WRITE | |
data_portal_type_other | READ | NONE | WRITE | |
private_access_instructions | NONE | NONE | READ | |
records_not_online | NONE | NONE | NONE | Should already be deprecated |
data_source_request | READ | NONE | READ | |
url_button | NONE | NONE | NONE | Should not be synced from Airtable |
tags_other | READ | NONE | WRITE | |
broken_source_url_as_of | READ | NONE | WRITE | |
access_notes | READ | NONE | WRITE | |
url_status | READ | NONE | WRITE | |
approval_status | NONE | NONE | WRITE | |
record_type_id | READ | NONE | WRITE |
@josh-chamberlain Additionally, I note a quirk in our current /data-sources-by-id
endpoint which may need figuring out:
The GET
method for this returns not just columns from data_sources
, but columns from agencies
and data_sources_archive_info
as well. This adds a few interesting wrinkles:
/data-requests
and /agencies
, where the GET
function returns columns only from one table, and doesn't add columns from other tables. GET
and POST/EDIT
logic; some of the columns returned by GET
will not be columns that can be edited (even if only theoretically, due to access permissionsin
POST/PUT`My recommendation is to do away with the old method and maintain the one-table-per-endpoint standard I've set for /data-requests
and /agencies
; what do you think?
@maxachis
url_broken
, approved
, records_not_online
; these should not exist at all, and are relics of old concepts that should probably have been deprecated. Those are not in Airtable or the Data Dictionary.
url_broken
was replaced by url_status
approved
was replaced by approval_status
records_not_online
was replaced by...nothing, or maybe url statusrecord_type_id
mean we don't need record_type
?Yes, it does—I would argue that agency-level properties are core to describing Data Sources, as they contain baseline information about location and jurisdiction, but that is mostly for searching, and we have an endpoint for that. You also need them on Data Source Details, and we can hit the Agencies endpoint for the info if needed. I am okay with taking your recommendation and keeping things uniform for now (i.e. only returning columns from the table).
- permission table updated! Again, many of these can come through on form submissions, so they are not STANDARD but neither or they admin-only. Do we need that information somewhere?
@josh-chamberlain Yes! I added a "FORM" column to indicate this, although I wonder if we should consider them to have similar logic to Data Requests owners instead and have the column be "OWNER" -- if it's the latter, that may allow me to copy more of the existing logic for data requests.
I suppose it relates to a larger series of questions to be had over data sources
(and agencies
, to which this also applies):
Essentially, in addition to having logic for users who are data requestors, do we want logic for users who are data providers? Maybe not for v2, but perhaps for post-v2?
@josh-chamberlain @maxachis Some thoughts here:
As is, this logic violates the standard I've set for /data-requests and /agencies, where the GET function returns columns only from one table, and doesn't add columns from other tables.
Well, this may mean that the DB design needs rethinking. Can't we just associate the agency fields in the data source table with the associated agency? This would be the standard practice for dealing with such a problem, rather than having the client hit 2 endpoints and merge the responses in order to create 1 object.
This confuses the GET and POST/EDIT logic; some of the columns returned by GET will not be columns that can be edited (even if only theoretically, due to access permissionsinPOST/PUT`
I don't think this is a problem. I've worked with plenty of APIs that have different permission structures for different methods. We can bake these entitlements into the users table and encode them into the JWT.
While I do see value in connecting these tables together (I would imagine many a person would want to see the names of agencies associated with data requests), these CRUD endpoints might not be the place for them.
Exactly. Relating them in the DB is the most efficient and standard solution for this.
@josh-chamberlain After chatting with @joshuagraber , I think he's right and it's worth pulling from a view for GET
that includes agency-relevant info -- while clearly demarcating what are attributes from agencies and what are not. I'll think more about how that will work.
I'll think more about the implementation on this. I'll also make a note to include @joshuagraber earlier in these conversations, so there's less changing of horses mid-stream 🐎💦
@josh-chamberlain @joshuagraber
Incidentally, this means the structure for GET
responses will be a bit different. The outer wrapper will be the same, but the interior will change to include a nested agency
which contains the agency-derived properties. Just to clearly demarcate what is derived from agencies:
{
"id": 1
"agency_id": 2
"name": "cat"
"agency": {
"id": 2
"name": "Cats Department"
}
}
Note that agency_id
remains at the first level because it is technically an attribute of data sources.
thank you @maxachis and @joshuagraber for finding each other and resolving this harmoniously!
My intent is to work on this once @EvilDrPurple finishes a round of her #303 work. The question of how to implement nested queries with JSON while also taking variable columns (and to ensure that can be extended to other endpoint which may need this in the future as well) is a nontrivial problem. From what I've seen, SQLAlchemy may make that easier.
@maxachis
{ "id": 1 "agency_id": 2 "name": "cat" "agency": { "id": 2 "name": "Cats Department" } }
The only potential issue with this exact implementation is there are instances (4 to be exact) where a data source has multiple agencies attached to it. You can view them by querying the database like this:
SELECT * FROM
(
SELECT link_id, airtable_uid, agency_described_linked_uid,
COUNT(1)OVER(partition by airtable_uid) AS cnt
FROM agency_source_link
)a
WHERE cnt > 1
@EvilDrPurple @josh-chamberlain
Oooooh, that's an interesting dynamic. One possible solution is to instead have it be agencies
, which returns the results as a list, like:
{
"id": 1
"agency_ids": [2, 3]
"name": "cat"
"agencies": [
{
"id": 2
"name": "Cats Department"
},
{
"id": 3
"name": "Felines Forensics"
},
]
}
At the same time, maybe we don't want that if there are only 4 instances in the database where this occurs. What do you think, @josh-chamberlain?
At the same time, maybe we don't want that if there are only 4 instances in the database where this occurs.
It is likely we will add more in the future, as we find more data sources that span multiple departments, such as multi-agency investigations or regional data that lumps in many agencies
I can confirm that this will happen more in the future. In fact, the number would be bigger if we stopped using the aggregated
agencies. Discussion here: https://github.com/Police-Data-Accessibility-Project/data-sources-app/issues/272
I think @maxachis's suggestion to return an array of associated agencies makes sense in this case. For web app purposes, we can just loop through it and render UI for each of them. And I think for anyone else using the API, the structure of the data is clear that the data is associated with / aggregated from multiple agencies.
GET MANY
requests.@maxachis FYI, I'm starting on the data source view, which will require agencies returned from this endpoint. Do you think you have space to prioritize this soon, or should I start with some placeholder data for now?
Also, it looks like @vikramrojo is planning (for the future) a link from this view to the data source map, centered on the source, so I'll eventually need the geolocation fields here as well. Not urgent, but while we're here I thought I'd mention it.
In addition to the above format, we'll likely also want to include the "count" of the number of agencies returned, much as we do in typical GET MANY requests.
Yes, I think so.
@maxachis FYI, I'm starting on the data source view, which will require agencies returned from this endpoint. Do you think you have space to prioritize this soon, or should I start with some placeholder data for now?
Also, it looks like @vikramrojo is planning (for the future) a link from this view to the data source map, centered on the source, so I'll eventually need the geolocation fields here as well. Not urgent, but while we're here I thought I'd mention it.
Oh it's already underway: I've my main fish @EvilDrPurple working on developing it. But it may be germane to use some placeholder data for at least a little bit while we get the final kinks worked out.
Sounds good @maxachis thanks for the update!
@joshuagraber @maxachis
@maxachis FYI, I'm starting on the data source view, which will require agencies returned from this endpoint. Do you think you have space to prioritize this soon, or should I start with some placeholder data for now?
Optimistically I should have it submitted by tomorrow, so long as there are no unforeseen problems I run into
@joshuagraber With @EvilDrPurple's changes merged in, I can start working on this! I'm going to work into incorporating the changes described in #430 at the same time, since I'm in the area and otherwise I'll be repeating my efforts not long after this, so that will extend the time it takes to complete this, but hopefully only by a day or two.
Edit: I already took care of this part, so I'll be working on implementing the enhancements as requested.
Context
In keeping with the changes in #417 and #390, the
/data-sources
endpoints need to be updated to enforce consistency with new standards and enhancements.Requirements
DELETE
endpoint to `data-source/data-sources/
GET` endpoint to include pagination and sortingRelated
430
419
Tests
Docs
Open questions
/data-sources-needs-identification
and/data-sources-map
? These can likely be consolidated with filter and column selection logic into the main endpoints (albeit at a performance cost (though perhaps not a substantial one))/data-sources-by-id/{data_source_id}
will become/data-sources/id/{data_source_id}
and/data-sources
will become/data-sources/page/{page_number}
. Is this acceptable?