Open maxachis opened 2 weeks ago
@josh-chamberlain, as before I'll need your input on the data sources permissions:
column_name | STANDARD | FORM | ADMIN | NOTE |
---|---|---|---|---|
name | READ | NONE | WRITE | |
submitted_name | READ | NONE | WRITE | |
description | READ | NONE | WRITE | |
record_type | READ | NONE | WRITE | |
source_url | READ | NONE | WRITE | |
agency_supplied | READ | NONE | WRITE | |
supplying_entity | READ | NONE | WRITE | |
agency_originated | READ | NONE | WRITE | |
agency_aggregation | READ | NONE | WRITE | |
coverage_start | READ | NONE | WRITE | |
coverage_end | READ | NONE | WRITE | |
source_last_updated | READ | NONE | READ | |
detail_level | READ | NONE | WRITE | |
number_of_records_available | NONE | NONE | NONE | Deprecate in v2 |
size | NONE | NONE | NONE | Deprecate in v2 |
access_type | READ | NONE | WRITE | |
record_download_option_provided | READ | NONE | WRITE | |
data_portal_type | READ | NONE | WRITE | |
record_format | READ | NONE | WRITE | |
update_method | READ | NONE | WRITE | |
tags | READ | NONE | WRITE | |
readme_url | READ | NONE | WRITE | |
originating_entity | READ | NONE | WRITE | |
retention_schedule | READ | NONE | WRITE | |
airtable_uid | READ | NONE | READ | |
scraper_url | READ | NONE | WRITE | |
data_source_created | READ | NONE | READ | |
airtable_source_last_modified | READ | NONE | READ | |
url_broken | NONE | NONE | NONE | Should already be deprecated |
submission_notes | NONE | NONE | READ | |
rejection_note | READ | NONE | WRITE | |
last_approval_editor | READ | NONE | READ | |
submitter_contact_info | NONE | NONE | READ | |
agency_described_submitted | READ | NONE | WRITE | |
agency_described_not_in_database | READ | NONE | WRITE | |
approved | NONE | NONE | NONE | Should already be deprecated |
record_type_other | READ | NONE | WRITE | |
data_portal_type_other | READ | NONE | WRITE | |
private_access_instructions | NONE | NONE | READ | |
records_not_online | NONE | NONE | NONE | Should already be deprecated |
data_source_request | READ | NONE | READ | |
url_button | NONE | NONE | NONE | Should not be synced from Airtable |
tags_other | READ | NONE | WRITE | |
broken_source_url_as_of | READ | NONE | WRITE | |
access_notes | READ | NONE | WRITE | |
url_status | READ | NONE | WRITE | |
approval_status | NONE | NONE | WRITE | |
record_type_id | READ | NONE | WRITE | |
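To make the matrix concrete, here is a minimal Python sketch of one way it could be encoded and queried. It assumes WRITE implies READ and encodes only a few rows. The names `Access`, `PERMISSIONS`, and `columns_with_access` are hypothetical and are not part of the existing codebase.

```python
from enum import IntEnum
from typing import List


class Access(IntEnum):
    # Ordered so a higher level implies the lower ones (assumption: WRITE implies READ).
    NONE = 0
    READ = 1
    WRITE = 2


ROLES = ("STANDARD", "FORM", "ADMIN")

# Hypothetical encoding of a few rows from the table: column -> (STANDARD, FORM, ADMIN).
PERMISSIONS = {
    "name": (Access.READ, Access.NONE, Access.WRITE),
    "source_last_updated": (Access.READ, Access.NONE, Access.READ),
    "submission_notes": (Access.NONE, Access.NONE, Access.READ),
    "approval_status": (Access.NONE, Access.NONE, Access.WRITE),
    "url_broken": (Access.NONE, Access.NONE, Access.NONE),
}


def columns_with_access(role: str, minimum: Access) -> List[str]:
    """Return the columns `role` can access at `minimum` level or higher."""
    idx = ROLES.index(role)
    return [col for col, perms in PERMISSIONS.items() if perms[idx] >= minimum]
```

With this encoding, a GET handler could select `columns_with_access(role, Access.READ)` and a PUT handler could validate against `columns_with_access(role, Access.WRITE)`.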
@josh-chamberlain Additionally, I note a quirk in our current `/data-sources-by-id` endpoint which may need figuring out:
The `GET` method for this returns not just columns from `data_sources`, but columns from `agencies` and `data_sources_archive_info` as well. This adds a few interesting wrinkles:
1. As is, this logic violates the standard I've set for `/data-requests` and `/agencies`, where the `GET` function returns columns only from one table, and doesn't add columns from other tables.
2. This confuses the `GET` and `POST/EDIT` logic; some of the columns returned by `GET` will not be columns that can be edited (even if only theoretically, due to access permissions in `POST/PUT`).
My recommendation is to do away with the old method and maintain the one-table-per-endpoint standard I've set for `/data-requests` and `/agencies`; what do you think?
@maxachis
- `url_broken`, `approved`, `records_not_online`: these should not exist at all, and are relics of old concepts that should probably have been deprecated. They are not in Airtable or the Data Dictionary.
  - `url_broken` was replaced by `url_status`
  - `approved` was replaced by `approval_status`
  - `records_not_online` was replaced by... nothing, or maybe `url_status`
- Does `record_type_id` mean we don't need `record_type`?
- Yes, it does. I would argue that agency-level properties are core to describing Data Sources, as they contain baseline information about location and jurisdiction, but that is mostly for searching, and we have an endpoint for that. You also need them on Data Source Details, and we can hit the Agencies endpoint for the info if needed. I am okay with taking your recommendation and keeping things uniform for now (i.e. only returning columns from the table).
- Permission table updated! Again, many of these can come through on form submissions, so they are not STANDARD, but neither are they admin-only. Do we need that information somewhere?
@josh-chamberlain Yes! I added a "FORM" column to indicate this, although I wonder if we should consider them to have similar logic to Data Requests owners instead and have the column be "OWNER" -- if it's the latter, that may allow me to copy more of the existing logic for data requests.
I suppose it relates to a larger series of questions about data sources (and agencies, to which this also applies):
Essentially, in addition to having logic for users who are data requestors, do we want logic for users who are data providers? Maybe not for v2, but perhaps for post-v2?
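If the FORM column were reframed as OWNER, one way to combine it with the role columns is to grant a user the highest level they qualify for. The Python sketch below assumes a hypothetical OWNER column with made-up values (the current FORM column is NONE everywhere) and a hypothetical `User` type; it is not the existing data-requests owner logic.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Access(IntEnum):
    NONE = 0
    READ = 1
    WRITE = 2


# Hypothetical per-column permissions: (STANDARD, OWNER, ADMIN).
PERMISSIONS = {
    "name": (Access.READ, Access.WRITE, Access.WRITE),
    "submitter_contact_info": (Access.NONE, Access.READ, Access.READ),
    "approval_status": (Access.NONE, Access.NONE, Access.WRITE),
}


@dataclass
class User:
    id: int
    is_admin: bool = False


def effective_access(user: User, column: str, record_owner_id: Optional[int]) -> Access:
    """Return the highest access level the user qualifies for on this column."""
    standard, owner, admin = PERMISSIONS[column]
    level = standard
    # Owners (e.g. the user who submitted the data source) may get elevated access.
    if record_owner_id is not None and user.id == record_owner_id:
        level = max(level, owner)
    if user.is_admin:
        level = max(level, admin)
    return level
```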
@josh-chamberlain @maxachis Some thoughts here:
> As is, this logic violates the standard I've set for `/data-requests` and `/agencies`, where the `GET` function returns columns only from one table, and doesn't add columns from other tables.
Well, this may mean that the DB design needs rethinking. Can't we just associate the agency fields in the data source table with the associated agency? This would be the standard practice for dealing with such a problem, rather than having the client hit 2 endpoints and merge the responses in order to create 1 object.
> This confuses the `GET` and `POST/EDIT` logic; some of the columns returned by `GET` will not be columns that can be edited (even if only theoretically, due to access permissions in `POST/PUT`)
I don't think this is a problem. I've worked with plenty of APIs that have different permission structures for different methods. We can bake these entitlements into the users table and encode them into the JWT.
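As a rough illustration of per-method entitlements, the sketch below checks requested columns against a hypothetical `(role, method)` table. It assumes the JWT has already been verified and decoded into a dict carrying a `role` claim. The claim name, the STANDARD default, and the column sets are all assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical entitlements: (role, HTTP method) -> columns that role may use with that method.
# GET and PUT deliberately differ: some readable columns are not writable.
ENTITLEMENTS: Dict[Tuple[str, str], Set[str]] = {
    ("STANDARD", "GET"): {"id", "name", "description", "source_last_updated"},
    ("ADMIN", "GET"): {"id", "name", "description", "source_last_updated", "submission_notes"},
    ("ADMIN", "PUT"): {"name", "description"},
}


def disallowed_columns(claims: Dict[str, object], method: str, columns: Iterable[str]) -> List[str]:
    """Return, sorted, the requested columns the caller may not use with `method`.

    `claims` stands in for an already-verified, decoded JWT payload.
    """
    role = str(claims.get("role", "STANDARD"))
    allowed = ENTITLEMENTS.get((role, method.upper()), set())
    return sorted(col for col in columns if col not in allowed)
```

A PUT handler could reject the request whenever this list is non-empty, while GET uses the broader read set.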
> While I do see value in connecting these tables together (I would imagine many a person would want to see the names of agencies associated with data requests), these CRUD endpoints might not be the place for them.
Exactly. Relating them in the DB is the most efficient and standard solution for this.
@josh-chamberlain After chatting with @joshuagraber, I think he's right and it's worth pulling from a view for `GET` that includes agency-relevant info, while clearly demarcating which attributes come from agencies and which do not.
I'll think more about how to implement this. I'll also make a note to include @joshuagraber earlier in these conversations, so there's less changing of horses mid-stream 🐎💦
@josh-chamberlain @joshuagraber
Incidentally, this means the structure for `GET` responses will be a bit different. The outer wrapper will be the same, but the interior will change to include a nested `agency` object containing the agency-derived properties, so that it is clear what is derived from agencies:
```json
{
  "id": 1,
  "agency_id": 2,
  "name": "cat",
  "agency": {
    "id": 2,
    "name": "Cats Department"
  }
}
```
Note that `agency_id` remains at the first level because it is technically an attribute of data sources.
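One way to produce this shape from a flat view row is to have the view alias agency columns with a distinguishing prefix and fold them into the nested object afterwards. The `agency__` prefix and the `nest_agency` helper below are hypothetical; the double underscore is chosen so the prefix cannot match `agency_id`. This is a sketch, not the planned implementation.

```python
from typing import Any, Dict

AGENCY_PREFIX = "agency__"  # hypothetical alias prefix the view would use for agency columns


def nest_agency(row: Dict[str, Any]) -> Dict[str, Any]:
    """Move prefixed agency columns from a flat view row into a nested "agency" object."""
    result: Dict[str, Any] = {}
    agency: Dict[str, Any] = {}
    for key, value in row.items():
        if key.startswith(AGENCY_PREFIX):
            agency[key[len(AGENCY_PREFIX):]] = value
        else:
            # Data source attributes, including agency_id, stay at the top level.
            result[key] = value
    if agency:
        result["agency"] = agency
    return result
```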
Thank you @maxachis and @joshuagraber for finding each other and resolving this harmoniously!
My intent is to work on this once @EvilDrPurple finishes a round of her #303 work. The question of how to implement nested queries with JSON while also taking variable columns (and ensuring this can be extended to other endpoints that may need it in the future) is a nontrivial problem. From what I've seen, SQLAlchemy may make that easier.
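The thread points to SQLAlchemy. As a library-free stand-in, the sketch below uses the standard-library `sqlite3` module to show the two moving parts. First, it builds a query from caller-selected columns, validated against a whitelist because identifiers cannot be bound as parameters. Second, it nests the agency columns afterwards. The table and column sets and the `agency__` alias prefix are assumptions for illustration. The production database and its SQL dialect may differ.

```python
import sqlite3
from typing import Any, Dict, List, Optional

# Hypothetical whitelists: column names cannot be bound as SQL parameters,
# so every requested identifier is checked before it is interpolated.
DATA_SOURCE_COLUMNS = {"id", "name", "agency_id", "description"}
AGENCY_COLUMNS = {"id", "name"}
AGENCY_PREFIX = "agency__"  # hypothetical alias prefix for agency columns


def build_query(ds_cols: List[str], agency_cols: List[str]) -> str:
    """Build a single-row SELECT over data_sources LEFT JOINed to agencies."""
    if not ds_cols and not agency_cols:
        raise ValueError("at least one column must be requested")
    unknown = (set(ds_cols) - DATA_SOURCE_COLUMNS) | (set(agency_cols) - AGENCY_COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # Explicit aliases make the result column names predictable.
    select = [f"ds.{c} AS {c}" for c in ds_cols]
    select += [f"a.{c} AS {AGENCY_PREFIX}{c}" for c in agency_cols]
    return (
        f"SELECT {', '.join(select)} FROM data_sources AS ds "
        "LEFT JOIN agencies AS a ON a.id = ds.agency_id "
        "WHERE ds.id = ?"
    )


def get_data_source(conn: sqlite3.Connection, data_source_id: int,
                    ds_cols: List[str], agency_cols: List[str]) -> Optional[Dict[str, Any]]:
    """Fetch one data source and nest the agency-derived columns under "agency"."""
    cursor = conn.execute(build_query(ds_cols, agency_cols), (data_source_id,))
    row = cursor.fetchone()
    if row is None:
        return None
    names = [d[0] for d in cursor.description]
    result: Dict[str, Any] = {}
    agency: Dict[str, Any] = {}
    for name, value in zip(names, row):
        if name.startswith(AGENCY_PREFIX):
            agency[name[len(AGENCY_PREFIX):]] = value
        else:
            result[name] = value
    if agency_cols:
        result["agency"] = agency
    return result
```

Keeping the whitelist check inside `build_query` means any endpoint reusing it gets the same protection, which speaks to the extensibility concern above.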
Context
In keeping with the changes in #417 and #390, the `/data-sources` endpoints need to be updated to enforce consistency with new standards and enhancements.
Requirements
- `DELETE` endpoint to `data-source/data-sources/`
- `GET` endpoint to include pagination and sorting
Related
- #430
- #419
Tests
Docs
Open questions
- `/data-sources-needs-identification` and `/data-sources-map`? These can likely be consolidated into the main endpoints with filter and column selection logic (albeit at a performance cost, though perhaps not a substantial one).
- `/data-sources-by-id/{data_source_id}` will become `/data-sources/id/{data_source_id}`, and `/data-sources` will become `/data-sources/page/{page_number}`. Is this acceptable?
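For the proposed `/data-sources/page/{page_number}` route, pagination and sorting come down to an ORDER BY/LIMIT/OFFSET clause. The sketch below assumes 1-based page numbers, a hypothetical page size of 100, and a hypothetical whitelist of sortable columns. Sort columns are validated before interpolation because they cannot be bound as parameters.

```python
from typing import Tuple

PAGE_SIZE = 100  # hypothetical; the issue does not specify a page size
SORTABLE_COLUMNS = {"name", "record_type", "coverage_start", "source_last_updated"}  # hypothetical


def page_clause(page: int, sort_by: str = "name", sort_order: str = "ASC") -> Tuple[str, Tuple[int, int]]:
    """Return an ORDER BY/LIMIT/OFFSET clause and its bound parameters for a 1-based page."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    if sort_by not in SORTABLE_COLUMNS:
        raise ValueError(f"cannot sort by {sort_by!r}")
    order = sort_order.upper()
    if order not in ("ASC", "DESC"):
        raise ValueError("sort_order must be ASC or DESC")
    offset = (page - 1) * PAGE_SIZE
    # LIMIT and OFFSET values are bound as parameters; only whitelisted identifiers are interpolated.
    return f"ORDER BY {sort_by} {order} LIMIT ? OFFSET ?", (PAGE_SIZE, offset)
```

LIMIT/OFFSET is the simplest scheme. Keyset pagination scales better on large tables but makes "jump to page N" URLs harder to support.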