maxachis commented 1 month ago

To ensure the mirroring functionality operates properly, and to make it easier to modify changes without worrying about something breaking, we should add integration tests.

Compared with the tests in https://github.com/Police-Data-Accessibility-Project/data-sources-app-v2, these would be limited to integration tests, since there isn't a lot of middleware and we would eventually be replacing this logic with calls to endpoints. The tests would need to be performed on a stage or sandbox version of the database.

Additionally, code within the repository would need to be refactored to be more easily testable.

maxachis commented 1 month ago

Implementation Notes

Necessary for conversion to Airtable Alternative

If we intend to replace airtable, as in https://github.com/Police-Data-Accessibility-Project/data-sources-app/issues/32, we need to make sure we have everything well-defined and know exactly how everything is working on the backend, and be able to replicate necessary functionality 1:1.

Logging may be useful.

Whenever a new entry is added to the database, we may want to log that is has, in fact, been added. Currently, as far as I can tell, the only way to know that a log has been added is to check the Digital Ocean database and see that new rows exist.

Add intermediate data objects

Currently, the mirror functionality downloads recently added data from airtable and creates dictionaries populated only with those fields which are not null in the corresponding airtable row, as opposed to all possible fields. While that is convenient from the standpoint of reducing the amount of unnecessary data sent in API requests, it makes things more difficult in the backend, because we don't have a clear idea of all the fields that can possibly be added -- we have to determine it by examining the corresponding entries in the database and airtable fields.

Additionally, this means that field names have to be a 1:1 match between their names in the airtable and their names in the database. Which also means that if a field name is changed on either end, or new field names are added, you might not know how they interface with the mirror code.

To solve this, we can create intermediate dataclasses which are explicit about the fields they can store. We can then populate them with the fields in airtable, and then translate those fields from the object into the database.

This adds more code, but it also makes the relationship more clear -- you don't have to look at an external source to determine what fields can be added. Additionally, it ensures that, if a new column is added on the airtable side or an existing column is modified on the database side, that is caught by the intermediate code and it can fail gracefully.

It will also make testing easier, because we can more easily decouple the airtable-side logic from the database-side logic.

maxachis commented 1 month ago

Implementation Todo

[ ] Create Data Transfer Objects (DTOs) that contain all the possible fields for each of the airtable tables.
[ ] Create constants for the database table names (as opposed to the functions which currently exist). Place in constants.py file.
[ ] Modify full_mirror_to_digital_ocean so that logic past get_full_table_data does not occur if there are no new rows.
[ ] Break up logic into separate files for airtable-side, database-side, and middleware.
[ ] Build integration tests that add data to database and confirm that expected keys are present.

maxachis commented 1 month ago

Tests

Tests Todo

Create tests for

[ ] Agencies
[ ] Data Sources (and link to Agencies)
[ ] Counties
[ ] Data Requests

Test Implementation Notes

In each test case, mock get_recent_airtable_data() (a new function I've created in this branch to isolate the logic which pulls new rows from airtable) to return exactly all the values you would get in a full form submission from airtable.
Similarly, mock update_do_table() to test that it gets exactly the results it is currently receiving when you run it for that test case.

These two points will be necessary in order to ensure that modifications of the code do not break anything when the middleware is being changed.

Police-Data-Accessibility-Project / data-sources-mirror

Refactor and add tests #59

Implementation Notes

Necessary for conversion to Airtable Alternative

Logging may be useful.

Add intermediate data objects

Implementation Todo

Tests

Tests Todo

Create tests for

Test Implementation Notes