Police-Data-Accessibility-Project / data-sources-mirror

A home for data initially entered into public Airtable forms
1 stars 1 forks source link

Refactor and add tests #59

Open maxachis opened 1 month ago

maxachis commented 1 month ago

To ensure the mirroring functionality operates properly, and to make it easier to modify changes without worrying about something breaking, we should add integration tests.

Compared with the tests in https://github.com/Police-Data-Accessibility-Project/data-sources-app-v2, these would be limited to integration tests, since there isn't a lot of middleware and we would eventually be replacing this logic with calls to endpoints. The tests would need to be performed on a stage or sandbox version of the database.

Additionally, code within the repository would need to be refactored to be more easily testable.

maxachis commented 1 month ago

Implementation Notes

Necessary for conversion to Airtable Alternative

If we intend to replace airtable, as in https://github.com/Police-Data-Accessibility-Project/data-sources-app/issues/32, we need to make sure we have everything well-defined and know exactly how everything is working on the backend, and be able to replicate necessary functionality 1:1.

Logging may be useful.

Whenever a new entry is added to the database, we may want to log that is has, in fact, been added. Currently, as far as I can tell, the only way to know that a log has been added is to check the Digital Ocean database and see that new rows exist.

Add intermediate data objects

Currently, the mirror functionality downloads recently added data from airtable and creates dictionaries populated only with those fields which are not null in the corresponding airtable row, as opposed to all possible fields. While that is convenient from the standpoint of reducing the amount of unnecessary data sent in API requests, it makes things more difficult in the backend, because we don't have a clear idea of all the fields that can possibly be added -- we have to determine it by examining the corresponding entries in the database and airtable fields.

Additionally, this means that field names have to be a 1:1 match between their names in the airtable and their names in the database. Which also means that if a field name is changed on either end, or new field names are added, you might not know how they interface with the mirror code.

To solve this, we can create intermediate dataclasses which are explicit about the fields they can store. We can then populate them with the fields in airtable, and then translate those fields from the object into the database.

This adds more code, but it also makes the relationship more clear -- you don't have to look at an external source to determine what fields can be added. Additionally, it ensures that, if a new column is added on the airtable side or an existing column is modified on the database side, that is caught by the intermediate code and it can fail gracefully.

It will also make testing easier, because we can more easily decouple the airtable-side logic from the database-side logic.

maxachis commented 1 month ago

Implementation Todo

maxachis commented 1 month ago

Tests

Tests Todo

Create tests for

Test Implementation Notes

These two points will be necessary in order to ensure that modifications of the code do not break anything when the middleware is being changed.