The main goal here is to import old annotations (which use a different schema than the import script was built to support) and some much newer ones that were never brought into the DB proper. I want them all in the DB so we can export a nice SQLite archive that's easy for people to dig through, rather than making them collate data from a variety of sources.
To do:
[x] Basic support for all schemas
[x] Support a special file to map author names to e-mails or user IDs (I already made this file; just need to support it here).
[x] Use this to import annotations
[x] Manually swap out the annotation author foreign key field in the DB with the user IDs from the `annotation_author` field in the annotations this script posts, then remove that field from those annotations. (The annotations should not contain PII, but I want to link things correctly and the DB doesn't give us a way to actually do that. It's easier to run a manual migration on the DB than to change the DB source to support this.)
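
For reviewers, here is a rough sketch of the author-linking flow described in the last three items. It is not the code in this PR: the CSV layout (`name` and `user` columns), the function names, and the `author_id` key are all hypothetical stand-ins, and the real migration runs against the DB rather than on Python dicts.

```python
import csv
import io


def load_author_map(csv_text):
    """Parse the author mapping file into {lower-cased name: user ID or e-mail}.

    Assumes a CSV with `name` and `user` columns (hypothetical layout).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"].strip().lower(): row["user"].strip() for row in reader}


def tag_author(annotation, author_name, author_map):
    """Import step: record the mapped user in a temporary `annotation_author` field.

    Raises KeyError if the author has no entry in the mapping file.
    """
    user = author_map[author_name.strip().lower()]
    return {**annotation, "annotation_author": user}


def relink_author(record):
    """Migration step: move `annotation_author` into the author key, then drop it.

    `record` stands in for a DB row; `author_id` is a hypothetical column name.
    """
    annotation = dict(record["annotation"])
    user = annotation.pop("annotation_author", None)
    if user is None:
        return record  # nothing to relink
    return {**record, "author_id": user, "annotation": annotation}
```

Keeping the two steps separate mirrors the plan above: the import only needs the mapping file, and the temporary field is removed once the foreign key has been fixed up, so no author names end up stored in the annotations themselves.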
This project is technically deprecated, but I’m doing some work here to support final shutdown and archival of data (https://github.com/edgi-govdata-archiving/web-monitoring/issues/170).