GSA-TTS / FAC

GSA's Federal Audit Clearinghouse

πŸ“ [Epic] Migration data loading: final push #4240

Open danswick opened 2 weeks ago

danswick commented 2 weeks ago

What problems would we like to solve?

1) Some historical data is still missing from our public dissemination data.
2) All migration records (information about how we interpreted data while migrating) have yet to be loaded into a production database.
3) SF-SAC data cannot be loaded using the same techniques we used for loading dissemination data. It's not yet clear how we can load it while maintaining a clean foreign key relationship between the user table and the other SF-SAC tables while the database is live and submissions are being added (one possible approach is sketched just below this list).
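
One option worth weighing for problem 3 (a minimal sketch only, not a decided approach; the table names, file names, and connection string below are placeholders, not our real schema) is to load everything in a single transaction, parents before children, so the foreign keys stay satisfied:

```bash
#!/usr/bin/env bash
# Sketch only: load SF-SAC rows without breaking references to the user table.
# DATABASE_URL, file names, and table names are illustrative placeholders.
set -euo pipefail

DB_URL="${DATABASE_URL:?set DATABASE_URL to the target database}"

# One transaction for the whole batch: a failed load rolls back completely,
# so no half-loaded SF-SAC rows ever point at missing users.
psql "$DB_URL" -v ON_ERROR_STOP=1 <<'SQL'
BEGIN;
-- Only affects constraints declared DEFERRABLE; the parent-before-child
-- ordering below is what actually keeps the foreign keys satisfied.
SET CONSTRAINTS ALL DEFERRED;

-- Parent rows first (the user table that SF-SAC rows reference)...
\copy users FROM 'users_backup.csv' WITH (FORMAT csv, HEADER true)

-- ...then the dependent SF-SAC tables ("singleauditchecklists and friends").
\copy singleauditchecklist FROM 'sac_backup.csv' WITH (FORMAT csv, HEADER true)

COMMIT;
SQL
```

Note that a transaction only protects referential integrity; collisions with IDs created by live submissions during the load would still need to be handled, which is part of why this remains an open question.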

How do we know we're done?

  1. Data from each of the data categories described below has been loaded into production and can be verified using verification methods to be determined and documented (for example, a row-count check like the one sketched below this list).
  2. Scripts and other data loading processes are documented in a single place and can be replicated later if needed.
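
As a starting point for those verification methods, here is a row-count comparison sketch; the table names and expected counts are placeholders (roughly matching the batches in the status table below), and a real check would likely also spot-check individual report IDs.

```bash
#!/usr/bin/env bash
# Sketch of a verification pass: confirm each loaded table holds at least the
# number of rows that were backed up. Table names and counts are placeholders.
set -euo pipefail

DB_URL="${DATABASE_URL:?set DATABASE_URL to the target database}"

declare -A expected=(
  [dissemination_general]=277000
  [migrationstatus]=280000
)

for table in "${!expected[@]}"; do
  actual=$(psql "$DB_URL" -At -c "SELECT COUNT(*) FROM ${table};")
  # ">= rather than ==": prod keeps receiving new submissions after the load.
  if [[ "$actual" -ge "${expected[$table]}" ]]; then
    echo "OK   ${table}: ${actual} rows"
  else
    echo "FAIL ${table}: ${actual} rows, expected at least ${expected[$table]}"
  fi
done
```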

Who will work on this epic?

@gsa-suk @rocheller123 @sambodeme @jadudm

Where are we now?

This table describes the status of each category of data that needs loading.

| Field | Count | Load method | Status | Date completed |
| --- | --- | --- | --- | --- |
| singleauditchecklists (and friends) | ~3,000 | Utility | ✅ In prod | Late June '24 |
| dissemination_ | ~3,000 | Shell script | ✅ In prod | Late June '24 |
| singleauditchecklists (and friends) | ~300 | Utility | Backed up to Drive and need to be loaded. These are leftovers from the ~3,000 batch. | |
| dissemination_ | ~300 | Shell script | Leftovers from the ~3,000 batch, backed up to Drive. ✅ In prod | 09/09/24 |
| dissemination_ | ~277,000 | | ✅ In prod | Late Jan '24 |
| singleauditchecklists (and friends) | ~277,000 | Shell script | Backed up to S3 and need to be loaded (see the sketch below this table). | |
| migrationstatus | ~280,000 | Shell script | Backed up to Drive. ✅ In prod | 09/09/24 |
| Historic_ tables | ? | ? | These are the Census tables, backed up to S3. ✅ In prod | 09/12/24 |
| PDF-only | 212 | ? | Needs a decision on how to handle. These are leftover reports that are just PDFs with no SF-SAC data. | |
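
For the batches still sitting in S3, the "Shell script" load method presumably reduces to something like the sketch below; the bucket name, object key, and dump format are assumptions rather than the documented process.

```bash
#!/usr/bin/env bash
# Sketch of loading an S3-backed batch (e.g., the ~277,000 singleauditchecklist
# rows). Bucket, key, and dump format are placeholders, not the real backup.
set -euo pipefail

BUCKET="s3://example-fac-backups"              # placeholder
DUMP_KEY="singleauditchecklist_277k.dump"      # placeholder
DB_URL="${DATABASE_URL:?set DATABASE_URL to the target database}"

aws s3 cp "${BUCKET}/${DUMP_KEY}" "/tmp/${DUMP_KEY}"

# --data-only: the schema already exists in prod, we only add rows.
# --single-transaction: roll the whole batch back if anything fails mid-load.
pg_restore --data-only --single-transaction \
  --dbname="$DB_URL" "/tmp/${DUMP_KEY}"
```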

Links! Tickets, documents, repos, etc. Things we've used to track work in recent months:

What needs to happen next

This list divides the work into three categories: housekeeping, SF-SAC strategy and loading, and loading everything else. The project team should break the work into more specific, detailed tickets as needed.

### Next steps
- [ ] **Housekeeping**: determine a "source of truth" for the data in each of the above categories and document them.
- [ ] **Housekeeping**: review the linked :memo: documents above and either commit the code snippets to a codebase or mark them as deprecated.
- [ ] https://github.com/GSA-TTS/FAC/issues/4281
- [ ] **SF-SAC loading**: once an approach is decided, capture the steps here. For example, if we need to take intake offline, we need to draft/review/deploy communications ahead of time.
- [x] Load `migrationstatus` data using the methods previously used to load the `3000` records. https://github.com/GSA-TTS/FAC/issues/4266
- [x] https://github.com/GSA-TTS/FAC/issues/4277 https://github.com/GSA-TTS/FAC/issues/4279
- [x] Load the `300` remaining dissemination records using the methods previously used to load the `3000` records. https://github.com/GSA-TTS/FAC/issues/4266

Earlier notes: https://docs.google.com/document/d/1wC8PC3_VeAz09-msIL_uIRO9a3tzz1nOgqnbMdWoczE/edit

gsa-suk commented 1 week ago

09/09/24 - Loaded 318 dissemination records and migration status data to Prod (Sudha, Hassan, Rochelle, Matt). https://github.com/GSA-TTS/FAC/issues/4266

gsa-suk commented 5 days ago

09/12/24 - Loaded Census historical tables to Prod (Sudha, Matt, Rochelle).

https://github.com/GSA-TTS/FAC/issues/4279