dewv / ops

Tracking of operational tasks in DevOps production environment
2 stars 0 forks source link

Cleanup NLC production database duplications #159

Closed smattingly closed 3 years ago

smattingly commented 3 years ago

1. What is the goal of these changes?

To cleanup duplicated records in the production NLC database. See #156

2. List the machine(s) and file(s)/setting(s) to be changed

The MongoDB Atlas cloud service; PRODUCTION "project"

3. Give exact commands or use marked screenshots to show GUI-based changes

// Duplicated students
// 1. List usernames (in all lower case) that are repeated with different capitalizations, and the most recent 
//     update timestamp among the repeats
db.student.aggregate([{$group: {_id: {$toLower: "$username"}, updated: {$max: "$updatedAt"}, count: {$sum: 1}}}, {$match: {count: {$gt: 1}}}])

// For each repeated username, we will keep the profile record that was most recently updated. 
// We want it to have the username in the correct form (all lower case).

// Usernames in results of query 1 are all lowercase. For each, ...
// 2. List all student profiles with variations of that lowercase username
db.student.find({username: {$regex: /LOWERCASE_USERNAME_FROM_ABOVE/i}}, {username: 1, updatedAt: 1})

// Does the most recently updated profile (see timestamp from query 1) have the username in all lower case?
// If yes, we can go directly to step 4.
// If no, we will fix that in step 3, which has two pieces. 
// 3a. Since duplicate usernames (same capitalization) are not allowed we must first modify the record that has 
//       the username in all lowercase. Just stick X on the front of it.
db.student.updateOne({ username: "LOWERCASE_USERNAME_FROM_ABOVE" }, { $set: { username: "XLOWERCASE_USERNAME_FROM_ABOVE" }})
// 3b. Then we can put the lowercase username on the most recently updated of the profiles:
db.student.updateOne({ username: "USERNAME AS IT APPEARS ON THE MOST RECENT UPDATED"}, { $set: {username: "LOWERCASE_USERNAME_FROM_ABOVE}})

// Now we have the lowercase username on the most recently updated of the student's profiles. 
// 4. For each of the "extra" student profiles from step 2 above: 
// 4a. Change all visits that are linked to this "extra" profile to use the one good student profile. 
db.visit.updateMany({ student: ObjectId("AN EXTRA OBJECT ID") }, {$set: {student: ObjectId("THE GOOD OBJECT ID")}})
// 4b. Delete this extra student profile.
db.student.deleteOne({ _id: ObjectId("AN EXTRA OBJECT ID")})
// Return to step 2 to process next student

// Duplicated staff
// The process is very similar to that for students. Object IDs link staff to student records as their SLP instructor.
// Probably few if any students will be linked to "extra" staff Object Id's.

NB: these changes should be coordinated with the release of a fix to https://github.com/dewv/nlcAttendance/issues/155; that fix will prevent these duplications from happening again.

4. What production services will be affected by the changes?

No services will be interrupted. Data will be changing, but should not impact operations, even if there were anyone in the system.

5. What files or other data should be backed up before making these changes?

The entire MongoDB production database. Also, for easy verification of changes, login to production app as staff and download all visit data before making changes.

6. Approximately how long will it take to make the changes?

1 to 2 hours?

7. List steps for testing that the goal is met, and nothing was broken

8. List steps for safely backing out of these changes in case of difficulties

Restore the backup from step 5, if changes were made and need to be abandoned.

9. Summarize the messages to be sent to customers and/or staff before, during (if needed), and after these changes

Before: notification that changes will be made "tonight" During: none for customer; staff will be informed via Operations Log After: summary of results

smattingly commented 3 years ago

Data cleanup completed this date.