Closed. mitra42 closed this issue 4 years ago
Should old data be archived somewhere? I'm a bit concerned that data could be deleted accidentally with no way to recover it.
Can we have rolling backups that allow us to discard backup data when it's 21 days old?
Since each datum carries a timestamp recording when it reached the server, we can: 1) have the server return only data with timestamps from the last 21 days (more likely a configurable value), and 2) have the server delete data older than 42 days (again, more likely configurable).
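The two thresholds above could be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the names `LIVE_DAYS`, `DELETE_DAYS`, `is_live`, and `is_expired` are hypothetical, and timestamps are assumed to be Unix seconds.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical defaults taken from the numbers in this comment; both
# would more likely come from configuration.
LIVE_DAYS = 21
DELETE_DAYS = 42


def is_live(received_at, now=None, live_days=LIVE_DAYS):
    """Return True if a datum is recent enough to be returned to clients."""
    now = time.time() if now is None else now
    return now - received_at <= live_days * SECONDS_PER_DAY


def is_expired(received_at, now=None, delete_days=DELETE_DAYS):
    """Return True if a datum is old enough to be deleted from the server."""
    now = time.time() if now is None else now
    return now - received_at > delete_days * SECONDS_PER_DAY
```

Data between the two thresholds is neither returned nor deleted yet, which is the window the rest of this thread discusses.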
On Thu, May 7, 2020 at 11:55 AM jmday notifications@github.com wrote:
Can we have rolling backups that allow us to discard backup data when it's 21 days old?
I'll let @mitra42 weigh in as well, but I think it's important that we delete all data (including backed-up data) within at most 45 days.
Any data that is not being used (or returned) should really not be saved in the live system.
Agreed! But by not keeping the data in the live system (even if we don't return it) before we delete it, what are we trying to solve for? Depending on the answer, I would recommend different solutions. Note that if we save it off-system, we have to figure out where it goes, how it gets there, and so on, which adds complexity.
I'm not necessarily advocating saving data for too long (I'm very aware of the privacy concerns), but I want us to vet the retention procedures as thoroughly as possible.
I'd suggest not retaining the data past some point, mostly because it's important to back up our assertion of privacy. I'd suggest two values in the config file: probably 21 days for live data (which we return in response to a query) and 45 days for full deletion (not backed up). Note that the live data could actually be kept for a MUCH shorter time (as little as 2 days), since a) active clients poll for it regularly, so we are really only trying to let a client catch up after it's been offline, and b) new clients can't get anything useful, since they don't have a location or ID history to compare against the old data.
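As a rough sketch of the two config values, assuming an INI-style config file: the `[retention]` section, the `live_days` and `delete_days` keys, and the `load_retention` helper are all hypothetical names chosen for illustration, not the server's real config layout.

```python
import configparser

# Hypothetical config layout; the section and key names are assumptions.
EXAMPLE_CONFIG = """
[retention]
live_days = 21
delete_days = 45
"""


def load_retention(text):
    """Parse the two retention windows and sanity-check their ordering."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    live_days = parser.getfloat("retention", "live_days")
    delete_days = parser.getfloat("retention", "delete_days")
    # Data must stop being returned no later than it is deleted.
    if not 0 < live_days <= delete_days:
        raise ValueError("expected 0 < live_days <= delete_days")
    return live_days, delete_days


live_days, delete_days = load_retention(EXAMPLE_CONFIG)
```

Parsing the values as floats would allow a fractional window, which might be convenient in tests, though whole days are probably all that is needed in practice.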
yup
Suggest at least 14 days for live data. This will ensure the servers have any data necessary to inform Safe Score calculations, even if someone has not had signal for 14 days (such as some of the remote communities we are seeking to serve).
OK - I'll take this
Twisted makes it easy to run scheduled tasks in the server. I think it makes most sense to do it there.
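A scheduled purge in Twisted might look something like the sketch below. It uses `twisted.internet.task.LoopingCall`, which is a real Twisted API, but `purge_old_files`, `start_purge_loop`, the hourly interval, and the use of file modification time to judge age are all assumptions made for illustration rather than what the server does.

```python
import os
import time


def purge_old_files(directory, max_age_seconds, now=None):
    """Delete regular files in `directory` whose mtime is older than the cutoff.

    Returns the names of the files removed. Using mtime is an assumption
    made for this sketch; the server may track age differently.
    """
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            removed.append(name)
    return removed


def start_purge_loop(directory, max_age_seconds, interval_seconds=3600):
    """Run purge_old_files every `interval_seconds` on the Twisted reactor."""
    # Imported here so the purge logic above can be used without Twisted.
    from twisted.internet.task import LoopingCall
    from twisted.python import log

    loop = LoopingCall(purge_old_files, directory, max_age_seconds)
    # start() returns a Deferred that fires when the loop stops or fails.
    loop.start(interval_seconds, now=True).addErrback(log.err)
    return loop
```

As written, the deletes run on the reactor thread, so a large backlog would block request handling while it is cleared. Handing the work to a thread is one way around that.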
Ok, done and fixed the timing issues - PR submitted
I made some significant code changes, which add a serial number to the item (so you don't have to do the clock hack). I also modified the deletion code to move the actual file deletes to a thread. The tests pass, but you might want to take a look at the update code and make sure I got it right. FYI, file names are now of the format KEY:FLOATING_TIME:SERIAL_NUMBER.data
OK, but can we please not change the file name format any more! It's not pulled out into separate functions, and there is code in multiple places going from values to filenames and back to indexes, so changes like this are likely to break things elsewhere.
Also, this version is failing tests. I can't figure out the code changes, so I think it will have to be you who finds the problem. (Note that the tests were all working pre-refactor.)
All good to go now. Yes, I agree that file name formats should be centralized in one place. I don't see a problem with changing formats as long as the code supports the old formats. Happy to discuss, though.
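One way to centralize the file name handling is a single pair of build/parse helpers, sketched below. The KEY:FLOATING_TIME:SERIAL_NUMBER.data format comes from this thread. Everything else is an assumption for illustration: the helper names, the rule that keys contain no `:`, and in particular the old format, which is guessed here to be KEY:FLOATING_TIME.data.

```python
from typing import NamedTuple, Optional

SUFFIX = ".data"


class ItemName(NamedTuple):
    key: str
    timestamp: float
    serial: Optional[int]  # None for names written in the assumed old format


def build_filename(key, timestamp, serial):
    """Build a name in the KEY:FLOATING_TIME:SERIAL_NUMBER.data format."""
    if ":" in key:
        raise ValueError("key must not contain ':'")
    return f"{key}:{timestamp!r}:{serial}{SUFFIX}"


def parse_filename(name):
    """Parse the current format, falling back to the assumed old one."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a data file: {name!r}")
    parts = name[: -len(SUFFIX)].split(":")
    if len(parts) == 3:
        key, timestamp, serial = parts
        return ItemName(key, float(timestamp), int(serial))
    if len(parts) == 2:  # assumed old format: KEY:FLOATING_TIME.data
        key, timestamp = parts
        return ItemName(key, float(timestamp), None)
    raise ValueError(f"unrecognised file name: {name!r}")
```

With every caller going through these two functions, a later format change would touch one place, and old files on disk would still parse.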
I think this is complete now, unless there is refactoring still to happen?
we don't have the 14 day live window yet, let's keep this open until we do.
You mean that when a request for a set of locations comes in, it should only return those after a certain time?
If so, then that is worth its own Issue - and I can tackle it.
ok
Need to make sure to delete old data