Safe2COVIDApp / bct-server

Bluetooth Contact Tracing for Covid19 - server

Delete server data after 21 days #115

Closed mitra42 closed 4 years ago

mitra42 commented 4 years ago

Need to make sure to delete old data

danaronson commented 4 years ago

Should old data be archived somewhere? I'm a bit concerned that data could be deleted accidentally with no way to recover it.

jmday commented 4 years ago

Can we have rolling backups that allow us to discard backup data when it's 21 days old?

danaronson commented 4 years ago

Since each datum has a timestamp of when it came into the server, we can: 1) have the server return only data with timestamps within the last 21 days (more likely a configurable value), and 2) have the server delete data that is older than 42 days (again, more likely configurable).
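A minimal sketch of that two-threshold idea, assuming each datum carries its arrival time as seconds since the epoch. The names `classify`, `LIVE_DAYS` and `DELETE_DAYS` are hypothetical and not taken from the bct-server code; the defaults are just the numbers floated above.

```python
import time

DAY = 24 * 60 * 60  # seconds in a day

# Hypothetical defaults, taken from the numbers suggested in this thread.
LIVE_DAYS = 21    # data newer than this is returned to clients
DELETE_DAYS = 42  # data older than this is removed from the server


def classify(received_at, now=None, live_days=LIVE_DAYS, delete_days=DELETE_DAYS):
    """Return 'live', 'hidden' or 'expired' for a datum's arrival timestamp."""
    if now is None:
        now = time.time()
    age = now - received_at
    if age <= live_days * DAY:
        return "live"      # still returned in query responses
    if age <= delete_days * DAY:
        return "hidden"    # kept on disk but no longer returned
    return "expired"       # eligible for deletion
```

With this split, a query handler would return only the "live" items, while a periodic cleanup job would remove the "expired" ones.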

jmday commented 4 years ago

I'll let @mitra42 weigh in as well, but I think it's important that we delete all data (including backed-up data) within at most 45 days.

Any data that is not being used (or returned) should really not be saved in the live system.

danaronson commented 4 years ago

Agreed! By not keeping the data in the live system (even if we don't return it) before we delete it, what are we trying to solve for? Depending on the answer I would recommend different solutions. Note that if we save it "off" system, then we have to figure out where it goes, how it gets there, and so on, which adds complexity.

I'm not necessarily advocating saving data for too long (I'm very aware of the privacy concerns), but I want us to vet retention procedures as thoroughly as possible.

mitra42 commented 4 years ago

I'd suggest not retaining the data past some point, mostly because the assertion of privacy is important to back up. I'd suggest these are two values in the config file, probably 21 days for live data (that we return in response to a query) and 45 days for full deletion (not backed up). Note the live data could actually be kept for a MUCH shorter time (as little as 2 days) since a) active clients poll for it regularly, so we are really only trying to allow a client to catch up after it's been offline, and b) new clients can't get anything useful since they don't have a location or id history to compare against the old data.
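As a rough illustration of the "two values in the config file" suggestion, here is a sketch that reads both limits with Python's standard `configparser`. The section name `retention`, the option names `live_days` and `delete_days`, and the function `load_retention` are assumptions made for this example; the project's actual config layout may differ.

```python
import configparser


def load_retention(config_text):
    """Read the two retention limits (in days) from INI-style config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Fall back to the values proposed in this thread when an option is absent.
    live_days = parser.getint("retention", "live_days", fallback=21)
    delete_days = parser.getint("retention", "delete_days", fallback=45)
    # Data must stay on disk at least as long as it is being returned.
    if not 0 < live_days <= delete_days:
        raise ValueError("need 0 < live_days <= delete_days")
    return live_days, delete_days


example = """
[retention]
live_days = 14
delete_days = 45
"""
live_days, delete_days = load_retention(example)
```

Checking that the live window does not exceed the deletion window catches a misconfiguration where the server would promise data it has already deleted.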

danaronson commented 4 years ago

yup

jmday commented 4 years ago

Suggest at least 14 days for live data. This will ensure the servers have any data necessary to inform Safe Score calculations, even if someone has not had signal for 14 days (such as some of the remote communities we are seeking to serve).

mitra42 commented 4 years ago

OK - I'll take this

mitra42 commented 4 years ago

danaronson commented 4 years ago

Twisted makes it easy to run scheduled tasks in the server. I think it makes most sense to do it there.

mitra42 commented 4 years ago

Ok, done and fixed the timing issues - PR submitted

danaronson commented 4 years ago

I made some significant code changes, which add a serial number to the item (so you don't have to do the clock hack). I also modified the deletion code to move the actual file deletes to a thread. The tests pass, but I think you might want to take a look at the update code and make sure I got it right. FYI, file names are now of the format KEY:FLOATING_TIME:SERIAL_NUMBER.data
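To make the new name format concrete, here is a hedged sketch of building and parsing `KEY:FLOATING_TIME:SERIAL_NUMBER.data` names and of removing files on a worker thread. All function names here are hypothetical and this is not the code from the PR; it also assumes FLOATING_TIME is a float of seconds since the epoch and SERIAL_NUMBER is an integer.

```python
import os
import threading

SUFFIX = ".data"


def build_name(key, floating_time, serial):
    """Build a file name of the form KEY:FLOATING_TIME:SERIAL_NUMBER.data."""
    return f"{key}:{floating_time!r}:{serial}{SUFFIX}"


def parse_name(name):
    """Split a file name back into (key, floating_time, serial)."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a data file: {name}")
    stem = name[: -len(SUFFIX)]
    # Split from the right so only the last two fields are peeled off.
    key, time_text, serial_text = stem.rsplit(":", 2)
    return key, float(time_text), int(serial_text)


def expired_names(names, now, max_age_seconds):
    """Return the names whose embedded timestamp is older than max_age_seconds."""
    return [n for n in names if now - parse_name(n)[1] > max_age_seconds]


def delete_in_background(paths):
    """Remove the given files on a worker thread and return that thread."""
    paths = list(paths)  # copy so later changes by the caller have no effect

    def worker():
        for path in paths:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass  # already gone, nothing to do

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Keeping `build_name` and `parse_name` as the only two places that know the format means a later format change touches one spot rather than several.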

mitra42 commented 4 years ago

Ok - but can we not change the file name format any more! It's not pulled out into separate functions, and there is code in multiple places going from values to filenames and back to indexes, making changes such as this likely to break stuff in other places.

mitra42 commented 4 years ago

Also - this version is failing tests - I can't figure out the code changes, so I think it will have to be you who finds the problem. (Note the tests were all working pre-refactor.)

danaronson commented 4 years ago

All good to go now. Yes, I agree that file name formats should be centralized in one place. I don't see a problem with changing formats as long as the code supports the old formats. Happy to discuss though.

mitra42 commented 4 years ago

I think this is complete now - unless there is refactoring still to happen?

danaronson commented 4 years ago

We don't have the 14-day live window yet; let's keep this open until we do.

mitra42 commented 4 years ago

You mean that when a request for a set of locations comes in, it should only return those received after a certain time?

mitra42 commented 4 years ago

If so, then that is worth its own Issue - and I can tackle it.

danaronson commented 4 years ago

ok