EBSECan / donatemask

Donate A Mask Project Repository
GNU Lesser General Public License v2.1

Demographics collection backend changes with tests #133

Closed: humphd closed this 2 years ago

humphd commented 2 years ago

This is Part 1 of a 2-part fix for #126, adding the back-end data collection piece. I'll push the front-end piece next, but it relies on this code being in place, so I want to get this merged first.

This change adds a new demographics collection to the database. It is meant to keep users' personal information separate from the demographic info. Every time a request comes in that includes at least 1 Rapid Test (i.e., we won't bother if only masks are being requested), we'll store anonymous demographic data about the affected vulnerable groups.

The collection's data schema looks like this:

```js
{
  _id: "61e86ee591676123aaab97d1",
  groups: ["People with low income", "Children/youth", ...],
  timestamp: "2022-01-19T20:04:52.611Z"
}
```
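As a rough illustration of the rule described above, here is a minimal sketch of turning an incoming request into one of these records. It assumes a request object with hypothetical `rapidTests` (a count) and `vulnerableGroups` (the checked labels) fields; the real field names in the donatemask code may differ. It returns `null` for mask-only requests and copies no name, address, or email. `_id` is left out on the assumption of a MongoDB-style store (which the `_id` format suggests), where the driver assigns one on insert.

```javascript
// Hypothetical helper: build a demographics record from an incoming request.
// The `rapidTests` and `vulnerableGroups` field names are assumptions for illustration.
function buildDemographicsRecord(request, now = new Date()) {
  // Skip mask-only requests: only record demographics when at least 1 Rapid Test is requested.
  if (!request || !(Number(request.rapidTests) >= 1)) {
    return null;
  }
  // Copy only the checked group labels; no personal fields are carried over.
  const groups = Array.isArray(request.vulnerableGroups)
    ? [...request.vulnerableGroups]
    : [];
  return {
    groups,
    timestamp: now.toISOString(),
  };
}

const record = buildDemographicsRecord(
  {
    name: "Jane Doe",
    email: "jane@example.com",
    rapidTests: 2,
    masks: 0,
    vulnerableGroups: ["People with low income", "Children/youth"],
  },
  new Date("2022-01-19T20:04:52.611Z")
);
console.log(record);
```

Keeping the personal fields out at construction time means the separation doesn't depend on remembering to strip them later.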

Each checked vulnerable-group checkbox shows up as an item in the groups field (an Array). I've also included a .stats() method for counting and reporting on these groups. There's currently no UI for this, so we need to figure out how to expose it (maybe a CLI tool you can run on the server?).
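The description doesn't show how `.stats()` is implemented, so here is only a rough sketch of the counting idea: a standalone function over an array of records (the name and return shape are assumptions) that tallies how many records mention each group. In MongoDB, a similar tally could be done server-side with an aggregation pipeline that runs `$unwind` on `groups` followed by `$group` with `$sum: 1`.

```javascript
// Hypothetical sketch of a stats() helper: count how many demographics
// records mention each vulnerable group.
function stats(records) {
  const counts = new Map();
  for (const { groups = [] } of records) {
    // Use a Set so a group listed twice in one record is only counted once.
    for (const group of new Set(groups)) {
      counts.set(group, (counts.get(group) || 0) + 1);
    }
  }
  // Sort by count (descending), then by name for stable output.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([group, count]) => ({ group, count }));
}

const sample = [
  { groups: ["People with low income", "Children/youth"], timestamp: "2022-01-19T20:04:52.611Z" },
  { groups: ["People with low income"], timestamp: "2022-01-20T09:15:00.000Z" },
  { groups: [], timestamp: "2022-01-20T10:00:00.000Z" },
];
console.log(stats(sample));
```

A small Node script on the server could load the records and print this table, which would fit the CLI tool idea.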

I've also written unit tests for these changes.

There are no new dependencies for this change (i.e., no need for npm install on the server).

mekkim commented 2 years ago

Great idea to keep it separate. How do you feel about including just the postal code or province with it, so we can overlay demographics by region?

humphd commented 2 years ago

I had the same thought, but would that mean including private, "fingerprintable" data with each request (e.g., someone could correlate request data with demographic data)? I guess the timestamp already allows that. Maybe it's not a big issue? Not sure.

Up to you, I can easily add it. Let me know.

mekkim commented 2 years ago

If we delete the name, address, and email address from the request as part of the automated cleanup, then even if the two can be matched up, there's nothing left to personally identify anyone. To me, the only question is whether the postal code is too localized and we should keep it at the higher, provincial level.
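If we keep only the provincial level, the province can be derived from the first letter of a Canadian postal code, so the full code never needs to be stored. A minimal sketch; `provinceFromPostalCode` is a hypothetical helper, and it only looks at the first letter rather than validating the whole code.

```javascript
// Map the first letter of a Canadian postal code to its province or territory.
// "X" is shared by the Northwest Territories and Nunavut, so it maps to both.
const PROVINCE_BY_FIRST_LETTER = {
  A: "NL", B: "NS", C: "PE", E: "NB",
  G: "QC", H: "QC", J: "QC",
  K: "ON", L: "ON", M: "ON", N: "ON", P: "ON",
  R: "MB", S: "SK", T: "AB", V: "BC",
  X: "NT/NU", Y: "YT",
};

// Hypothetical helper: returns a province/territory code, or null if unknown.
function provinceFromPostalCode(postalCode) {
  if (typeof postalCode !== "string") return null;
  const first = postalCode.trim().charAt(0).toUpperCase();
  return PROVINCE_BY_FIRST_LETTER[first] || null;
}

console.log(provinceFromPostalCode("m5v 3l9")); // "ON"
```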

humphd commented 2 years ago

Based on https://www.priv.gc.ca/en/privacy-topics/privacy-laws-in-canada/02_05_d_15/, I don't think a Postal Code is considered Personally Identifiable Information (PII), so maybe we can include that. Do you read this the same way?

What is generally not considered personal information can include:

Information that is not about an individual, because the connection with a person is too weak or far-removed (for example, a postal code on its own which covers a wide area with many homes)

mekkim commented 2 years ago

My reading is that the postal code is OK, but some people are uncomfortable with data that comes too close to identifying someone, even if it isn't strictly personal. For example, if it narrows things down to 10 houses on 1 street, that's not /personally/ identifiable, but it's proximal. I'm fine with it, but I wanted to make sure everyone else is too.
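One middle ground between a full postal code and a province is the Forward Sortation Area (FSA), the first three characters of a Canadian postal code, which covers a much wider area than the full code. A minimal sketch, assuming we'd truncate before saving; `toForwardSortationArea` is a hypothetical helper, and the regular expression is a loose format check, not full validation.

```javascript
// Hypothetical helper: reduce a Canadian postal code to its Forward Sortation
// Area (the first three characters, e.g. "M5V"), dropping the more precise
// last three characters. Returns null if the input doesn't look like a postal code.
const POSTAL_CODE_PATTERN = /^([A-Z]\d[A-Z])\s?(\d[A-Z]\d)$/;

function toForwardSortationArea(postalCode) {
  if (typeof postalCode !== "string") return null;
  const match = postalCode.trim().toUpperCase().match(POSTAL_CODE_PATTERN);
  return match ? match[1] : null;
}

console.log(toForwardSortationArea("m5v 3l9")); // "M5V"
```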