brownsys / K9db

MySQL-compatible database for GDPR compliance by construction.
MIT License

Annotate sample application schemas #169

Closed. benkilimnik closed 1 year ago

benkilimnik commented 1 year ago

Adds data ownership annotations to the schemas of a number of sample applications and modifies them for compatibility with K9db. Unit tests may be added for them in the future.

Progress:

Large schemas deprioritized, moved to branch annotate-prestashop-opencart

rpaul48 commented 1 year ago

Things noticed on review (will update as I go through all the schemas):

commento

ghchat

instagram

hotcrp

socify

mouthful

benkilimnik commented 1 year ago

Thanks @rpaul48

I've responded to your questions & feedback below.

commento

ghchat

instagram

hotcrp

socify

mouthful

artemagvanian commented 1 year ago

I've taken a look at the instagram schema and might do some more if I have time. Here are a couple of things that came up:

benkilimnik commented 1 year ago

Thanks @artemagvanian! Some comments below:

KinanBab commented 1 year ago

Things look good to me. Thanks @rpaul48 and @artemagvanian for the follow-up.

I think the way the follow_system is annotated now is reasonable. I understand where Artem is coming from but:

  1. While the row in follow_system is related to the user being followed, it is created based solely on actions by the person initiating the follow. Generally, the right of access applies to data the application has about a user, i.e. data that was given to the application (or created in the application) by active actions of that user, or data the user did not actively create but that was derived implicitly from their data (e.g. an application scraping my social media posts would have data about me, even though it acquired that data via scraping that I had no involvement in). I think it is a bit of a stretch to say that the follow_system record is about the person being followed in that sense.
  2. With that said, I do not think that giving users ACCESSED_BY rights is unreasonable. I am simply saying it is not required for compliance. To an extent, there is no right or wrong answer here. We just need a policy that is reasonable, and there could indeed be multiple reasonable policies. I lean more towards simplicity in this case.

TLDR: let's keep follow_system as it is right now.
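
To make the two candidate policies concrete, here is a minimal Python sketch. It is a toy model and not K9db code: the Follow record, its follower and followee fields, and the access_rows helper are all hypothetical names, and the real columns of follow_system may differ. It only models what a right-of-access request would return; deletion is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Follow:
    # Hypothetical stand-in for one row of follow_system.
    follower: int  # user who initiated the follow (treated as the owner)
    followee: int  # user being followed


def access_rows(rows, user, followee_can_access):
    """Return the rows a right-of-access request by `user` would include.

    followee_can_access=False models the current annotation (only the
    follower owns the row); True models additionally granting the followed
    user access to the row.
    """
    return [
        r for r in rows
        if r.follower == user or (followee_can_access and r.followee == user)
    ]


rows = [Follow(1, 2), Follow(2, 3), Follow(3, 2)]

# Current policy: user 2 only sees the follow they initiated.
current = access_rows(rows, 2, followee_can_access=False)

# Alternative policy: user 2 also sees rows where they are being followed.
alternative = access_rows(rows, 2, followee_can_access=True)
```

Under these assumptions the current policy returns one row for user 2 and the alternative returns all three, which is the trade-off between simplicity and breadth described above.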

I am going to take a deeper look. Thanks everyone for your help.

arjeyaraj commented 1 year ago

Apologies for taking a while to get to this. I had two comments about ghchat and hotcrp:

ghchat

I think the addition of the group_info(id) column makes sense, but I think some of the foreign keys need to be changed to reflect the new column:

hotcrp

This is less specific to the annotation and more of a general question about if/how K9db handles a certain pattern. From what I understand of how HotCRP works, the PaperConflict table encodes different relationships between contactInfo and paperId based on the value of conflictType (e.g. one value for co-author, another value for institutional relationship, etc.). I think the data ownership/access pattern might differ depending on the relationship: we would want to extract all papers that a contactInfo has co-authored, but not papers that they merely have an institutional conflict with. Would it make more sense to remove the OWNED_BY annotation on the column in the meantime (to prevent returning too many results)?
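
A small Python sketch of the concern, under stated assumptions: this is a toy model rather than K9db or HotCRP code, the string labels stand in for whatever values conflictType actually takes, and both helper functions are hypothetical. It contrasts an unconditional ownership rule with one that depends on the relationship type.

```python
from dataclasses import dataclass

# Hypothetical labels for the relationship kinds that should imply ownership.
OWNING_TYPES = {"co-author"}


@dataclass(frozen=True)
class PaperConflict:
    # Hypothetical stand-in for one row of the PaperConflict table.
    paperId: int
    contactId: int
    conflictType: str


def papers_unconditional(rows, contact_id):
    """Papers returned if every PaperConflict row is owned by its contact."""
    return sorted({r.paperId for r in rows if r.contactId == contact_id})


def papers_by_relationship(rows, contact_id, owning_types=OWNING_TYPES):
    """Papers returned if ownership only applies to selected conflict types."""
    return sorted({
        r.paperId for r in rows
        if r.contactId == contact_id and r.conflictType in owning_types
    })


rows = [
    PaperConflict(10, 7, "co-author"),
    PaperConflict(11, 7, "institutional"),
    PaperConflict(12, 8, "co-author"),
]

too_many = papers_unconditional(rows, 7)     # includes the institutional conflict
intended = papers_by_relationship(rows, 7)   # co-authored papers only
```

In this model the unconditional rule hands contact 7 a paper they only have an institutional conflict with, which is the over-returning behavior the question is about.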

benkilimnik commented 1 year ago

Thank you @arjeyaraj!

ghchat: Great finds. I've made the changes you suggested. The FOREIGN KEY (to_group_id) ACCESSES group_info(id) still fails unfortunately.

hotcrp: Good point. I've removed the OWNED_BY on contactId in PaperConflict and opened issue #172. Let's discuss this ownership pattern at the meeting.

artemagvanian commented 1 year ago

Just took a look at the other schemas, here are more comments:

commento

ghchat

socify

hotcrp

benkilimnik commented 1 year ago

Thanks @artemagvanian and @KinanBab !

commento

ghchat

socify

hotcrp

KinanBab commented 1 year ago

For Hotcrp:

KinanBab commented 1 year ago

Some general followups:

  1. @artemagvanian discovered a small visual bug in EXPLAIN COMPLIANCE that makes the output look a bit wrong (it ends up confusing source tables for the next tables in a transitive chain).
  2. We are making progress independently on supporting self-directed ACCESSED_BY and ACCESSES.

Both these things will be done and merged either Tuesday or Wednesday. My plan is to merge this immediately after.

@benkilimnik It would be great if you could do the following after the above is done and merged:

  1. Put back the self-directed ACCESSED_BY annotation for comments in hotcrp (was there some other application that also had this behavior?)
  2. Reproduce the EXPLAIN COMPLIANCE output in the README after making all the pending changes to the schemas and after merging in the visual bug fix.

I will ping you on Slack when the above is done and merged so that you can do the final touches. I can merge this immediately after.

KinanBab commented 1 year ago

Sorry @benkilimnik, I meant comments in commento, not in hotcrp.