Closed benkilimnik closed 1 year ago
Things noticed on review (will update as I go through all the schemas):
commento
domains
table)moderators
table)
email
column be made a foreign key otherwise?domain
the PK we want for the views
table?ghchat
group_info
have id
column, which should also be the PK?int(11)
are here either text
or int
; not sure if this should be standardizedprivate_msg
sgroups1
table has pk id
rather than group_id
in originalposts
table has pk id
rather than post_id
in originalfilter
column commented out in posts
table?favourites
table correct?follow_system
have a foreign key for follow_to
?group_members
messages
table might need: FOREIGN KEY (con_id) REFERENCES conversations(con_id)
recommendations
table) be owners?shares
table might need: FOREIGN KEY (post_id) REFERENCES posts(id)
hotcrp
PaperStorage
table:
paperStorageId
and there be a FK for paperId
that references the Paper
table?Capability
table correct?MailLog
table, sounds like paperIds
won't work (since plural) as an FK into Paper
id
column added to the PaperComment
table? Seems like commentId
should be the PKPaperReview
table, ON DEL in PaperReviewRefused
socify
activities
table, do we need an FK for recipient_id
?authentications
need an OWNED_BY
annotation (for user_id?)?badges_sashes
table might need FK for notified_user
columnfriendly_id_slugs
might be sluggable_id
insteadtarget_id
in merit_actions
table?related_change_id
in merit_activity_logs
table?photo_album_id
in photos
table?voter_id
in votes
table?mouthful
ReplyTo
column in Comment
table?Thanks @rpaul48
I've responded to your questions & feedback below.
commento
domains
table)
schema-annot/README.md
describes some of the changes made to the schemas due to unsupported features in k9db.moderators
table)
email
column be made a foreign key otherwise?
domain
the PK we want for the views
table?
id
column. A primary key was not provided in the original -- depending on the application, some of these tables probably autogenerate PK columns. ghchat
group_info
have id
column, which should also be the PK?
int(11)
are here either text
or int
; not sure if this should be standardized
private_msg
s
groups1
table has pk id
rather than group_id
in original
group
which causes k9db to crash. posts
table has pk id
rather than post_id
in original
filter
column commented out in posts
table?
favourites
table correct?
fav_by
is the person who initiated the action (owner), while user
is the person being favourited (they shouldn't get access or ownership). follow_system
have a foreign key for follow_to
?
group_members
messages
table might need: FOREIGN KEY (con_id) REFERENCES conversations(con_id)
recommendations
table) be owners?
recommend_to
should simply be a FK. I've changed it.shares
table might need: FOREIGN KEY (post_id) REFERENCES posts(id)
hotcrp
PaperStorage
table:
paperStorageId
and there be a FK for paperId
that references the Paper
table?
Paper
table already points to PaperStorage
via paperStorageId
. Would have to move the CREATE TABLE
for Paper
above PaperStorage
to have an FK pointing to Paper
from PaperStorage
originalStorageId
-- it would likely be a self-referencing foreign key, which is currently unsupported (issue #170)Capability
table correct?
salt
was provided in the originalMailLog
table, sounds like paperIds
won't work (since plural) as an FK into Paper
id
column added to the PaperComment
table? Seems like commentId
should be the PK
PaperReview
table, ON DEL in PaperReviewRefused
socify
activities
table, do we need an FK for recipient_id
?
authentications
need an OWNED_BY
annotation (for user_id?)?
badges_sashes
table might need FK for notified_user
column
friendly_id_slugs
might be sluggable_id
instead
target_id
in merit_actions
table?
related_change_id
in merit_activity_logs
table?
photo_album_id
in photos
table?
voter_id
in votes
table?
mouthful
ReplyTo
column in Comment
table?
I've taken a look at instagram
schema, might do some more if I have time: here are a couple of things that came up:
bkmrk_by
in bookmarks
might be a FK (possibly, to the users
table?);block_by
in blocks
might be a FK;notifications
, FOREIGN KEY (user) REFERENCES users(id)
seems to be incorrect, given that in the data dump user
is always 0;ACCESSED_BY
annotations (this is probably off-topic, since you have probably only considered data ownership):
follow_system
, follow_to
might need an ACCESSED_BY
annotation;shares
, share_to
might need an ACCESSED_BY
annotation;recommend
, recommend_to
might need an ACCESSED_BY
annotation;view
, view_to
might need an ACCESSED_BY
annotation;comments,
post_idmight need an
ACCESSED_BY`annotation;Thanks @artemagvanian! Some comments below:
bkmrk_by
in bookmarks
might be a FK (possibly, to the users
table?);
block_by
in blocks
might be a FK;
notifications
, FOREIGN KEY (user) REFERENCES users(id)
seems to be incorrect, given that in the data dump user
is always 0;
user=14
in (503, 24, 28, 0, 0, 'recommend', 14, '1525003631635', 'read'),
ACCESSED_BY
annotations (this is probably off-topic, since you have probably only considered data ownership):
Things look good to me. Thanks @rpaul48 and @artemagvanian for the follow up.
I think the way the follow_system
is annotated now is reasonable. I understand where Artem is coming from but:
follow_system
is created based solely on actions by the person initiating the follow. Generally, the right of access applies to data the application has about a user, i.e. data that was given to the application (or created in the application) by active actions of that user, or data the user did not actively create but was derived implicitly from their data (e.g. an application scraping my social media posts would have data about me, even though it is data that they acquired via scraping that I had no involvement in). I think it is a bit of stretch to say that the follow_system
record is about the person being followed in that sense.ACCESS_BY
rights is unreasonable. I am simply saying it is not required for compliance. To an extent, there is no right or wrong answer here, we just need a policy that is reasonable, and there could be indeed multiple reasonable policies. I lean more towards simplicity in this case.TLDR: lets keep follow_system
as it is right now.
I am going to take a deeper look. Thanks everyone for you help.
Apologies for taking a while to get to this, had two comments about ghchat and hotcrp:
ghchat
I think the addition of the group_info(id) column makes sense, but I think some of the foreign keys need to be changed to reflect the new column:
hotcrp
This is less specific to the annotation, but more of a general question for if/how K9DB handles a type of pattern. From what I understand from how HotCRP works, the PaperConflict table encodes different relationships between contactInfo and paperId based on the value of conflictType (e.g. a value for co-author, another value for institutional relationship, etc.). I think depending on the relationship, the data ownership/accessorship pattern might be different, like we would want to extract all papers that a contactInfo has co-authored, but not papers that they might have an institutional conflict with. I think it make more sense if we remove the OWNED_BY annotation on the column in the meanwhile (to prevent returning too many results)?
Thank you @arjeyaraj!
ghchat
: Great finds. I've made the changes you suggested. The FOREIGN KEY (to_group_id) ACCESSES group_info(id)
still fails unfortunately.
hotcrp
: Good point. I've removed the OWNED_BY
on contactId
in PaperConflict
and added an issue #172. Let's discuss this ownership pattern at the meeting.
Just took a look at the other schemas, here are more comments:
moderators
being both sharded to owners
and a data subject. Maybe it is worth taking another look at.user_user_relation
too, similarly to private_msgs
.votes
, voter_id
might be a FK to users
.photos
, photo_album_id
might be a FK to photo_albums
.EXPLAIN COMPLIANCE
call needs to be updated to match the updated annotations (might apply to other schemas as well).ON GET contact_id ANON
annotation.PaperReviewRefused
correctly, but having the ON GET requestedBy ANON ...
annotation feels a bit redundant. If someone has requested a review, shouldn't they be able to see the details of the reviewer?ReviewRequest
even though it has a DATA_SUBJECT
annotation. Could it be that some FKs in PaperReviewRefused
or PaperReview
should actually point to ReviewRequest
? It would be nice if someone more familiar with horcrp took another look.Thanks @artemagvanian and @KinanBab !
k9db
. I have used REFERENCES ONLY
to fix the sharding behavior. votes
, voter_id
might be a FK to users
.
photos
, photo_album_id
might be a FK to photo_albums
.
EXPLAIN COMPLIANCE
call needs to be updated to match the updated annotations (might apply to other schemas as well).
ON GET contact_id ANON
annotation.
PaperReviewRefused
correctly, but having the ON GET requestedBy ANON ...
annotation feels a bit redundant. If someone has requested a review, shouldn't they be able to see the details of the reviewer?
contactId
should own it, while the others requestedBy
and refusedBy
should be able to access it, so when accessors to PaperReviewRefused
request their data, PII fields like email
, firstName
, and lastName
should be anonymized.ReviewRequest
even though it has a DATA_SUBJECT
annotation. Could it be that some FKs in PaperReviewRefused
or PaperReview
should actually point to ReviewRequest
? It would be nice if someone more familiar with horcrp took another look.
ReviewRequest
. @KinanBab what do you think?For Hotcrp:
contactId
owning the PaperReviewRefused. Also note that contactId
is not null while refusedBy
is null.ON GET (requestedBy) ANON (...)
is reasonable. I agree that it feels a little redundant, but putting it in would not cause any harm w.r.t compliance. On the other hand, if we remove this, we will be taking on a risk in case there is some subtle policy issue with allowing the requester to see the review. I do not feel comfortable doing that myself, but perhaps I would have done it if we were hotcrp's original developers and we understood all the nuances of the application.ReviewRequest
being isolated like this is reasonable. The reason we made it a data subject is that the request may be directed at someone who does not have a hotcrp account (i.e. is not a contactId). Such a person still has rights. However, whenever this person takes any action on the request, the application forces them to create an account (e.g. they have to create an account in order to accept the request and provide a review, they also have to create an account to reject the review because contactId
in PaperReviewRefused is not null).Some general followups:
Both these things will be done and merged either Tuesday or Wednesday. My plan is to merge this immediately after.
@benkilimnik It would be great if you can do the following after the above is done and merged:
comments
in hotcrp (was there some other application that also had this behavior?)I will ping you on slack when the above is done and merged so that you can do the final touches. I can merge this immediately after.
Sorry @benkilimnik I meant comments
in commento not in hotcrp.
Adds data ownership annotations to the schemas of a number of sample applications and modifies them for compatibility with
k9db
. Unit tests may be added for them in the future.Progress:
Large schemas deprioritized, moved to branch
annotate-prestashop-opencart