ITI / searcch

SEARCCH Hub Frontend
https://searcch.cyberexperimentation.org/
BSD 3-Clause "New" or "Revised" License
3 stars 6 forks source link

feat: allow user to update email #181

Closed ckouder closed 1 year ago

ckouder commented 1 year ago

allow user to update email (we don't check email duplicate because user_id is used for each user) #155

carboxylman commented 1 year ago

@ckouder so this would allow them to use their current login session to change email addr, but we don't validate the change, e.g. by forcing a successful login via a different IdP? Seems like it could be easy to lock yourself out.

ckouder commented 1 year ago

How to lock oneself out? I think the backend uses user-id instead of email for a user's identity. The backend source code actually allows a user to update email ... https://github.com/ITI/searcch-backend/blob/development/searcch_backend/api/resources/user.py#L208

carboxylman commented 1 year ago

No, backend uses email addr to validate and match on: https://github.com/ITI/searcch-backend/blob/1c233d6b2baedaa5517922bc75382b7ece44e72b/searcch_backend/api/resources/login.py#L180 . So if they make a mistake in their change, or use an address their IDP didn't send us, they will lose access to their searcch account.

ckouder commented 1 year ago

No, backend uses email addr to validate and match on: https://github.com/ITI/searcch-backend/blob/1c233d6b2baedaa5517922bc75382b7ece44e72b/searcch_backend/api/resources/login.py#L180 . So if they make a mistake in their change, or use an address their IDP didn't send us, they will lose access to their searcch account.

Okay thank you!

ckouder commented 1 year ago

@carboxylman you are right. I successfully lock out myself on my local machine.

This can be done by storing the user identifiers (uid) of different IdP in backend User model. When the IdP returns user_info, instead of checking email, we check the uid to make sure that the user won't be locked out.

Fortunately all IdP return uid for OAuth requests, the id field in Github OAuth, localId field in Google, and sub + iss fields in Cilogon can help us associate a login request with an existing user account in database.

carboxylman commented 1 year ago

But the feature provided by SEARCCH using email as the matching token instead of per-IdP unique ID is that if user forgets which IdP they used originally, as long as another IdP verifies them on the same email address, it doesn't matter which IdP they use to sign in --- they still get to their SEARCCH account.

I don't know which feature is more appealing, but we can't have both without a hack or two.

ckouder commented 1 year ago

@carboxylman I don't think binding two IdPs to one searcch email address is a good idea. The user still logins to a separate account if he has different email addresses for IdPs. In addition, if he finds he logins to a different account by mistake, he can always logout and try a different IdP.

I think what we really want is to have a user first logged in using whatever IdP or email, then allow them to bind other IdP identifier to an immutable searcch account identifier. So that when a user changes his email, he won't be having any problem.

carboxylman commented 1 year ago

@carboxylman I don't think binding two IdPs to one searcch email address is a good idea. The user still logins to a separate account if he has different email addresses for IdPs. In addition, if he finds he logins to a different account by mistake, he can always logout and try a different IdP.

I am not authoritative here, but one name for the model where multiple IdP ids can map to one SP id (what we currently have) is sometimes called auto federation, and it does seem to be a typical model right now.

I think what we really want is to have a user first logged in using whatever IdP or email, then allow them to bind other IdP identifier to an immutable searcch account identifier. So that when a user changes his email, he won't be having any problem.

Another way to have this as well as the model above is to store the email addr and auto-federate on that; but also store the as secondary match tokens the IdP's unique ID for the user, and accept secondaries even if primary doesn't match; and offer to update the primary email addr if user wants. That is my current thinking...

ckouder commented 1 year ago

@carboxylman Hello David, when I was reading auto federation, I found it requires the attribute to be unique. However, the email field in Person table is neither unique nor non-nullable! The fact discourages me to take the auto-federation approach on email because of the lack of validation. I would recommend to add IdP ids in the user table instead.

My plan is to add two extra tables. One stores all IdP ids, with a unique foreign key that links to the user table. The other stores all email addresses the user had use, also with a foreign key that links to the user table. When a user logs in, we check the ids, if non exist, then check all his used email. If a match is found in either case, we log in the user.

Please let me know any suggestion you have! I want to make sure to not lose anything before I implement it...

carboxylman commented 1 year ago

@carboxylman Hello David, when I was reading auto federation, I found it requires the attribute to be unique. However, the email field in Person table is neither unique nor non-nullable! The fact discourages me to take the auto-federation approach on email because of the lack of validation. I would recommend to add IdP ids in the user table instead.

I'm not sure I understand what you mean. The User table is the one you want to consider here. When a user logs into our system for the first time, we create and link a Person with the (verified) email addr the IdP gives us. Any time we see a login associated with that email address, no matter which IdP is used to sign in, the backend maps that address to the first (and only) User.person_id with that email address. This logic ensures that we map an "incoming" IdP email address to one and only one User-Person tuple in our database. I don't see any problem for auto federation here.

My plan is to add two extra tables. One stores all IdP ids, with a unique foreign key that links to the user table. The other stores all email addresses the user had use, also with a foreign key that links to the user table. When a user logs in, we check the ids, if non exist, then check all his used email. If a match is found in either case, we log in the user.

Yes, storing the UIDs from the IdPs in one table, and linking each to the User table, is what I suggested in my previous comment; so I agree on that part. The other part I don't know that we really need; why do we need to store multiple email addresses for each user? David and I discussed simply telling the user that we would update their account email to whatever was sent by the IdP from their latest login. Example: if they had initially logged in with github (email addr foo), and subsequently logged in via google (email addr foo, auto federation takes place); then some time later they login with github and we get a secondary match on IdP UID but not on email addr, then we tell the user we are updating their email addr in their searcch profile to match. tl;dr: I don't see a requirement to support multiple email addresses yet.

ckouder commented 1 year ago

@carboxylman Hey David, thank you for your reply! Now I see that there is no problem for auto-federation.

For the second part: are we going to allow users to update their email in their profile? Like, the user has foo email in both searcch and GitHub. But then he decides to change the searcch email to bar thus breaks the match with GitHub email. In this case, we might need to keep foo as a backup so that the user can log in (unless we decide to use IdP uid to identifier a user).

carboxylman commented 1 year ago

For the second part: are we going to allow users to update their email in their profile? Like, the user has foo email in both searcch and GitHub. But then he decides to change the searcch email to bar thus breaks the match with GitHub email. In this case, we might need to keep foo as a backup so that the user can log in (unless we decide to use IdP uid to identifier a user).

We only trust that the IdP has verifed whatever email it gives us each time the user logs in, so my thinking is that we just update their searcch account if the IdP gives us a new one. We no longer know that the old email address is valid/verified, so we have to move to the current one.

This does assume we have a new table that maps IdP UIDs to searcch User table, and that we can do a secondary match on on IdP UID, if email address doesn't match a current User in the searcch database. Given this, we shouldn't need to keep a backup email addr; the "current" email addr suffices.

Going back to my example, let's say that that new email addr they had when they later logged in with github was bar. Then, if they still had foo as their google email addr, and if they kept switching between searcch logins with google and github, we would happily keep updating their address to whatever the IdP sent. :laughing: That is ok with me.

ckouder commented 1 year ago

@carboxylman I think there is some misunderstanding here ... isn't the purpose of this merge request is to solve this issue #155 ? but I don't see how the solution realizes the purpose... please let me know if I'm wrong!

carboxylman commented 1 year ago

@carboxylman I think there is some misunderstanding here ... isn't the purpose of this merge request is to solve this issue #155 ? but I don't see how the solution realizes the purpose... please let me know if I'm wrong

We need a verified email address and we need to handle the case where the email addr they registered with the IdP changes. But we cannot just let them input any address as their primary since we don't verify it. Two ways to go beyond: 1) if IdP returns a list of addrs we can add them all as "secondaries" (PersonMetadata) and the. Let them choose which is displayed; or 2) we can let them manually input a secondary. We cannot allow login or match with an unverified manually input addr though.

Thoughts? I have mostly been concerned with handling the case where their IdP addr changes so they are not locked out of their searcch account. I think that is the real point underlying the issue.

ckouder commented 1 year ago

I think both solutions are different from what @lauratinnel expects in #155. Perhaps Laura, my apology for not fully understanding the problem, could you help me understand the issue and give some hints about the realization?

Here are the problem @carboxylman and I have so far. @carboxylman, please feel free to point out my faults when attempting to reconstruct the discussion.

  1. We want all email addresses verified
  2. We want all users to login through IdPs - so that we don't need to send email verifications(?)
  3. We want a user to change emails

  4. If we login through IdPs, then IdPs will return a list of verified emails
  5. To keep all user email addresses verified, we should only users to choose from whatever IdPs return
  6. This breaks law 3

  7. Alternatively, if we let the user input whatever email addresses in searcch profile, then the email are unverified
  8. if an email is unverified, then we cannot use it as a login credential
  9. Therefore, if we let the user to change their email addresses, then the change is "superficial", or we need to have a mail server sending verifications.
  10. This breaks law 2

Both solutions seem come down to some tradeoffs. So I'm quite confused on which path to take. Please let me know if I miss anything!

lauratinnel commented 1 year ago

Here's a use case for #155 to help understand what I think needs to be addressed:

Laura has a github.com account that uses laura.tinnel@sri.com as the primary email account. Laura uses her github.com account to establish an account and log into the searcch hub. Laura imports a bunch of her artifacts and becomes an active part of the SEARCCH community. Laura decides to accept a job offer from another organization and gets a new email address. She changes her email address on github.com to laura@mydreamjob.com Laura logs into searcch using her github.com account. Laura continues to maintain ownership of all her artifact entries in the searcch hub and all her activity is correctly associated with her, rather than a dead email address (laura.tinnel@sri.com)

HOW you address this is up to you guys. :)

ckouder commented 1 year ago

Thank you @lauratinnel for the clarification. I think I misunderstand the focus ... we in fact want the user to bind their searcch email to IdP emails so that the searcch emails are always validated even if the user changes IdP emails. In that case, I think @carboxylman's solution is pretty good ..

We only trust that the IdP has verifed whatever email it gives us each time the user logs in, so my thinking is that we just update their searcch account if the IdP gives us a new one. We no longer know that the old email address is valid/verified, so we have to move to the current one.

This does assume we have a new table that maps IdP UIDs to searcch User table, and that we can do a secondary match on on IdP UID, if email address doesn't match a current User in the searcch database. Given this, we shouldn't need to keep a backup email addr; the "current" email addr suffices.

Going back to my example, let's say that that new email addr they had when they later logged in with github was bar. Then, if they still had foo as their google email addr, and if they kept switching between searcch logins with google and github, we would happily keep updating their address to whatever the IdP sent. 😆 That is ok with me.

ckouder commented 1 year ago

close the PR. issue is addressed in https://github.com/ITI/searcch-backend/pull/96