authgear / authgear-server

Open source alternative to Auth0 / Firebase Auth
https://www.authgear.com
Apache License 2.0

Separate user data residency for data privacy law compliance #3172

Open fungc-io opened 1 year ago

fungc-io commented 1 year ago

Problem

In some countries, the law places geographical restrictions on where the servers storing user data must be registered.

For example, in China, the Personal Information Protection Law (PIPL) requires that the personal information of Chinese citizens, including the mobile number a user logs in with, be stored on a server registered in China.

For an application that serves customers across countries, this means that, depending on an end user's identity, their data may have to be stored in a different region.

Research

In this section, we look at how some multinational corporations (MNCs) approach this problem.

NIKE International

Apple

Solution

Overview

Let's say a developer wants to serve customers from both China and other countries.

For PIPL compliance, the developer can set up two servers: one in China and one elsewhere, for example in Hong Kong (HK).
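With two servers, the app has to decide which one an end user talks to. The following is a minimal sketch, assuming the region is chosen from an ISO 3166-1 country code the app already knows (for example from the user's selection or the address country attribute). The endpoint URLs, `countryToRegion`, and `endpointFor` are hypothetical illustrations, not part of Authgear.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical regional endpoints; a real deployment would use its own hostnames.
var regionalEndpoints = map[string]string{
	"cn": "https://auth.example.cn",
	"hk": "https://auth.example.com.hk",
}

// countryToRegion lists countries whose users must be served by a specific region.
// Any country not listed falls back to defaultRegion.
var countryToRegion = map[string]string{
	"CN": "cn",
}

const defaultRegion = "hk"

// endpointFor returns the endpoint that should handle a user from the given
// ISO 3166-1 alpha-2 country code (case-insensitive).
func endpointFor(country string) string {
	region, ok := countryToRegion[strings.ToUpper(country)]
	if !ok {
		region = defaultRegion
	}
	return regionalEndpoints[region]
}

func main() {
	fmt.Println(endpointFor("cn")) // https://auth.example.cn
	fmt.Println(endpointFor("US")) // https://auth.example.com.hk
}
```

Keeping the rule as data (a country-to-region map) rather than code would make it something that could, in principle, be configured per project.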


Note

If Chinese users are expected to use the app from outside China, their connection to the China server may be slowed by the Chinese network. A tunnel is recommended for communicating with the China server.

Benefits of this approach

Support in Authgear

Rabbit Holes

fungc-io commented 1 year ago

Hi @chpapa,

I've drafted the pitch based on our discussions a few months ago, elaborating on the "Solution" part to give the solution a more concrete shape.

What do you think about it?

One thing I'm not sure about is the "redirection rules" under Support in Authgear > Portal: should we allow configuring these rules in the Portal, or only in custom-defined workflows?

chpapa commented 1 year ago

@fungc-io

A few thoughts on the concept/directions:

  1. Maybe we can also consider whether the address (country) field in the standard attributes could help determine the location.
  2. I think we should highlight that unless a 2-phase-commit-like protocol is implemented, race conditions between sign-ups are possible, and multiple users with the identical login ID could be created in multiple regions. But a 2-phase commit will significantly degrade sign-up throughput if it has to be coordinated between multiple regional servers.
  3. Not quite sure about redirection rules yet, but I guess they are probably limited to the identify step. We do need to think about how passkey, OAuth, etc. could be compatible with this.
  4. Although, judging from other implementations, having multiple endpoints for multiple regions seems the way to go, we have yet to complete our research to make sure. Maybe we should still explore other ideas, for example doing the redirection at the API gateway.
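To make point 2 concrete, here is a simplified, in-memory sketch of a two-phase commit for reserving a login ID across regions. Everything here (`regionStore`, `prepare`, `finish`, `signUp`) is hypothetical; a real 2PC also needs a durable log, timeouts, and recovery when the coordinator crashes, all of which are omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// regionStore is a hypothetical, in-memory stand-in for one region's user database.
type regionStore struct {
	mu        sync.Mutex
	committed map[string]bool   // login IDs of users that exist in this region
	reserved  map[string]string // login ID -> transaction currently holding it
}

func newRegionStore() *regionStore {
	return &regionStore{committed: map[string]bool{}, reserved: map[string]string{}}
}

// prepare is phase 1: vote yes only if the login ID is neither taken nor
// reserved by another transaction, and hold a reservation until phase 2.
func (r *regionStore) prepare(txID, loginID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.committed[loginID] {
		return false
	}
	if holder, ok := r.reserved[loginID]; ok && holder != txID {
		return false
	}
	r.reserved[loginID] = txID
	return true
}

// finish is phase 2: release the reservation and, if create is true
// (home region of a committed transaction), record the new user.
func (r *regionStore) finish(txID, loginID string, create bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.reserved[loginID] == txID {
		delete(r.reserved, loginID)
		if create {
			r.committed[loginID] = true
		}
	}
}

// signUp coordinates the two phases: every region must reserve the login ID
// before the user is created, and only in the home region.
func signUp(txID, loginID string, home *regionStore, all []*regionStore) bool {
	var prepared []*regionStore
	for _, r := range all {
		if !r.prepare(txID, loginID) {
			// A region voted no: abort by releasing every reservation taken so far.
			for _, p := range prepared {
				p.finish(txID, loginID, false)
			}
			return false
		}
		prepared = append(prepared, r)
	}
	for _, r := range all {
		r.finish(txID, loginID, r == home)
	}
	return true
}

func main() {
	cn, hk := newRegionStore(), newRegionStore()
	regions := []*regionStore{cn, hk}
	fmt.Println(signUp("tx1", "alice@example.com", hk, regions)) // true
	fmt.Println(signUp("tx2", "alice@example.com", cn, regions)) // false: already exists in hk
}
```

Every sign-up needs a round trip to every region in each phase, and a reservation blocks competing sign-ups for that ID until phase 2 finishes, which is where the throughput cost comes from. Two concurrent sign-ups can also both abort if each grabs a different region first, so the client would need to retry.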
fungc-io commented 1 year ago
  1. This could be a redirection rule if the sign-up process includes filling in the profile: if the user's country is China, block the sign-up and redirect to the China server. If the user changes their country later, the developer can either forbid this action or run the migration API.

  2. Let me put this into Rabbit Holes.

  3. If the application uses other authenticators such as passkey/OAuth, the application should determine the user's region before triggering authentication. Authgear cannot redirect the user based on those.


  4. Does redirection at the API gateway mean the applications will always connect to the same endpoint?
    • Based on some criteria set in Authgear, the user will be created on one of the servers.
    • This requires a centralized database to store the mapping from each user to their server location.
    • For every request, Authgear will fetch the user's profile and data from the server in their location.
louischan-oursky commented 1 year ago

After the product meeting on 2023-08-02, I skimmed through GDPR and PIPL.

GDPR still considers pseudonymized data to be personal data. Therefore, even if we store only pseudonymized identifiers (such as email addresses and phone numbers) in a centralized database, this approach could violate the law.

Does this mean we can rule out the option of having a centralized database to ensure identities (e.g. email addresses and phone numbers) are unique across all regions for a given project?

Given that a two-phase commit protocol is complex and can degrade performance, I think we can also rule out this option.

Then, can we conclude that identifier uniqueness is something we need to give up?

Since we do not have a centralized database, the servers in different regions must all know about each other: they have to ask each other whether an identifier already exists in another region. We need to design a protocol for the servers in different regions to communicate.

tung2744 commented 1 year ago

Trying out CockroachDB

Findings

```sql
ALTER TABLE {table} SET LOCALITY REGIONAL BY ROW;
```
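Table-level locality only applies once the database itself is multi-region, so in practice the statement above comes after setting a primary region and adding the other regions. The Go helper below just assembles those statements as a sketch; the database, region, and table names are hypothetical, and region names would have to match the regions the cluster nodes were started with.

```go
package main

import (
	"fmt"
	"strings"
)

// multiRegionDDL builds the statements that make a CockroachDB database
// multi-region and then home each table's rows per region.
// All names passed in are illustrative.
func multiRegionDDL(database, primary string, others, tables []string) []string {
	stmts := []string{
		fmt.Sprintf(`ALTER DATABASE %s PRIMARY REGION "%s";`, database, primary),
	}
	for _, r := range others {
		stmts = append(stmts, fmt.Sprintf(`ALTER DATABASE %s ADD REGION "%s";`, database, r))
	}
	for _, t := range tables {
		stmts = append(stmts, fmt.Sprintf("ALTER TABLE %s SET LOCALITY REGIONAL BY ROW;", t))
	}
	return stmts
}

func main() {
	stmts := multiRegionDDL("authgear", "hk", []string{"cn"}, []string{"users", "identities"})
	fmt.Println(strings.Join(stmts, "\n"))
}
```

With `REGIONAL BY ROW`, each row carries a hidden `crdb_region` column that determines which region homes it.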



### App server
- Multiple app servers (TBC)
- Multiple Redis instances (TBC)