CatchTheTornado / doctor-dok

Doctor Dok is an AI based medical data framework and patient's med vault. Parse any health related PDF/Image to JSON and then use Chat GPT / LLama to discuss it! WARNING: Don't decide on your health based on AI Chat - it's just for Research purposes.
http://doctordok.com
MIT License

[rfc] Application Security Architecture #65

Open pkarw opened 3 months ago

pkarw commented 3 months ago

Terms:

We use a two-level key structure, (User Key / Sharing Key) -> (Master Key), so that users can manage or change their Sharing Keys and User Keys without re-encrypting the whole database.
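A minimal sketch of the two-level idea, using Node's built-in crypto module. The `WrappedKey` shape, the function names, and the use of scrypt to turn a User Key or Sharing Key into an AES key are assumptions made for illustration only; the RFC does not specify them.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "crypto";

// Hypothetical shape for a wrapped (encrypted) Master Key; not taken from the project.
interface WrappedKey {
  salt: string; // hex, salt used to derive the wrapping key
  iv: string;   // hex, AES-GCM nonce
  tag: string;  // hex, AES-GCM authentication tag
  data: string; // hex, the encrypted Master Key
}

// Derive a 256-bit wrapping key from a User Key or Sharing Key.
// scrypt is only a stand-in; the RFC does not say which KDF is used for wrapping.
function deriveWrappingKey(secret: string, salt: Buffer): Buffer {
  return scryptSync(secret, salt, 32);
}

function wrapMasterKey(masterKey: Buffer, secret: string): WrappedKey {
  const salt = randomBytes(16);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", deriveWrappingKey(secret, salt), iv);
  const data = Buffer.concat([cipher.update(masterKey), cipher.final()]);
  return {
    salt: salt.toString("hex"),
    iv: iv.toString("hex"),
    tag: cipher.getAuthTag().toString("hex"),
    data: data.toString("hex"),
  };
}

function unwrapMasterKey(wrapped: WrappedKey, secret: string): Buffer {
  const key = deriveWrappingKey(secret, Buffer.from(wrapped.salt, "hex"));
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(wrapped.iv, "hex"));
  decipher.setAuthTag(Buffer.from(wrapped.tag, "hex"));
  // final() throws if the secret is wrong or the data was tampered with.
  return Buffer.concat([decipher.update(Buffer.from(wrapped.data, "hex")), decipher.final()]);
}

// One Master Key, wrapped separately for the User Key and for a Sharing Key.
const masterKey = randomBytes(32);
const forUser = wrapMasterKey(masterKey, "user-key-example");
const forShare = wrapMasterKey(masterKey, "sharing-key-example");

// Changing the User Key only re-wraps the 32-byte Master Key; the data itself is untouched.
const rewrapped = wrapMasterKey(unwrapMasterKey(forUser, "user-key-example"), "new-user-key");
```

Because the data is only ever encrypted with the Master Key, rotating or revoking a User Key or Sharing Key touches a single small record instead of every encrypted row.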

Database Structure:

{
    "salt": "salt",
    "time": 1, // the number of iterations
    "mem": 1024, // used memory, in KiB
    "hashLen": 24, // desired hash length
    "parallelism": 1 // desired parallelism (it won't be computed in parallel, however)
}

Each databaseIdHash is associated with at least one User Key and can have multiple Sharing Keys. Each Sharing Key record holds an encrypted copy of the Master Key.
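The relationship above could be modelled roughly as follows. The field names are the ones used elsewhere in this RFC (databaseIdHash, keyLocatorHash, keyHash, keyHashParams, encryptedMasterKey), but the exact record layout, the assumption that the JSON shown above is the keyHashParams value, and the helper `keysForDatabase` are illustrative guesses, not the actual schema.

```typescript
// Argon2id parameters, assuming the JSON structure shown above is the keyHashParams value.
interface KeyHashParams {
  salt: string;
  time: number;        // number of iterations
  mem: number;         // memory in KiB
  hashLen: number;     // desired hash length
  parallelism: number;
}

// Hypothetical record in the keys table: one per User Key or Sharing Key.
interface KeyRecord {
  databaseIdHash: string;     // sha256(Database Id)
  keyLocatorHash: string;     // record locator derived from the key
  keyHash: string;            // argon2id hash of the key
  keyHashParams: KeyHashParams;
  encryptedMasterKey: string; // Master Key encrypted with this key
}

// All key records (User Key plus any Sharing Keys) that belong to one database.
function keysForDatabase(records: KeyRecord[], databaseIdHash: string): KeyRecord[] {
  return records.filter((r) => r.databaseIdHash === databaseIdHash);
}
```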

Zero Trust: Because we never store plaintext User Keys, Sharing Keys, or Master Keys, the server operates on a Zero Trust basis. Computing the hashes (which are sent to the server) and encrypting or decrypting the data (done only on the client) is possible only with the User Key and Master Key held in the browser's memory.

UC01: Creating a New User Key or Database:

User provides us with:

Endpoint: /db/create

Data sent to server:

{
    "salt": "salt",
    "time": 1, // the number of iterations
    "mem": 1024, // used memory, in KiB
    "hashLen": 24, // desired hash length
    "parallelism": 1 // desired parallelism (it won't be computed in parallel, however)
}

If this is a User Key, i.e. the first key in the database, we also generate the masterDataKey and encrypt it with AES-GCM. If this is a request for an additional key, we just send the existing masterDataKey along.

Hashes are calculated client-side. No plaintext data is sent to the server.

For every API request we set the Authorization header to base64(JWT Access Token). The server verifies it as described later in this text.
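As a small illustration of the header format, assuming the header value is just the base64 of the token string with no "Bearer " prefix (the function names are hypothetical, and `Buffer` is the Node global; in the browser `btoa` would play the same role):

```typescript
// Build the request headers for an API call: Authorization = base64(JWT Access Token).
function authHeaders(accessToken: string): Record<string, string> {
  return { Authorization: Buffer.from(accessToken, "utf8").toString("base64") };
}

// Server-side counterpart: recover the raw JWT string from the header value.
function tokenFromHeader(headerValue: string): string {
  return Buffer.from(headerValue, "base64").toString("utf8");
}
```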

Server side:

UC02: Log In

  1. The user inputs their Database Id.
  2. The user inputs their User Key or Sharing Key.
  3. The application calls /db/authorize-challenge, posting:
    • databaseIdHash: sha256(Database Id)
    • keyLocatorHash: sha256(User Key + Database Id + static salt) - this is just the record locator
  4. The server sends back the keyHashParams associated with the databaseIdHash and keyLocatorHash, which are used as record locators inside the keys table.
  5. The client calculates the argon2id hash of the User Key (or Sharing Key) using the given keyHashParams and sends it to the server as keyHash in a request to /db/authorize.
  6. The server checks whether the argon2id hash stored for the key located by keyLocatorHash and databaseIdHash matches keyHash.
  7. Note: After a successful login, the server responds with a JWT Access Token and a JWT Refresh Token. These are stored in the browser's memory and sent with each API request.
  8. After a successful login, the browser also receives an AES-encrypted Master Key for the data. It is decrypted only in the browser, using the User Key or Sharing Key, which is held in the browser's memory and unavailable to the server.
  9. Data is encrypted using AES-GCM with this master key.
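Step 3 of the login flow can be sketched as below. The value of the static salt is not given in this RFC, so `STATIC_SALT` is a placeholder, and the function and interface names are made up for the example. The argon2id step (step 5) is left out because it needs a third-party library.

```typescript
import { createHash } from "crypto";

function sha256Hex(input: string): string {
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// Placeholder only: the RFC mentions a static salt but does not give its value.
const STATIC_SALT = "example-static-salt";

interface AuthorizeChallengeRequest {
  databaseIdHash: string;
  keyLocatorHash: string;
}

// Body posted to /db/authorize-challenge. Only hashes leave the browser.
function buildAuthorizeChallenge(databaseId: string, key: string): AuthorizeChallengeRequest {
  return {
    databaseIdHash: sha256Hex(databaseId),
    // Record locator: sha256(User Key + Database Id + static salt)
    keyLocatorHash: sha256Hex(key + databaseId + STATIC_SALT),
  };
}
```

The keyLocatorHash is only used to find the right record; the actual proof of knowledge is the argon2id keyHash sent to /db/authorize afterwards.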

Authorization algorithm (on the server):

  1. Get the keys records (one or many) whose databaseIdHash and keyLocatorHash match the values sent from the client.
  2. Iterate over the keys assigned to that databaseIdHash and check whether argon2id.check(keyHash, serverStoredKey.keyHash) reports a match.
  3. If so, send the encryptedMasterKey + JWT token to the client.
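The three steps above, as a rough sketch over an in-memory list of key records. The `StoredKey` shape and the `authorize` function are hypothetical, JWT issuance is left out, and the default `check` is a constant-time string comparison that merely stands in for the argon2id.check call named in step 2.

```typescript
import { timingSafeEqual } from "crypto";

// Hypothetical server-side record; field names follow the RFC text.
interface StoredKey {
  databaseIdHash: string;
  keyLocatorHash: string;
  keyHash: string;
  encryptedMasterKey: string;
}

type HashCheck = (submitted: string, stored: string) => boolean;

// Stand-in for argon2id.check: constant-time equality of the two hash strings.
const constantTimeEquals: HashCheck = (submitted, stored) => {
  const a = Buffer.from(submitted, "utf8");
  const b = Buffer.from(stored, "utf8");
  return a.length === b.length && timingSafeEqual(a, b);
};

function authorize(
  keys: StoredKey[],
  databaseIdHash: string,
  keyLocatorHash: string,
  keyHash: string,
  check: HashCheck = constantTimeEquals,
): { encryptedMasterKey: string } | null {
  // Step 1: locate candidate records.
  const candidates = keys.filter(
    (k) => k.databaseIdHash === databaseIdHash && k.keyLocatorHash === keyLocatorHash,
  );
  // Step 2: compare the submitted hash against each candidate.
  for (const candidate of candidates) {
    if (check(keyHash, candidate.keyHash)) {
      // Step 3: hand back the encrypted Master Key (the JWT would be issued here too).
      return { encryptedMasterKey: candidate.encryptedMasterKey };
    }
  }
  return null;
}
```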

UC03: API Request Handling by the Server

  1. Server verifies JWT Access token sent in the x-access-token header.
  2. The server looks up the key in the database using the keyHash and the databaseIdHash (the hashed Database Id).
  3. keyHash and databaseIdHash are stored within JWT Token described above.
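To make the token check concrete, here is a hand-rolled sketch of signing and verifying a JWT that carries keyHash and databaseIdHash. HS256, the `exp` claim, and the function names are assumptions for the example; a real server would more likely rely on a vetted JWT library.

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Claims assumed for this sketch: the two hashes from the RFC plus an expiry time.
interface AccessClaims {
  databaseIdHash: string;
  keyHash: string;
  exp: number; // expiry, in seconds since the epoch
}

function signToken(claims: AccessClaims, secret: string): string {
  const header = Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })).toString("base64url");
  const payload = Buffer.from(JSON.stringify(claims)).toString("base64url");
  const signature = createHmac("sha256", secret).update(`${header}.${payload}`).digest("base64url");
  return `${header}.${payload}.${signature}`;
}

// Returns the claims if the signature is valid and the token has not expired, else null.
function verifyToken(token: string, secret: string, nowSeconds: number): AccessClaims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  // Always recompute with HMAC-SHA256; the alg field in the header is never trusted.
  const expected = createHmac("sha256", secret).update(`${header}.${payload}`).digest();
  const actual = Buffer.from(signature, "base64url");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as AccessClaims;
  return claims.exp > nowSeconds ? claims : null;
}
```

Once the claims are verified, the server can use databaseIdHash and keyHash from the token to find the matching key record without the client resending them.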

Alternatively:

  1. The server retrieves all keys for the tenant (several if the database is shared, one if not), iterates through them, and checks each with argon2id against the key hash submitted by the client. If one matches, the user is authorized.

Client-Side Encryption and Decryption: All data encryption and decryption are handled on the client side, ensuring data security.

The server cannot decrypt client data because it never receives the user's key, which is required to decrypt the Master Key used for data encryption. It only receives hashes derived from the key: the SHA-256 keyLocatorHash and the argon2id keyHash.
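A sketch of what client-side record encryption with the Master Key might look like, written against Node's crypto module for brevity (in the browser the Web Crypto API would be the natural choice). The blob layout base64(iv | tag | ciphertext) and the function names are assumptions for this example, not the project's storage format.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

// Encrypt a JSON-serializable record with the 32-byte Master Key (AES-256-GCM).
// Assumed output layout for this sketch: base64(iv | tag | ciphertext).
function encryptRecord(record: unknown, masterKey: Buffer): string {
  const iv = randomBytes(12); // fresh nonce for every record
  const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
  const ciphertext = Buffer.concat([cipher.update(JSON.stringify(record), "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

// Reverse of encryptRecord; throws if the key is wrong or the blob was modified.
function decryptRecord(blob: string, masterKey: Buffer): unknown {
  const raw = Buffer.from(blob, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28); // 16-byte GCM authentication tag
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", masterKey, iv);
  decipher.setAuthTag(tag);
  const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  return JSON.parse(plaintext.toString("utf8"));
}
```

The server would only ever store and return the opaque blob; without the Master Key it can neither read nor undetectably alter the record.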

pkarw commented 3 months ago

Implemented with #66