elixir-cloud-aai / proTES

Proxy service for injecting middleware into GA4GH TES requests
Apache License 2.0

Middleware for Crypt4GH Support #170

Open athith-g opened 3 months ago

athith-g commented 3 months ago

Problem: Crypt4GH is a file format developed by GA4GH that allows sensitive genomic data to remain encrypted at rest and in transit. Currently, TES implementations do not support Crypt4GH files as inputs.

Solution: I will be developing middleware for proTES that enables the use of Crypt4GH files without the user having to alter the initial TES request. The middleware should detect the presence of a Crypt4GH file and alter the request so that a decryption step is included in the task.
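
As a rough illustration of the detection part: a Crypt4GH file begins with the 8-byte magic string `crypt4gh` followed by a 4-byte little-endian version number (currently 1). The sketch below checks those leading bytes. It assumes the first bytes of the input are available to whatever component does the check; the function name `looks_like_crypt4gh` is hypothetical and not part of proTES.

```python
import struct

# Magic bytes at the start of every Crypt4GH file, per the format's header layout.
CRYPT4GH_MAGIC = b"crypt4gh"
# Only version 1 of the format is assumed here.
CRYPT4GH_VERSION = 1


def looks_like_crypt4gh(first_bytes: bytes) -> bool:
    """Return True if the leading bytes match a Crypt4GH header (hypothetical helper)."""
    # The magic string (8 bytes) plus the version field (4 bytes) are needed.
    if len(first_bytes) < 12:
        return False
    magic = first_bytes[:8]
    # The version is stored as an unsigned 32-bit little-endian integer.
    (version,) = struct.unpack("<I", first_bytes[8:12])
    return magic == CRYPT4GH_MAGIC and version == CRYPT4GH_VERSION
```

If the middleware only sees the TES request and not the file contents, a check on the input URL or path (for example a `.c4gh` suffix) may be the more practical signal.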

The decryption step adds an executor that decrypts the Crypt4GH file and temporarily places the decrypted file in a volume. Essentially, the executor:

  1. Generates an ephemeral key pair
  2. Fetches the Crypt4GH file with the ephemeral public key
  3. Decrypts the file with the ephemeral private key
  4. Places the decrypted file in a volume
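
A minimal sketch of how the request rewrite could look, treating the TES task as a plain dict with the standard `inputs`, `executors`, and `volumes` fields. Everything specific here is an assumption for illustration: detection by a `.c4gh` path suffix, the volume path `/vol/crypt4gh`, the image name, and the `decrypt-inputs` command, which stands in for whatever tool performs steps 1 to 4 above. None of these names come from proTES.

```python
import copy

C4GH_SUFFIX = ".c4gh"                             # assumed naming convention for encrypted inputs
DECRYPT_VOLUME = "/vol/crypt4gh"                  # hypothetical volume for decrypted files
DECRYPT_IMAGE = "example/crypt4gh-decrypt:latest"  # hypothetical container image


def add_decryption_step(task: dict) -> dict:
    """Return a copy of the task with a decryption executor prepended, if needed."""
    # Collect container paths of inputs that look like Crypt4GH files.
    encrypted = [
        inp["path"]
        for inp in task.get("inputs", [])
        if inp.get("path", "").endswith(C4GH_SUFFIX)
    ]
    if not encrypted:
        # Nothing to decrypt: pass the request through untouched.
        return task

    new_task = copy.deepcopy(task)

    # Make sure the shared volume exists so later executors can read the output.
    volumes = new_task.setdefault("volumes", [])
    if DECRYPT_VOLUME not in volumes:
        volumes.append(DECRYPT_VOLUME)

    # Placeholder executor; the command is hypothetical and stands in for
    # key generation, fetching, decryption, and writing to the volume.
    decrypt_executor = {
        "image": DECRYPT_IMAGE,
        "command": ["decrypt-inputs", "--out", DECRYPT_VOLUME, *encrypted],
    }

    # TES runs executors in order, so the decryption step goes first.
    new_task["executors"] = [decrypt_executor, *new_task.get("executors", [])]
    return new_task
```

The sketch leaves the user's own executors unchanged; pointing their commands at the decrypted copies in the volume would be a further step.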

[Figures: two workflow diagrams (screenshots taken 2024-04-01), shown side by side]

The diagram on the left describes a workflow without a Crypt4GH file as input. The diagram on the right describes a workflow with a Crypt4GH file as input.

Possible Alternative Approach: Rather than adding a decryption executor to every TES task that uses a Crypt4GH file, the proTES middleware could decrypt the files itself and store them temporarily in some repository. The TES instances could then read the files from this repository without needing a decryption executor. This approach would use less compute (since each file is decrypted only once in total rather than once per TES instance) and may be less complex to implement. However, the decrypted data would be centralized, and it would travel unencrypted whenever an instance fetches it from the repository, making this approach less secure.

uniqueg commented 3 months ago

This is brilliant, thanks a lot @athith-g.

The alternative approach would be useful only in a situation where proTES and all TES endpoints are part of the same organization with the same security architecture. I guess it really is too limiting. Security is expensive :man_shrugging: