Open mrtamm opened 2 months ago
I believe client keys should be generated ad hoc by Funnel.
Initial development is here: https://github.com/mrtamm/funnel-gdi/tree/dev-htsget-crypt4gh
At the moment, I still need to do more full-scale testing (and possibly some fixing) before opening a PR, so I'm estimating May 8 for the PR.
From slack:
"inputs": [
{
"name": "pub key input",
"description": "Public C4GH key.",
"type": "FILE",
"path": "/tmp/c4gh.pub",
"content": "PUBKEY AS STRING"
}
],
HTSGET storage configuration in Funnel now looks like this:
HTSGETStorage:
Disabled: false
Protocol: https
SendPublicKey: false
When SendPublicKey is true, Funnel will generate the key pair if existing key files are not found. Funnel itself cannot detect whether the Htsget server sends the data encrypted or not, so the user must specify it explicitly.
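As a rough illustration of the "generate only if missing" behaviour, here is a minimal Python sketch. It is not Funnel's actual code: the function names are made up for this example, and it assumes key generation is delegated to the same crypt4gh-keygen command used in the testing setup further down.

```python
import subprocess
from pathlib import Path


def keygen_command(private_key: Path, public_key: Path):
    """Return the crypt4gh-keygen command to run, or None if both key files exist.

    Hypothetical helper: the flags are the ones used in the testing setup
    of this thread (keys without a passphrase).
    """
    if private_key.is_file() and public_key.is_file():
        return None
    return [
        "crypt4gh-keygen", "-f", "--nocrypt",
        "--sk", str(private_key),
        "--pk", str(public_key),
    ]


def ensure_keys(private_key: Path, public_key: Path) -> bool:
    """Generate the key pair only when it is missing; return True if generated."""
    cmd = keygen_command(private_key, public_key)
    if cmd is None:
        return False  # existing keys are reused untouched
    subprocess.run(cmd, check=True)  # raises CalledProcessError if keygen fails
    return True
```

With the file names from the local test below, this would be called as ensure_keys(Path(".private.key"), Path(".public.key")).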
Protocol specifies the replacement protocol for calling the HTSGET API (default is https).
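To make the two settings concrete, here is a small Python sketch of how an htsget:// URL could be turned into the ticket request. The function name and parameters are hypothetical, and the exact mapping inside Funnel is an assumption on my part; the only details taken from this thread are the scheme replacement and the client-public-key header.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request


def ticket_request(htsget_url, protocol="https", client_public_key=None):
    """Build the HTTP(S) request for an htsget:// URL (illustrative only).

    protocol          -- stands in for the Protocol setting
    client_public_key -- header value to send when SendPublicKey is true;
                         pass None to omit the header
    """
    parts = urlsplit(htsget_url)
    if parts.scheme != "htsget":
        raise ValueError(f"not an htsget URL: {htsget_url}")
    # Swap only the scheme; host, path and query are kept as they are.
    url = urlunsplit((protocol, parts.netloc, parts.path, parts.query, parts.fragment))
    headers = {}
    if client_public_key is not None:
        headers["client-public-key"] = client_public_key
    return Request(url, headers=headers)
```

For example, ticket_request("htsget://localhost:9090/variants/test_data?class=header", protocol="http") targets the local test server over plain HTTP without sending a key.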
Overview of the local testing setup.
Inside htsget-rs directory:
cp deploy/Dockerfile .
docker build -t ghcr.io/umccr/htsget-rs:latest .
formatting_style = "Compact"
# The main ticket-server:
ticket_server_addr = "0.0.0.0:8080"
# The local-data-server:
data_server_enabled = true
data_server_local_path = "/data" # This is INSIDE the container
[[resolvers]]
[resolvers.storage]
response_url = "http://localhost:9091/"
forward_headers = true
[resolvers.storage.endpoints]
file = "http://localhost:8081/"
index = "http://localhost:8081/"
[resolvers.object_type]
send_encrypted_to_client = true
private_key = "/crypt4gh/private.key"
public_key = "/crypt4gh/public.key"
./htsget/
- crypt4gh/
- private.key
- public.key
- data/
- test_data.vcf.gz.c4gh
- test_data.vcf.gz.tbi
- htsget.toml
Generate private and public keys using command:
crypt4gh-keygen -f --nocrypt --sk private.key --pk public.key
Sample VCF for testing: https://github.com/EGA-archive/beacon2-ri-tools/blob/main/test/test_1000G.vcf.gz
Generate index (TBI) for the VCF: bcftools index -t test_data.vcf.gz
services:
htsget:
container_name: htsget
image: ghcr.io/umccr/htsget-rs:latest
command: htsget-actix --config /etc/htsget.toml
ports:
- "9090:8080"
- "9091:8081"
volumes:
- "./htsget/data:/data:ro"
- "./htsget/crypt4gh:/crypt4gh:ro"
- "./htsget/htsget.toml:/etc/htsget.toml:ro"
After docker compose up, call the API (for testing):
curl -H 'client-public-key: Qjn...' 'http://localhost:9090/variants/test_1000G?class=header'
Copy config/default-config.yaml to my-config.yaml and modify HTSGETStorage:
HTSGETStorage:
Disabled: false
Protocol: http
SendPublicKey: true
# copy the keys:
cp htsget/crypt4gh/private.key .private.key
cp htsget/crypt4gh/public.key .public.key
go run . storage get "htsget://localhost:9090/variants/test_data?class=header" header.vcf.gz -c my-config.yaml
First of all, really nice that you are implementing support for htsget! I'm testing this implementation together with starter-kit-htsget + starter-kit-storage-and-interfaces, and have two questions:
Thanks :)
Hi and thank you for the feedback!
I am not able to use a private key that uses a passphrase, even if the passphrase is empty. Is it possible?
As shown above, I used the --nocrypt option to generate the keys without a passphrase, so it should work in that case. However, if the environment where Funnel is running contains the variable C4GH_PASSPHRASE, crypt4gh will use that value for decrypting the key. At the moment, this is the only way to make it work. Theoretically, it would also be possible to add this passphrase to the Funnel configuration file.
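If the passphrase were ever moved into the configuration file, the wiring could look roughly like the Python sketch below. Everything here is hypothetical (no such configuration option exists at the moment); it only shows the passphrase being handed to the child process through C4GH_PASSPHRASE, which is the mechanism described above.

```python
import os


def decryption_env(passphrase=None, base_env=None):
    """Return the environment for the crypt4gh child process.

    passphrase -- hypothetical value read from the configuration file;
                  None keeps whatever the host environment already provides
    base_env   -- environment to start from (defaults to os.environ)
    """
    env = dict(os.environ if base_env is None else base_env)
    if passphrase is not None:
        env["C4GH_PASSPHRASE"] = passphrase
    return env
```

The resulting dictionary would be passed as env= to subprocess.run when starting the decryption command, so the passphrase never has to appear on the command line.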
Would it not be safer/sounder to decrypt the file inside of the execution container, instead of first decrypting and then copying the decrypted file to the container?
It depends. If it has to be done in the container, this (additional) task would be left to the container developer. However, the private key is already in the host system, so this decryption could be executed outside of the container as well. For the sake of user experience, I decided to decrypt the file beforehand, and leave the security task for the maintainer of the host system (where funnel is running).
This is how I figured it would work best, but if there are other ways to solve it, I would gladly discuss them.
Thanks for the answers, @mrtamm ! I think your reasoning makes sense, and I now have the complete setup running :+1: .
A side note, in case someone else finds it useful: the htsget command (cmd1) might hang if something goes wrong with the decryption (cmd2) so that it stops reading from the pipe. For example, if an old version of crypt4gh is used.
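One possible way to guard against this kind of hang, sketched in Python. This is not how Funnel runs the commands; the function name and the timeout are assumptions made for the example. The idea is to close the parent's copy of the pipe, so that cmd1 gets a broken-pipe error once cmd2 exits, and to add a timeout for the case where cmd2 stays alive but stops reading.

```python
import subprocess


def run_pipeline(cmd1, cmd2, timeout=60):
    """Run `cmd1 | cmd2` and return both exit codes as (rc1, rc2).

    Raises subprocess.TimeoutExpired (after killing both processes)
    if either command is still running when its wait times out.
    """
    p1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(cmd2, stdin=p1.stdout)
    # Drop the parent's handle on the pipe: once cmd2 exits, cmd1's next
    # write fails with a broken pipe instead of blocking forever.
    p1.stdout.close()
    try:
        rc2 = p2.wait(timeout=timeout)
        rc1 = p1.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # cmd2 is alive but not consuming (or cmd1 is stuck): stop both.
        p1.kill()
        p2.kill()
        p1.wait()
        p2.wait()
        raise
    return rc1, rc2
```

A non-zero rc2 would then point at the decryption step, and a timeout at a stalled pipe, which is at least enough to report an error instead of hanging.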
Thanks for the feedback! I do need to check how problems could be detected when something goes wrong with the commands. Secondly, I'm also considering support for other crypt4gh implementations (they have different CLI flags), or otherwise integrating decryption into the Funnel source code. Estimating this to be ready by the end of June.
Add support for requesting genomic data in encrypted (crypt4gh) format.
Htsget (more specifically htsget-rs) is supposed to support this functionality, as described here: https://github.com/umccr/htsget-rs/blob/194457b077d3387414800fd5ffcb2a2141a6d1b3/docs/crypt4gh/ARCHITECTURE.md
Funnel needs to implement the referred htsget protocol for downloading encrypted files.
This means extending the current htsget protocol implementation: