TileDB-Inc / TileDB-VCF

Efficient variant-call data storage and retrieval library using the TileDB storage library.
https://tiledb-inc.github.io/TileDB-VCF/
MIT License

using AES encryption #739

Closed. lynnjo closed this issue 2 months ago.

lynnjo commented 3 months ago

Your documentation has a brief section on encryption with AES-256 in GCM mode (https://docs.tiledb.com/main/background/internal-mechanics/encryption). Is there an example or documentation showing how to set this up for TileDB-VCF? Thank you.

gspowley commented 3 months ago

Hi @lynnjo,

Here's a Python example of creating a TileDB-VCF dataset with AES-256-GCM encryption.

import tiledbvcf

ds_uri = "vcf.tdb"

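config = {
    "sm.encryption_type": "AES_256_GCM",
    # example key only; AES-256 requires exactly 32 bytes
    "sm.encryption_key": "0123456789abcdef0123456789abcdef",
}

# open the URI in write mode and create an empty, encrypted dataset
ds = tiledbvcf.Dataset(ds_uri, mode="w", tiledb_config=config)
ds.create_dataset()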
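The main pitfall in the example above is the key length: AES-256 uses a 256-bit key, so the key string must be exactly 32 bytes. As a minimal sketch, the config can be built with a randomly generated key and checked before it is passed to `tiledbvcf.Dataset`. The helper name `make_encryption_config` is hypothetical, not part of TileDB-VCF, and the sketch assumes, like the example, that the key is supplied as a 32-character ASCII string (here produced with Python's `secrets.token_hex(16)`).

```python
import secrets


# Hypothetical helper (not a TileDB-VCF API): build a tiledb_config dict
# for AES-256-GCM, generating a random key if none is given.
def make_encryption_config(key=None):
    if key is None:
        # token_hex(16) returns 32 hex characters, i.e. 32 ASCII bytes
        key = secrets.token_hex(16)
    # AES-256 needs a 256-bit (32-byte) key
    if len(key.encode("utf-8")) != 32:
        raise ValueError("sm.encryption_key must be exactly 32 bytes")
    return {
        "sm.encryption_type": "AES_256_GCM",
        "sm.encryption_key": key,
    }


config = make_encryption_config()
print(config["sm.encryption_type"], len(config["sm.encryption_key"]))  # AES_256_GCM 32
```

Keep the generated key somewhere safe: the same value has to be supplied whenever the dataset is opened for reading.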

When reading an encrypted dataset, you must provide a config with the same encryption type and key that were used to create it.

import tiledbvcf

ds_uri = "vcf.tdb"

config = {
    "sm.encryption_type": "AES_256_GCM",
    # must match the key used when the dataset was created
    "sm.encryption_key": "0123456789abcdef0123456789abcdef",
}

# mode defaults to "r" (read)
ds = tiledbvcf.Dataset(ds_uri, tiledb_config=config)
print(f"Found {len(ds.samples())} samples")

The following --tiledb-config option provides the same functionality with the TileDB-VCF CLI.

--tiledb-config sm.encryption_type=AES_256_GCM,sm.encryption_key=0123456789abcdef0123456789abcdef
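If the key is already held in a Python config dict, the CLI string can be derived from it rather than retyped. This is a small sketch, assuming (as the option above suggests) that `--tiledb-config` takes comma-separated `key=value` pairs and that no value itself contains a comma; `to_cli_config` is a hypothetical helper, not part of the TileDB-VCF CLI.

```python
# Hypothetical helper: turn a tiledb_config dict into the
# comma-separated "key=value" string passed to --tiledb-config.
def to_cli_config(config):
    for key, value in config.items():
        # a comma inside a value would be read as a separator
        if "," in str(value):
            raise ValueError(f"value for {key} contains a comma")
    return ",".join(f"{key}={value}" for key, value in config.items())


config = {
    "sm.encryption_type": "AES_256_GCM",
    "sm.encryption_key": "0123456789abcdef0123456789abcdef",
}
print("--tiledb-config " + to_cli_config(config))
```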
lynnjo commented 3 months ago

Thank you!