DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Create AWS account and bucket for log backups #4314

Open hannes-ucsc opened 2 years ago

hannes-ucsc commented 2 years ago

https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/S3Export.html

The literals below use Bash conventions.

For each AWS account used by the system, there is a separate AWS account serving as the vault.

The name of the account is vault-$account where account is the name of the served account.

In each vault account there is one vault bucket. The name of the vault bucket is edu-ucsc-gi-vault-$account.

In each bucket there is a directory per Azul deployment in the served account. The name of the directory is the name of the deployment.

In each directory there is a subdirectory per archived resource in that deployment.

For this ticket, the archived resource is azul-cloudwatch-logs.

So for example, the CloudWatch logs for the HCA production deployment go to

s3://edu-ucsc-gi-vault-platform-hca-prod/prod/azul-cloudwatch-logs

That bucket will be owned by the AWS account called vault-platform-hca-prod.

The CloudWatch logs for the AnVIL development deployment will go to

s3://edu-ucsc-gi-vault-platform-anvil-dev/anvildev/azul-cloudwatch-logs

For another example, the CloudWatch logs in Noah's personal deployments go to

s3://edu-ucsc-gi-vault-platform-hca-dev/nadove2/azul-cloudwatch-logs
s3://edu-ucsc-gi-vault-platform-hca-dev/nadove3/azul-cloudwatch-logs
s3://edu-ucsc-gi-vault-platform-anvil-dev/nadove4/azul-cloudwatch-logs
s3://edu-ucsc-gi-vault-platform-anvil-dev/nadove5/azul-cloudwatch-logs

The first two directories are in the vault bucket owned by AWS account vault-platform-hca-dev, the latter two in the vault bucket owned by AWS account vault-platform-anvil-dev.
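The naming rules above can be summarized in a few lines of Python. This is only a sketch of the convention as described in this ticket; the function and constant names are made up for illustration and do not exist in the Azul code base.

```python
# Sketch of the vault naming convention described in this ticket.
# All names below (functions, constants) are hypothetical.

VAULT_BUCKET_PREFIX = "edu-ucsc-gi-vault-"
CLOUDWATCH_RESOURCE = "azul-cloudwatch-logs"


def vault_account(account: str) -> str:
    """Name of the vault account serving the given AWS account."""
    return f"vault-{account}"


def vault_bucket(account: str) -> str:
    """Name of the single vault bucket in that vault account."""
    return f"{VAULT_BUCKET_PREFIX}{account}"


def vault_url(account: str,
              deployment: str,
              resource: str = CLOUDWATCH_RESOURCE) -> str:
    """S3 URL of the directory for one archived resource of one deployment."""
    return f"s3://{vault_bucket(account)}/{deployment}/{resource}"


if __name__ == "__main__":
    # HCA production deployment
    print(vault_url("platform-hca-prod", "prod"))
    # AnVIL development deployment
    print(vault_url("platform-anvil-dev", "anvildev"))
```

Because the deployment name only appears as a path component, a new deployment needs no new bucket or account, only a new prefix.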

The fact that deployments are mapped to directories in the vault bucket, as opposed to each deployment having its own vault bucket, means that the archived resources can be created right away, without the need to first create a vault bucket when the deployment is created. This means that deployments can be created and destroyed without involvement of the organization administrator. Production and development are separated for compliance reasons. AnVIL and HCA are separated for billing reasons.

The team and the service accounts used by the team can only create new objects in vault buckets. They can't read, list, delete or overwrite objects. If overwriting can't be prevented via bucket access policy, then we need to enable versioning on the bucket.
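As a rough illustration of the create-only requirement, a bucket policy could grant nothing but s3:PutObject to the served account. The sketch below builds such a policy document in Python. The function name and the account ID are placeholders, and the use of the served account's root principal is an assumption, not a decision made in this ticket. Granting s3:PutObject alone does not stop a writer from replacing an existing key, which is why versioning is mentioned above as the fallback.

```python
import json


def create_only_bucket_policy(bucket: str, served_account_id: str) -> dict:
    """
    Build a bucket policy that lets the served account create objects in the
    vault bucket. Read, list and delete actions are simply not granted.

    Hypothetical helper; the principal and statement ID are assumptions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCreateOnly",
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{served_account_id}:root"
                },
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID
    policy = create_only_bucket_policy(
        "edu-ucsc-gi-vault-platform-hca-dev", "123456789012"
    )
    print(json.dumps(policy, indent=2))
```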

Data Browser System Overview - Logging & Monitoring

theathorn commented 2 years ago

Does AWS already store log data redundantly, in which case we don't need to do anything extra?

hannes-ucsc commented 2 years ago

It does, but generally not in a geographically different location, which I assume is required for this control.

nolunwa-ucsc commented 1 year ago

The goal of this control is to store audit records on separate physical systems or components, and to preserve the confidentiality and integrity of those records. Question: is there a way to achieve physical separation within AWS? From an attack standpoint, the goal is that if the Data Browser gets compromised, the audit logs stay protected from unauthorized access and modification. AWS recommends logical separation rather than physical separation. Can we have a host and instance isolated for this purpose and managed by Eric's team? https://docs.aws.amazon.com/whitepapers/latest/logical-separation/drivers-for-physical-separation-requirements.html

hannes-ucsc commented 1 year ago

> AWS recommended Logical Separation Compared to Physical Separation, can we have a host and instance isolated for this purpose and managed by Eric's team?

@nolunwa, the design specified in the ticket description achieves that.