Watts-Lab / deliberation-empirica

Empirica V2 framework
MIT License

Set up S3 Bucket for streaming deliberation videos #272

Closed JamesPHoughton closed 1 year ago

JamesPHoughton commented 1 year ago

We will need an S3 bucket that we can put recordings into. We won't be accessing the videos very frequently, at least at first - just to check them for QC. Eventually we'll want to run automated analysis against them.

I don't know much about how this works, so I have a few questions:

  1. Are there different storage tiers with different costs, e.g. long duration, high availability, etc.? Which do we want?
  2. How do I access the bucket? Is this through the AWS console, logging in as my user?
  3. How do we set who within our lab can access the bucket?
  4. How do we give Daily permission to write to the bucket? How does Daily know where to write?
  5. How do we give other researchers access to the bucket?
  6. How do we give our analysis programs access to the bucket?
  7. How much does it actually cost to store videos?

@rivera-lanasm

rivera-lanasm commented 1 year ago

Are there different storage tiers with different costs, e.g. long duration, high availability, etc.? Which do we want?

How do I access the bucket? Is this through the AWS console, logging in as my user?

How do we set who within our lab can access the bucket?

How do we give Daily permission to write to the bucket? How does Daily know where to write?

How do we give other researchers access to the bucket? How do we give our analysis programs access to the bucket?

How much does it actually cost to store videos?

Other thoughts

In my experience, it's very important to map out, as early as possible, how you expect data to enter S3 and how users will consume it. The main things to keep in mind are:

  1. Write data to S3 so that downstream processes can find it easily. This means choosing key prefixes (S3 "partitions") deliberately.
  2. Avoid large numbers of small files where possible. Requests are charged per API call regardless of object size: a PUT of 1 byte costs the same request fee as a PUT of 1 GB.
  3. S3 is a key-value store, not a file system.
  4. Think early on about how to archive data you don't plan to use for the foreseeable future but still want to keep, for example by bundling files into a zip archive.
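
As a rough illustration of points 1 and 3, the sketch below builds object keys whose prefix encodes the study and the recording date, so that a downstream job can list one day's recordings by prefix. The layout, the `study=`/`year=` naming, and both function names are assumptions made for this example; they are not something the thread or Daily prescribes.

```python
from datetime import datetime, timezone


def day_prefix(study: str, day: datetime) -> str:
    """Prefix that selects every recording for one study on one UTC day."""
    day = day.astimezone(timezone.utc)
    return f"recordings/study={study}/year={day:%Y}/month={day:%m}/day={day:%d}/"


def recording_key(study: str, session_id: str, started_at: datetime, filename: str) -> str:
    """Build a full object key under the day prefix.

    S3 has no real directories: the slashes are simply part of the key,
    but listing by prefix makes them behave like partitions.
    """
    return f"{day_prefix(study, started_at)}session={session_id}/{filename}"


# Hypothetical study name, session id, and filename.
key = recording_key(
    "pilot", "abc123", datetime(2023, 3, 7, 15, 30, tzinfo=timezone.utc), "room.mp4"
)
print(key)
```

A listing call restricted to `day_prefix(...)` then returns only that day's objects, which is what makes the prefix choice matter for downstream search.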

Alan-Qiao commented 1 year ago

This is Daily's documentation on recording to a custom S3 bucket: https://docs.daily.co/guides/products/live-streaming-recording/storing-recordings-in-a-custom-s3-bucket

JamesPHoughton commented 1 year ago

Thanks @rivera-lanasm, this is a really helpful breakdown. Can you help me set up the following?

S3 bucket

IAM policy

Daily has some requirements for access permissions, using this policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucketMultipartUploads",
        "s3:AbortMultipartUpload",
        "s3:ListBucketVersions",
        "s3:ListBucket",
        "s3:GetObjectVersion",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
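
The policy above uses a placeholder bucket name and lists the bucket twice: once as the bare bucket ARN (which bucket-level actions such as `s3:ListBucket` apply to) and once with `/*` (which object-level actions such as `s3:PutObject` apply to). A minimal sketch of filling in the name programmatically follows; the function name and the example bucket name are hypothetical.

```python
import json

# The eight actions from the policy above.
DAILY_S3_ACTIONS = [
    "s3:PutObject",
    "s3:GetObject",
    "s3:ListBucketMultipartUploads",
    "s3:AbortMultipartUpload",
    "s3:ListBucketVersions",
    "s3:ListBucket",
    "s3:GetObjectVersion",
    "s3:ListMultipartUploadParts",
]


def daily_bucket_policy(bucket_name: str) -> dict:
    """Return the policy document above with the bucket name substituted."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": DAILY_S3_ACTIONS,
                # Bucket ARN for bucket-level actions, "/*" for object-level ones.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


# "example-recordings-bucket" is a placeholder, not the lab's real bucket.
policy_json = json.dumps(daily_bucket_policy("example-recordings-bucket"), indent=2)
print(policy_json)
```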

IAM role

Per the Daily docs:

- Trusted entity type: AWS Account
- Trusted AWS account ID: 291871421005
- Required external ID: deliberation
- Maximum session duration: 12 hours
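
For reference, these role settings correspond to an IAM trust policy along the following lines. This is a sketch of the standard AWS pattern for cross-account access with an external ID, written out by hand rather than copied from the role that was actually created; the function name is hypothetical.

```python
import json

DAILY_ACCOUNT_ID = "291871421005"  # trusted AWS account ID from the settings above
EXTERNAL_ID = "deliberation"       # required external ID from the settings above


def daily_trust_policy(account_id: str, external_id: str) -> dict:
    """Trust policy letting another AWS account assume the role,
    but only when it presents the agreed external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


trust_policy = daily_trust_policy(DAILY_ACCOUNT_ID, EXTERNAL_ID)
print(json.dumps(trust_policy, indent=2))
```

The external ID condition means that knowing the role ARN alone is not enough to assume the role; the caller must also be the trusted account and supply the matching ID.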

We'll need the role's Amazon Resource Name (ARN) and the bucket region to include in the Daily API calls.
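
A sketch of how the ARN and region might be packaged for Daily is shown below. The `recordings_bucket` property and its field names are an assumption based on the Daily guide linked earlier in this thread and should be checked against it; the bucket name, region, and role ARN are placeholders. The code only builds the request body and does not call the API.

```python
import json


def recordings_bucket_properties(
    bucket_name: str, bucket_region: str, role_arn: str, allow_api_access: bool = True
) -> dict:
    """Build the properties fragment that tells Daily where to write recordings.

    Field names are assumed from Daily's custom S3 bucket guide; verify before use.
    """
    return {
        "recordings_bucket": {
            "bucket_name": bucket_name,
            "bucket_region": bucket_region,
            "assume_role_arn": role_arn,
            "allow_api_access": allow_api_access,
        }
    }


# Placeholder values, not the lab's real bucket or role.
body = {
    "properties": recordings_bucket_properties(
        "example-recordings-bucket",
        "us-east-1",
        "arn:aws:iam::123456789012:role/example-daily-recording-role",
    )
}
print(json.dumps(body, indent=2))
```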

JamesPHoughton commented 1 year ago

From @rivera-lanasm:

Setting up the bucket

Creating IAM policy

Setting up the role:



## Questions for @rivera-lanasm 

- [ ] Should we use [default encryption of videos](https://docs.aws.amazon.com/AmazonS3/latest/userguide/default-bucket-encryption.html)? What friction does it introduce, and what additional security does it buy beyond the existing permission set? Does it cost money?
- [ ] Is it a good idea to use "object lock" for our videos?
- [ ] How do we set up the intelligent tiering? Or is that enabled by default? There is a setting for an archive configuration, but that seems to be for the asynchronous access options.
- [ ] Do these ARNs need to be kept secret, or can they be committed to a public repository?

## Todo:

- [ ] Delete old bucket `wattslab-deliberation-eyeson` that we are not using any more
- [ ] Check in with Daily about why they suggest versioning, and what we expect it to cost.

rivera-lanasm commented 1 year ago

JamesPHoughton commented 1 year ago

After I reached out to Daily to ask why it wasn't working, they replied:

On a previous case where we've seen the same error occur, this has been solved by increasing the maximum session duration to 12 hours

I've updated the session length to 12 hours.
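
For anyone making the same change from code rather than the console, IAM expresses the maximum session duration in seconds and accepts values between 1 and 12 hours. The sketch below converts and validates the value; the helper names and the role name in the usage note are hypothetical, and the `boto3` call is left inside a function so that nothing touches AWS unless it is invoked explicitly.

```python
IAM_MIN_SESSION_SECONDS = 3600   # 1 hour, the smallest value IAM accepts
IAM_MAX_SESSION_SECONDS = 43200  # 12 hours, the largest value IAM accepts


def session_hours_to_seconds(hours: float) -> int:
    """Convert hours to seconds and check the result is within IAM's limits."""
    seconds = int(hours * 3600)
    if not IAM_MIN_SESSION_SECONDS <= seconds <= IAM_MAX_SESSION_SECONDS:
        raise ValueError(f"{hours} hours is outside IAM's 1 to 12 hour range")
    return seconds


def set_max_session_duration(role_name: str, hours: float) -> None:
    """Update the role's maximum session duration (requires AWS credentials)."""
    import boto3  # imported lazily so the conversion helper works without boto3

    iam = boto3.client("iam")
    iam.update_role(
        RoleName=role_name,
        MaxSessionDuration=session_hours_to_seconds(hours),
    )


print(session_hours_to_seconds(12))
```

Calling `set_max_session_duration("example-daily-recording-role", 12)` would apply the 12-hour setting described above to a role with that (placeholder) name.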