Container Log archive retrieval

UserNotFound commented 2 years ago

Download logs from S3, by providing an app, database, proxy, or container ID. Since the S3 Archive feature is meant as a compliance feature, this tool for downloading logs differs from the UX of other commands:

Does not accept handles for resources - you can retrieve logs even for deleted/recreated resources
Does not rely upon our API at all, but instead directly talks to S3 - former customers can still use it

$ aptible help logs_from_archive
Usage:
  aptible logs_from_archive --bucket NAME --region REGION --stack NAME --decryption-keys ONE [OR MORE] --download-location [ [ --string-matches ONE [OR MORE] ] | [ --app-id ID | --database-id ID | --endpoint-id ID | --container-id ID ] [ --start-date YYYY-MM-DD --end-date YYYY-MM-DD ] ] --bucket=BUCKET --decryption-keys=one two three --region=REGION --stack=STACK

Options:
  --region=REGION                          # The AWS region your S3 bucket resides in
  --bucket=BUCKET                          # The name of your S3 bucket
  --stack=STACK                            # The name of the Stack to download logs from
  --decryption-keys=one two three          # The Aptible-provided keys for decryption. (Space separated if multiple)
  [--string-matches=one two three]         # The strings to match in log file names.(Space separated if multiple)
  [--app-id=N]                             # The Application ID to download logs for.
  [--database-id=N]                        # The Database ID to download logs for.
  [--endpoint-id=N]                        # The Endpoint ID to download logs for.
  [--container-id=CONTAINER_ID]            # The container ID to download logs for
  [--start-date=START_DATE]                # Get logs starting from this (UTC) date (format: YYYY-MM-DD)
  [--end-date=END_DATE]                    # Get logs before this (UTC) date (format: YYYY-MM-DD)
  [--download-location=DOWNLOAD_LOCATION]  # The local path place downloaded log files. If you do not set this option, the file names will be shown, but not downloaded.

Retrieves container logs from an S3 archive in your own AWS account. You must provide your AWS credentials via the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY

Here's a function we can use for pulling logs ourselves, and also facilitates easier testing.

function stack-logs() {
  `omnivault exec chamber env $STACK/sweetness/settings`
  omnivault exec bundle exec bin/aptible logs_from_archive --stack "$STACK" --region "$REGION" --bucket "$DOCKER_LOGS_BUCKET_NAME" --decryption-keys "$LOG_ENCRYPTION_KEYS" "$@"
}

export STACK='test-alex'
export REGION='us-west-1'

Now, use the string match option to list all files matching the v3 log file name specification:

stack-logs --string-match v3

From there, you should have an idea of how to use the other options to find exactly what you're looking for.

My thoughts:

it should be possible to support providing other methods of providing AWS credentials (ie ~/.aws/credentials) but I chose providing the keypair via the environment because it's simple to document and handle errors
--string-matches was the first thing I wrote, basically as a way to explore the files in s3. I'm not :100: that it is required, but I think it's a good backup since it can be used to retrieve any file, not just those using the app-id/database-id/etc options
we should use this internally as a replacement for pancake stack:logs, so that we dogfood it for customers

madhuravius commented 2 years ago

I left a small assortment of review comments, but the thing that jumped out to me the most is that it SEEMS like the only dates ever used are in UTC (or GMT/UTC+0), they are always in the same format and/or same length.

i echoed the below in this comment as well

we do not need to employ any time comparisons but can just use string sorting/string comparisons (i think the real word for it is lexicographical sorting of dates in X format?) without any date conversions if this is the case. date conversions not only add additional places things can break (parsing/comparisons/lots of logic) but also compute time (though it's fast). it's one of those things that also makes tests simpler/easier to read, but glancing through the unit tests for the s3 files and various data, i only saw one type/zone/length of date (perhaps i was mistaken). this only applies IF we use the same and consistent date format (which I think we do?)

if i'm wrong on the above guess about how the dates are used, then this comment is inaccurate and can be disregarded

edit - just left a comment clarifying that we could use string comparisons (dont think we really use sorts here)

UserNotFound commented 2 years ago

Addressing all feedback, plus adding validation that you provide a full 64 char container ID if you're using --container-id.

fancyremarker commented 2 years ago

@UserNotFound : what's the plan for when the API does return encryption keys? At that point, we should update the CLI to support finding resources (a) by handle, not ID, and (b) without passing an encryption key. It is more ergonomic/easier and more consistent with the rest of the CLI.

I see why we also need the current semantics, but I do not think this should or will be the default use case.

Do you think we'd have a separate command, or just more argument options for this command?

fancyremarker commented 2 years ago

@UserNotFound : another minor point: I was mildly surprised by the behavior of downloading nothing if --download-location is unspecified. (I was imagining it would download to a folder in the current directory OR /tmp.) This is very minor, and I think it's fine as is, especially given the clear output you provide:

No files were downloaded. Please provide a location with --download-location to download the files.

fancyremarker commented 2 years ago

Another thing:

  [--container-id=CONTAINER_ID]            # The full (64 char) container ID to download logs for

How will customers have this? Don't we only send them the 12-character prefix?

Apart from these comments, this looks awesome overall! Very excited to have this. Great work on the implementation.

UserNotFound commented 2 years ago

Thanks for the feedback @fancyremarker. Overall, this was written in a vacuum of hard requirements with an eye towards showing basic functionality and showing what is possible.

For example, there's no specific command to download logs of an ephemeral session. When I started, there was no way for a customer to know the Operation ID of the session, which is needed to identify the files. Now that the Operation ID's are show in the UI, it would be possible to do so. Should that be included now, or in a future CLI update?

  [--container-id=CONTAINER_ID]            # The full (64 char) container ID to download logs for
How will customers have this? Don't we only send them the 12-character prefix?

Good question - i suppose it would make sense to match the user-provided value with the beginning of of the container ID. I suppose the first 12 would be "close enough" to uniquely identify a container.

@UserNotFound : another minor point: I was mildly surprised by the behavior of downloading nothing if --download-location is unspecified. (I was imagining it would download to a folder in the current directory OR /tmp.) This is very minor, and I think it's fine as is, especially given the clear output you provide:

No files were downloaded. Please provide a location with --download-location to download the files.

This was simply an expression of my opinion that we can't assume any specific location is safe for PHI or other secrets, and it prevents a footgun where the user is not aware of where they are downloaded to. I could go either way on it.

@UserNotFound : what's the plan for when the API does return encryption keys? At that point, we should update the CLI to support finding resources (a) by handle, not ID, and (b) without passing an encryption key. It is more ergonomic/easier and more consistent with the rest of the CLI.

I see why we also need the current semantics, but I do not think this should or will be the default use case.

Do you think we'd have a separate command, or just more argument options for this command?

My thoughts were branch the logic, if you have a valid token:

the --stack remains required
the --encryption-keys, --region, and --bucket options become optional, defaulting to the values retrieved from our API
additional options like --app-handle are un-gated, since they can do the API lookup to get the ID needed to retrieve from S3

Ultimately, though this brings up an interesting concern around access. While searching by handle is "easier" and more in line with other CLI commands, "authorization" is more complex:

what aptible role is granted access to the info in our API for the stack?
looking up resources by handle would be subject to the read permission/role for the current user
however, because the files are in the customer bucket, their access is actually subject to their own access controls

"Enclave Owner" might be a good role, so the TL;DR of what an owner can access is "everything on aptible" and "everything your aws credentials have access to".

fancyremarker commented 2 years ago

This was simply an expression of my opinion that we can't assume any specific location is safe for PHI or other secrets, and it prevents a footgun where the user is not aware of where they are downloaded to. I could go either way on it.

I think your rationale (forcing opt-in to downloading possible PHI locally) makes sense, and I'm happy to leave as is. The warning in this case is very clear.

"Enclave Owner" might be a good role, so the TL;DR of what an owner can access is "everything on aptible" and "everything your aws credentials have access to".

Yes, this is what I had in mind as well. As discussed, that does not need to happen in the scope of this PR. This appears complete for the current stage of release.

joshraker commented 2 years ago

README needs regenerated.

aptible / aptible-cli

Container Log archive retrieval #310