coiled / feedback

A place to provide Coiled feedback

[UX] Downloading logs gives empty file if there are too many logs #228

Closed fjetter closed 1 year ago

fjetter commented 1 year ago

If a VM has accumulated too many logs, our download does not appear to work as intended. Instead of returning the logs or an error message, the download gives an empty file.

For instance, the scheduler log download for this cluster https://cloud.coiled.io/dask-engineering/clusters/144096/details is empty.

However, this cluster has plenty of logs. In fact, it was running with debug logging, and I get about 400k lines when fetching them with the CLI.

Instead of an empty file, a warning / error should be displayed telling me that the volume is too large to download, possibly suggesting the coiled logs CLI as an alternative.

shughes-uk commented 1 year ago

What's the priority on being able to actually download the full log file in the UI? I was thinking that with the UI we'd head toward a Datadog-style log search interface, and maybe direct people to the CLI if they need to dump the entire log to a file.

mrocklin commented 1 year ago

Fwiw I like the file download UX. I'm happy using my text editor to search through logs.

On Wed, Jan 11, 2023, 4:58 PM Samantha Hughes @.***> wrote:

What's the priority on being able to actually download the full log file in the UI? I was thinking with the UI we'd head to a datadog style logs search interface, and maybe direct people to the CLI if they need to dump the entire log to a file.


fjetter commented 1 year ago

What's the priority on being able to actually download the full log file in the UI?

Low to Medium, I would say. I suggest not "fixing" this by enabling large file downloads, but instead showing an error or warning rather than an empty file download, if that's possible. My gut feeling is that having ~400k rows of logs is rare and we do not need to be excellent here. However, I don't truly know where the cutoff is or when this stops working (I guess this is related to some timeout?).

FWIW I can get all the logs easily with the CLI:

coiled cluster logs 144096 --scheduler --format=short > scheduler.log

Maybe a simple UX fix would also be to add a "Copy to clipboard" button next to the download button with this command?

Fwiw I like the file download UX. I'm happy using my text editor to search

I think most log management tools allow downloading, e.g. as CSV. I agree this is valuable at times.
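To make the "error instead of an empty file" suggestion concrete, here is a minimal sketch in Python. It is not how the Coiled web app is implemented: `download_logs`, `LogDownloadError`, and the `fetch` callable are hypothetical names, and the assumption that a timeout is what produces the empty file is only the guess voiced above. The only thing taken from this thread is the CLI command placed in the error message.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FetchTimeout


class LogDownloadError(Exception):
    """Raised when logs could not be fetched in time (hypothetical name)."""


def download_logs(fetch, cluster_id, timeout_s):
    """Return the log text, or raise instead of silently returning nothing.

    `fetch` is any zero-argument callable returning a list of log lines;
    it stands in for whatever the backend really does.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        lines = future.result(timeout=timeout_s)
    except FetchTimeout:
        # Tell the user what happened and point them at the CLI.
        raise LogDownloadError(
            "Logs are too large to download here. Try the CLI instead: "
            f"coiled cluster logs {cluster_id} --scheduler --format=short > scheduler.log"
        ) from None
    finally:
        # Do not block the caller on a fetch that is still running.
        pool.shutdown(wait=False)
    return "\n".join(lines)
```

With this shape, a truly empty log still yields an empty string, while a fetch that runs out of time yields an error carrying the command that a "Copy to clipboard" button could expose.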

ntabris commented 1 year ago

The backend code will now try a different AWS request for pulling logs, one that works better, especially for large logs or logs from many instances.

This requires an additional permission, logs:FilterLogEvents. We now include that permission in our docs and in the coiled setup aws script, but it wasn't included previously. Going forward, new users will have it, but it might not be present for old accounts, so the backend code will pull logs the old way if the new permission isn't present.
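A rough sketch of that "new request first, old way as fallback" pattern, in Python. This is not the actual backend code. It assumes the caller passes in a boto3 CloudWatch Logs client (for example `boto3.client("logs")`), that the "old way" is a per-stream `get_log_events` loop (an assumption, the thread does not say), and that a missing permission surfaces as an error with code `AccessDeniedException`. The helper names are hypothetical.

```python
ACCESS_DENIED_CODES = {"AccessDeniedException", "AccessDenied"}


def _is_access_denied(exc):
    # botocore's ClientError exposes the service error in `exc.response`.
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") in ACCESS_DENIED_CODES


def fetch_with_filter(client, group, streams):
    """One paginated request covering all streams (needs logs:FilterLogEvents)."""
    messages = []
    kwargs = {"logGroupName": group, "logStreamNames": streams}
    while True:
        page = client.filter_log_events(**kwargs)
        messages.extend(event["message"] for event in page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return messages
        kwargs["nextToken"] = token


def fetch_per_stream(client, group, streams):
    """Assumed 'old way': walk each stream separately with get_log_events."""
    messages = []
    for stream in streams:
        kwargs = {"logGroupName": group, "logStreamName": stream, "startFromHead": True}
        while True:
            page = client.get_log_events(**kwargs)
            messages.extend(event["message"] for event in page.get("events", []))
            token = page.get("nextForwardToken")
            # get_log_events signals the end by returning the token it was given.
            if not token or token == kwargs.get("nextToken"):
                break
            kwargs["nextToken"] = token
    return messages


def fetch_logs(client, group, streams):
    """Prefer the filter request; fall back only when permission is missing."""
    try:
        return fetch_with_filter(client, group, streams)
    except Exception as exc:
        if not _is_access_denied(exc):
            raise
        return fetch_per_stream(client, group, streams)
```

Any error other than a permission failure is re-raised, so a real outage is not masked by the slower path.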

I've added the permission to the AWS IAM Role used by dask-engineering, and now the scheduler logs for that cluster download almost immediately. When I initially tried downloading them (with the old permission/method) it timed out.

This account had a non-standard IAM Role name (it had presumably been set up manually a long time ago), so I had to go into the AWS Console to find that the role name was iam_ongoing_policy. Once I did that, I used our CLI to update the permissions, like so:

coiled setup aws --profile oss --update-policies --ongoing-policy iam_ongoing_policy

Future work

  1. We'll want to nudge people to update the permissions. The coiled cluster better-logs CLI already tells you to do this (and tells you the CLI command) if you try to pull logs for a cluster that has more than 4 workers.
  2. I'll continue to polish (and document) the coiled cluster better-logs CLI (which will probably be renamed to just coiled cluster logs at some point) but I think it's pretty nice already.
  3. I think we do want logs to show in the web app, and to see them correlated with metrics.
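For the first item, one way a nudge could be decided is by inspecting the IAM policy document itself. The Python sketch below is an illustration only, not what the CLI does. It assumes the policy is available as a parsed JSON dict, looks only at `Allow` statements and their `Action` entries (ignoring `Deny`, `NotAction`, and conditions), and uses hypothetical function names. The suggested command reuses the `--update-policies` flag shown above; whether it works without the other flags is an assumption.

```python
from fnmatch import fnmatchcase

REQUIRED_ACTION = "logs:FilterLogEvents"


def _as_list(value):
    # IAM allows a single item or a list for both Statement and Action.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def allows_action(policy_document, action=REQUIRED_ACTION):
    """True if any Allow statement matches the action, wildcards included."""
    for statement in _as_list(policy_document.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        for pattern in _as_list(statement.get("Action")):
            # IAM action names are case-insensitive and may contain * or ?.
            if fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False


def permission_nudge(policy_document):
    """Return a message for accounts missing the permission, else None."""
    if allows_action(policy_document):
        return None
    return (
        f"Your IAM policy does not include {REQUIRED_ACTION}, so logs are "
        "pulled the slower way. To update it, run: "
        "coiled setup aws --update-policies"
    )
```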