logs: enable application logs to a file (always)

anchore / grype

A vulnerability scanner for container images and filesystems

Apache License 2.0

8.5k stars 551 forks source link

logs: enable application logs to a file (always) #88

Closed alfredodeza closed 3 months ago

alfredodeza commented 4 years ago

We currently have a mix of nice, human-readable output like the download and check process when the tool starts and application logging that comes mostly in the form of errors. For example:

[0000] ERROR failed to catalog: could not fetch image 'foobar': unable to trace image save progress: unable to inspect image: Error: No such image: index.docker.io/library/foobar:latest

One of the problems that will happen when users face issues/bugs is that they will probably not run with high verbosity, and we will end up asking them to re-run with added verbosity flags (e.g. -vvv). This will complicate the reporting of the issue because the run will most likely be long gone, and the output no longer available.

Since the file logging is not enabled by default, asking for the log artifact will also not be a possibility.

As a reporter, having to re-run while trying to reproduce, is cumbersome to say the least.

Grype should enable file logging by default, and for a while after releasing, the default should be DEBUG. This highly verbose level should lower to INFO or WARNING as the tool stabilizes.

Enabling the file logging by default would also mean to do a "best effort" in trying to find a suitable locations, because /var/log/grype/2020-08-04.log might not be available or with enough permissions

In addition to all of this, it would be really useful to start separating what user-facing output is, vs. what system logging entails. More developer-like logs go to the file, and user-friendly messages go to the terminal

wagoodman commented 4 years ago

I think the motivation behind this makes sense, we should plan out how to make information already easily available without a re-run.

For logging locations, we have to make certain it works for both the use case of an individual and parallel runs in CI. If we go this route... /var/log is more for installed services and we'll run into permissions issues I think most of the time. I'd vote keeping logs in one of:

$XDG_CACHE_HOME (linux: ~/.cache , mac: ~/Library/Caches , win: %LOCALAPPDATA%\cache)
$XDG_DATA_HOME (linux: ~/.local/share, mac: ~/Library/Application Support, win: %LOCALAPPDATA%)
A new temp dir on each run (bonus: will be cleaned up automatically on boot, where the others we need to clean up ourselves)

Since syft and grype could expected to be run in CI we need to support parallel logs, which kind of implies making a random log dir for each run, reported to the terminal on each run (otherwise we need to coordinate writing to a single log for multiple processes). I realize that the syslog would take care of this concern, however, I think it's overkill for a CLI tool to write out to syslog and it will be much simpler not to.

We need to double check if we can do one log level to one destination [terminal], and another log level to another destination [file]... which I don't think we can. Currently the default level is at ERROR, and when it is raised to INFO or higher, the ETUI is disabled and logging is shown to the screen. If we have a default level of DEBUG, we'll need another CLI indicator to disable the ETUI (otherwise it would always be disabled).

luhring commented 4 years ago

it would be really useful to start separating what user-facing output is, vs. what system logging entails. More developer-like logs go to the file, and user-friendly messages go to the terminal

I agree with this 1000%. Right now, since our projects are in their infancy, we're using the logger as a means of reporting problems to users. We should grow out of this as soon as we can. Logging is a separate and distinct concern from user interfaces, even for CLI tools. Fortunately for us, I think @wagoodman 's work on our CLI UIs thus far is going to be a very nice foundation for evolving our work of interfacing with users.

Grype should enable file logging by default

I don't agree with this. I'm not 100% against it, but I'm probably 95% against it. Why? Here are my thoughts:

I don't expect CLI tools to be logging to my disk unless I explicitly ask them to. Furthermore, I don't expect CLI tools to be writing to my disk at all, unless either a) that's the whole point of the tool (e.g. git, vim) or b) I explicitly ask them to (e.g. curl -O, etc.).
None of the CLI tools I can think of that I commonly use write logs to disk. Not the simple built-in tools (e.g. grep, tar), not the more complicated built-in tools (e.g. top, vim), not OSS tools (e.g. terraform, npm). I do not believe logging to disk is a common, default expectation that users have.
There are reasons to have security concerns about a tool that writes to disk in unexpected places. As an example, macOS now prompts users with a GUI dialog when any app tries to access a new general area of the filesystem, for the whole reason of preventing apps from accidentally or maliciously reading or writing in unexpected ways. Since we are making "security" tools, I think we want to maintain trust from the community as much as possible, and I think writing files to disk that the user didn't ask for could raise eyebrows for some users.
For troubleshooting, I agree that we do need to figure out the expected process for the user. There's a Go convention that I've latched onto recently that helps you structure your code in such a way that segregates "expected" errors from "unexpected" errors, such that the unexpected errors can be reported to users as certain bugs. Given this, you can tailor how you present the user with information about the bug and how to report the bug. I don't hold the belief that asking the user to retry their command (hitting the up arrow in their terminal) and adding a -v flag is unnecessarily burdensome. In fact, if doing so cannot reproduce the issue, I would question whether it should be sent to us in the first place, since there's much less we could do to pinpoint the issue.

TL;DR:

Yes, we should get better at logging. Yes, we should report problems to users without using logging. No, we shouldn't write log data to user's systems without some form of consent from them. Yes, we should iron out our process for receiving and acting on reported problems from users.

alfredodeza commented 4 years ago

1. I don't expect CLI tools to be logging to my disk unless I explicitly ask them to. Furthermore, I don't expect CLI tools to be writing to my disk _at all_, unless either a) that's the whole point of the tool (e.g. `git`, `vim`) or b) I explicitly ask them to (e.g. `curl -O`, etc.).

Grype already writes to disk without asking for the users permission - this is exactly how it grabs the database and produces its caches. If we don't want these tools to write to the filesystem at all then we need a different discussion.

2. None of the CLI tools I can think of that I commonly use write logs to disk. Not the simple built-in tools (e.g. `grep`, `tar`), not the more complicated built-in tools (e.g. `top`, `vim`), not OSS tools (e.g. `terraform`, `npm`). I do not believe logging to disk is a common, default expectation that users have.

I strongly disagree here with this. Listing applications that don't write logs doesn't mean no application should ever write log output by default. For example, when there is a hard failure, tox writes to a log file:

$ tox
GLOB sdist-make: /private/tmp/jformat/setup.py
ERROR: invocation failed (exit code 1), logfile: /private/tmp/jformat/.tox/log/GLOB-0.log

Most of the Ceph tools like ceph-volume will try to write to a log file (by default /var/log/ceph/ceph-volume.log) and will note if they can't do it (a best effort behavior)

What I am trying to convey is that this is crucial for the first iterations of the tool as we will get issues because the tool will run in environments that were not tested at all, and without detailed information (enabled by default), it will be impractical/impossible to debug and comprehend.

3. There are reasons to have security concerns about a tool that writes to disk in unexpected places. As an example, macOS now prompts users with a GUI dialog when any app tries to access a new general area of the filesystem, for the whole reason of preventing apps from accidentally or maliciously reading or writing in unexpected ways. Since we are making "security" tools, I think we want to maintain trust from the community as much as possible, and I think writing files to disk that the user didn't ask for could raise eyebrows for some users.

Writing to disk is fine. If the problem is "unexpected places", let's figure out writing to expected places... I suggested /var/log/grype. There are numerous applications that write to /var/log/ by default without any sort of user configuration or granted permission.

4. For troubleshooting, I agree that we do need to figure out the expected process for the user. There's a Go convention that I've latched onto recently that helps you structure your code in such a way that segregates "expected" errors from "unexpected" errors, such that the unexpected errors can be reported to users as certain bugs. Given this, you can tailor how you present the user with information about the bug and how to report the bug. I don't hold the belief that asking the user to retry their command (hitting the up arrow in their terminal) and adding a `-v` flag is unnecessarily burdensome. In fact, if doing so cannot reproduce the issue, I would question whether it should be sent to us in the first place, since there's much less we could do to pinpoint the issue.

Grype is a tool that has to deal with many different aspects that can go wrong:

Network
Docker
Tar
Filesystem paths
Database(s)

It isn't a simplistic tool at all and I fully expect getting reports that are going to be hard to grasp with detailed logging output we currently do not have. The example of asking a user to rerun with -v is one of the many use cases I potentially see causing problems - and I agree it isn't necessarily cumbersome, however, it might be borderline impossible:

Can you re-run with higher verbosity in your vscode plugin?
You are working in a corporate environment that runs the tool and reports back, someone else controls the -v flag
This was in a CI run that happened last week when Docker was down and odd behavior happened.

The most complicated problems require detailed output that we must have enabled somehow or it will be impossible to determine how to solve them - especially so when it is not possible to just rerun with -v.

robertp commented 4 years ago

My 2 cents here, just being an observer for this project... opinions are my own, for informational purposes only, etc.

My expectation is that many of the CLI tools I use are going to scribble onto my disk by default. Not just into e.g. dot files in my home directory, but dot directories in my home directory. I suspect quite a few don't care about XDG specifications.

As an example:

rprince@x1anchore ~ ❗find . -maxdepth 1 -type d -name ".*"
.
./.yarn
./.thunderbird
./.pyenv
./.local
./.audacity-data
./.skaffold
./.gnupg
./.werf
./.vscode
./.vim
./.argocd
./.bundle
./.docker
./.circleci
./.gem
./.cache
./.multiwerf
./.npm
./.pki
./.mozilla
./.minikube
./.ssh
./.tox
./.zoom
./.config
./.grip
./.jupyter
./.snap
./.helm
./.aws
./.rvm
./.texlive2019
./.pylint.d
./.kube
./.kde

And:

rprince@x1anchore ~ ❗find . -maxdepth 1 -type f -name ".*"
./.NERDTreeBookmarks
./.viminfo
./.fonts.conf
./.gitmessage
./.bashrc
./.profile
./.tig_history
./.xsession-errors
./.wget-hsts
./.lesshst
./.face
./.gtkrc-2.0
./.python_history
./.sudo_as_admin_successful
./.bash_logout
./.bash_history
./.irb-history
./.zlogin
./.Xauthority
./.zshrc
./.mkshrc
./.boto

My opinion is that the friendliest thing we could do both for our users and ourselves is to have reasonable logging by default, and allow configuration and/or CLI args to modify that behavior.

wagoodman commented 4 years ago

I can appreciate the desire to improve the user experience when it comes to troubleshooting, however, I want to point out some important distinctions in the above examples:

Regarding /var/log: /var/log/ is by default not writable to anyone other than root in most (all?) linux distributions. I'm only aware of packages creating these directories and modifying sub-directory permissions at install type (e.g. during an RPM installation, as ceph-osd does [the provider of ceph-volume], which is always done as root) or by applications themselves running as root. Toolbox tools should be simple curl-and-run tools, which do not require root privileges. We do have deb/rpm packages, but shouldn't depend on those packages being used. It would be ideal to look for logs in one location, not several ("if you're on rhel/debian and installed via the deb/rpm look in /var/log/grype otherwise check in /other/dir..."). If the user decides to override the log location, that's OK, but default behavior should be easy and consistent.
On the theme of "do CLI tools write to disk": yes CLI tools often write to disk --there are also a class of CLI tools that do not. Just because tool X writes to disk doesn't automatically mean that we should too. I think we need to convince ourselves when it makes sense for the use case in question. The use case put forth in this issue is "should we always log by default to help users troubleshoot problems".
If we log by default, we own handling those logs. If it's in a local directory relative to invocation (like with tox) this seems simple enough --note that we will require users to add our local ./.grype/ scratch dir to their gitignore. If we decide to use a common directory for all invocations, then we will have relatively more responsibility on us (log rotation / cleanup, concurrent writes, etc) --we should try to avoid this unless we need to, as these fall outside of the curl-and-run requirements for this project.
Observations on grype behavior today: when running in a CI environment you automatically are getting info/error level events to the console (since there is no tty).
With regard to it would be really useful to start separating what user-facing output is vs. what system logging entails we already have an ETUI for user-facing feedback and a logger for system level-logging, with config and environment variables to write the logs to disk in a user-specified location. We should continue to evolve the output of the ETUI to handle all forms of output (errors are not covered today, and falls back to a logger interface... this topic is for another existing issue).

Given the above points, I want to put forth some opinions on this:

I think logs for troubleshooting is awesome and helpful. I don't think the logs are required to be persisted on disk per-say.
When running in your terminal (with a tty) you have the power to change the output to a logging format with -vv pretty simply. This means that we're really discussing users that are running in CI, which already default to a log format to the console (no -v necessary).

With the above opinions, I feel that the discussion should really steer closer to "what should the default log level be" since a user running this without a tty already has logs directed to the console.

alfredodeza commented 4 years ago

1. Regarding `/var/log`: `/var/log/` is by default not writable to anyone other than root in most (all?) linux distributions. I'm only aware of packages creating these directories and modifying sub-directory permissions at install type (e.g. during an RPM installation, as `ceph-osd` does [the provider of `ceph-volume`], which is always done as root) or by applications themselves running as root. Toolbox tools should be simple `curl`-and-run tools, which do not require root privileges. We do have deb/rpm packages, but shouldn't depend on those packages being used. It would be ideal to look for logs in one location, not several ("if you're on rhel/debian and installed via the deb/rpm look in `/var/log/grype` otherwise check in `/other/dir`..."). If the user decides to override the log location, that's OK, but default behavior should be easy and consistent.

/var/log was an example to point how logging by default is useful. I agree that the default behavior should be easy and consistent. Using the current working directory or $HOME (or anything in between) would be fine.

2. On the theme of "do CLI tools write to disk": yes CLI tools often write to disk --there are also a class of CLI tools that do not. Just because tool X writes to disk doesn't automatically mean that we should too. I think we need to convince ourselves when it makes sense for the use case in question. The use case put forth in this issue is "should we always log by default to help users troubleshoot problems".

At this point in time, I think it would be troublesome not to write to disk. CI environments allow to collect artifacts and the grype log output as an artifact would be great to keep around.

3. If we log by default, we own handling those logs. If it's in a local directory relative to invocation (like with `tox`) this seems simple enough --note that we will require users to add our local `./.grype/` scratch dir to their gitignore. If we decide to use a common directory for all invocations, then we will have relatively more responsibility on us (log rotation / cleanup, concurrent writes, etc) --we should try to avoid this unless we need to, as these fall outside of the `curl`-and-run requirements for this project.

This point reads more like the many options we can come up with related to usage. We don't need to require users to add .grype to .gitignore if this is configurable for example. I would propose to go from simplest (e.g. Tox's approach to .tox) to complex: we own the rotation of logs.

I don't think we need to prevent logging by default to disk because we might want log rotation.

4. Observations on grype behavior today: when running in a CI environment you automatically are getting info/error level events to the console (since there is no tty).

This is great today. And will help us when things are running on unknown environments, mostly on CI.

Given the above points, I want to put forth some opinions on this:
* I think logs for troubleshooting is awesome and helpful. I don't think the logs are required to be persisted on disk per-say.

* When running in your terminal (with a tty) you have the power to change the output to a logging format with `-vv` pretty simply. This means that we're really discussing users that are running in CI, which already default to a log format to the console (no `-v` necessary).
With the above opinions, I feel that the discussion should really steer closer to "what should the default log level be" since a user running this without a tty already has logs directed to the console.

This is part of the problem with what we have today. I would strongly suggest that for now, it should be DEBUG levels of logging, since the tooling is brand-new, and we don't know enough of all the various environments this is going to be running on. But since the output is currently mixed with the user-facing messaging, it is way too verbose, causing noise. That amount of noise in a CI run wouldn't be as useful as having it separate on a file which can be looked at when there is a failure.

This could be avoided today with something like grype myimage:latest > grype.log but then that forces anyone that doesn't redirect to handle the high-verbosity output.

wagoodman commented 4 years ago

since the output is currently mixed with the user-facing messaging

Just to clarify, when running in an environment without a TTY, only logging is shown (not the ETUI). There isn't a "mix" of user-facing messaging aside from the presenter report output. Specifically, all log output is to stderr and presenter report output is to stdout. So this command would give you the desired result:

grype myimage:latest 2> grype.log

Where the log file would contain only log entries and stdout would show the report.

That amount of noise in a CI run wouldn't be as useful as having it separate on a file which can be looked at when there is a failure....

Today when there is no TTY we default to INFO level for the logger output, which already shows failures.

...then that forces anyone that doesn't redirect to handle the high-verbosity output.

Are we saying that the volume of output from DEBUG log level by default is too much for CI output?

wagoodman commented 3 months ago

I'm going to close this for now with some context:

there is a way to log to disk in syft in an opt in capacity (SYFT_LOG_FILE=/tmp/syft.log syft alpine:latest -vv... and similar options for grype)
at this point we won't be by-default writing logs to disk