conda-incubator / conda-store

Data science environments, for collaboration. ✨
https://conda.store
BSD 3-Clause "New" or "Revised" License
137 stars 44 forks source link

[ENH] - Improve build logs for users #806

Closed kcpevey closed 2 months ago

kcpevey commented 3 months ago

Feature description

As an end user, if I have a build that fails, I expect to use the logs (via the UI, not Loki) to help me debug. I find that the logs available to me are very frequently insufficient to help me debug.

I will also say the conda output that I expect as an end user IS available, but only after a successful build and it looks a little different given that we're building from a conda-lock file. At the very least, I think if an error happens during the build, there should be a stdout/stderr flush.

Value and/or benefit

End users who can debug failed builds and have visibility into the progress of a build.

Anything else?

Example failure:

starting build of conda environment 2024-03-21 14:52:09.332184 UTC
::group::action_solve_lockfile
Running command: mamba info

     active environment : None
       user config file : /root/.condarc
 populated config files : 
          conda version : 23.3.1
    conda-build version : not installed
         python version : 3.10.12.final.0
       virtual packages : __archspec=1=x86_64
                          __glibc=2.31=0
                          __linux=5.10.210=0
                          __unix=0=0
       base environment : /opt/conda  (writable)
      conda av data dir : /opt/conda/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /opt/conda/pkgs
                          /root/.conda/pkgs
       envs directories : /opt/conda/envs
                          /root/.conda/envs
               platform : linux-64
             user-agent : conda/23.3.1 requests/2.31.0 CPython/3.10.12 Linux/5.10.210-201.852.amzn2.x86_64 ubuntu/20.04.6 glibc/2.31
                UID:GID : 0:0
             netrc file : None
           offline mode : False

Running command: conda config --show
add_anaconda_token: True
add_pip_as_python_dependency: True
aggressive_update_packages:
  - ca-certificates
  - certifi
  - openssl
allow_conda_downgrades: False
allow_cycles: True
allow_non_channel_urls: False
allow_softlinks: False
allowlist_channels: []
always_copy: False
always_softlink: False
always_yes: None
anaconda_upload: None
auto_activate_base: True
auto_stack: 0
auto_update_conda: True
bld_path: 
changeps1: True
channel_alias: https://conda.anaconda.org
channel_priority: flexible
channel_settings: []
channels:
  - defaults
client_ssl_cert: None
client_ssl_cert_key: None
clobber: False
conda_build: {}
create_default_packages: []
croot: /opt/conda/conda-bld
custom_channels:
  pkgs/main: https://repo.anaconda.com
  pkgs/r: https://repo.anaconda.com
  pkgs/pro: https://repo.anaconda.com
custom_multichannels:
  defaults: 
    - https://repo.anaconda.com/pkgs/main
    - https://repo.anaconda.com/pkgs/r
  local: 
debug: False
default_channels:
  - https://repo.anaconda.com/pkgs/main
  - https://repo.anaconda.com/pkgs/r
default_python: 3.10
default_threads: None
deps_modifier: not_set
dev: False
disallowed_packages: []
download_only: False
dry_run: False
enable_private_envs: False
env_prompt: ({default_env}) 
envs_dirs:
  - /opt/conda/envs
  - /root/.conda/envs
error_upload_url: https://conda.io/conda-post/unexpected-error
execute_threads: 1
experimental: []
extra_safety_checks: False
fetch_threads: 5
force: False
force_32bit: False
force_reinstall: False
force_remove: False
ignore_pinned: False
json: False
local_repodata_ttl: 1
migrated_channel_aliases: []
migrated_custom_channels: {}
non_admin_enabled: True
notify_outdated_conda: True
number_channel_notices: 5
offline: False
override_channels_enabled: True
path_conflict: clobber
pinned_packages: []
pip_interop_enabled: False
pkgs_dirs:
  - /opt/conda/pkgs
  - /root/.conda/pkgs
proxy_servers: {}
quiet: False
remote_backoff_factor: 1
remote_connect_timeout_secs: 9.15
remote_max_retries: 3
remote_read_timeout_secs: 60.0
repodata_fns:
  - current_repodata.json
  - repodata.json
repodata_threads: None
report_errors: None
restore_free_channel: False
rollback_enabled: True
root_prefix: /opt/conda
safety_checks: warn
sat_solver: pycosat
separate_format_cache: False
shortcuts: True
show_channel_urls: None
signing_metadata_url_base: None
solver: classic
solver_ignore_timestamps: False
ssl_verify: True
subdir: linux-64
subdirs:
  - linux-64
  - noarch
target_prefix_override: 
track_features: []
unsatisfiable_hints: True
unsatisfiable_hints_check_depth: 2
update_modifier: update_specs
use_index_cache: False
use_local: False
use_only_tar_bz2: False
verbosity: 0
verify_threads: 1

Running command: conda config --show-sources

CONDA_FLAGS=--strict-channel-priority
Locking dependencies for ['linux-64']...
 - Install lock using: conda-lock install --name YOURENV /tmp/tmpo_lt8igh/conda-lock.yaml
Rendering lockfile(s) for linux-64...
WARNING: installation of pip dependencies from explicit lockfiles is only supported by the 'conda-lock install' and 'micromamba install' commands. Other tools may silently ignore them. For portability, we recommend using the newer unified lockfile format (i.e. removing the --kind=explicit argument.
 - Install lock using : conda create --name YOURENV --file conda-linux-64.lock

::endgroup::
Traceback (most recent call last):
  File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/conda_package_handling/streaming.py", line 32, in _extract
    with open(filename, "rb") as fileobj:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/conda/pkgs/libgrpc-1.61.2-h42401df_0.conda'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/conda_store_server/build.py", line 183, in build_conda_environment
    context = action.action_fetch_and_extract_conda_packages(
  File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/conda_store_server/action/base.py", line 31, in wrapper
    action_context.result = f(action_context, *args, **kwargs)
  File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/conda_store_server/action/download_packages.py", line 41, in action_fetch_and_extract_conda_packages
    conda_package_handling.api.extract(str(file_path))
  File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/conda_package_handling/api.py", line 77, in extract
    format.extract(fn, dest_dir, components=components)
  File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/conda_package_handling/conda_fmt.py", line 46, in extract
    _extract(str(fn), str(dest_dir), components=components)
  File "/opt/conda/envs/conda-store-server/lib/python3.10/site-packages/conda_package_handling/streaming.py", line 42, in _extract
    raise exceptions.InvalidArchiveError(filename, f"failed with error: {str(e)}") from e
conda_package_handling.exceptions.InvalidArchiveError: Error with archive /opt/conda/pkgs/libgrpc-1.61.2-h42401df_0.conda.  You probably need to delete and re-download or re-create this file.  Message was:

failed with error: [Errno 2] No such file or directory: '/opt/conda/pkgs/libgrpc-1.61.2-h42401df_0.conda'
jaimergp commented 3 months ago

conda-lock invokes conda with --json, which hides all normal output.

Unfortunately I don't think conda knows how to handle both -vvv (or the equivalent env var CONDA_VERBOSITY=3) and --json at the same time; -vvv overrides --json, which would break the JSON expectation in conda-lock. But verbose logging should go to stderr, so this sounds like a bug in conda

I think the only way would be to contribute something in conda and then ask for a revendor in conda-lock.

nkaretnikov commented 3 months ago

I think this should be moved to the conda-store issues repo because all of these things are about the server. The UI only links to logs generated by the server ATM.

Action items:

Long reply to @kcpevey's questions (feel free to skip if you don't need context):

> It seems like the builds take a long time to populate this logs with anything beyond starting build of conda environment 2024-03-21 14:52:09.332184 UTC. Perhaps they are not flushed frequently enough? Yes, these are written only after an action succeeds. It should be possible to write iteratively to the output stream. > The conda config is confusing. The output from the conda-config command shows channel_priority: flexible which shouldn't be true for conda-store (we have checked this many times). conda-store uses many tools internally and we print logs from them without any processing. The actual channel priority is set via `--strict-channel-priority` and is printed later. See the previous discussion here https://github.com/conda-incubator/conda-store/issues/737#issuecomment-1913780222 and the PR that implements it here: https://github.com/conda-incubator/conda-store/pull/693 > As an end user, it would be nice to know the build sequence so that I can understand where I am in the process and what part I might be hung on. Currently, its hard to tell what is happening on the backend. From the logs, I have no idea what the process is (conda-lock/conda solve/build env). Example - I had a succesful build from a month ago. It took 3 minutes to build originally. I copied it to a new environment and it took 47 minutes. The only thing I saw for that time was starting build of conda environment 2024-03-21 14:52:09.332184 UTC. These strings are part of actions that are printed in https://github.com/conda-incubator/conda-store/blob/b996728154ce5f3869c6c50d1bf6ddd16152988a/conda-store-server/conda_store_server/build.py#L146, see `build_conda_environment`. That the main build pipeline. There are also actions concerning creation of build artifacts, see the tasks here https://github.com/conda-incubator/conda-store/blob/b996728154ce5f3869c6c50d1bf6ddd16152988a/conda-store-server/conda_store_server/app.py#L677 in `create_build`. > I will also say the conda output that I expect as an end user IS available, but only after a successful build and it looks a little different given that we're building from a conda-lock file. At the very least, I think if an error happens during the build, there should be a stdout/stderr flush. Yes, I agree. Most of this should be solved by flushing stdout/stderr. > Example failure: [...] What version of conda-store are you running? Either it's a network failure or you're running an old version of conda-store that doesn't have fixes for package cache corruption, see this PR https://github.com/conda-incubator/conda-store/pull/745
peytondmurray commented 3 months ago

I'm in agreement with @nkaretnikov that at the very least there are issues here which need to first be addressed on the conda-store-server side of things. IMO improving the logging situation there is the first step, then we can think about how to implement user-facing feedback in the UI next. I think for this reason it makes sense to first move this to https://github.com/conda-incubator/conda-store. With respect to the individual issues:

The conda config is confusing. The output from the conda-config command shows channel_priority: flexible which shouldn't be true for conda-store (we have checked this many times).

conda-store uses many tools internally and we print logs from them without any processing. The actual channel priority is set via --strict-channel-priority and is printed later. See the previous discussion here https://github.com/conda-incubator/conda-store/issues/737#issuecomment-1913780222 and the PR that implements it here: https://github.com/conda-incubator/conda-store/pull/693

@kcpevey Looks like conda config is not the right source of information about your config when you use conda-store. I can see a few of ways of addressing this, but do you have any ideas about your preferred solution here? I'd be interested to hear them if so. @jaimergp @nkaretnikov The fact that this has come up before means we need to make it clearer to the user.

Can we insert a shim for conda config and provide the actual configuration used by conda-store? Or maybe just display a message saying "You've got conda-store installed. FYI conda-store has a couple of settings which might differ from your conda config..." or similar.

As an end user, it would be nice to know the build sequence so that I can understand where I am in the process and what part I might be hung on. Currently, its hard to tell what is happening on the backend. From the logs, I have no idea what the process is

It sounds like it might be possible to get more information about what conda-lock is doing by following @nkaretnikov's suggestion. Let's try that first since we have the most control over that. I could also see adding something like a progress bar or some other visualization (diagram of the build steps and their current status?) but this would require upstream changes in conda and conda-lock.

TLDR: there are a few things we can start to take action on.

  1. Try to stream output from conda-lock while actions are running. starting build of conda environment 2024-03-21 14:52:09.332184 UTC isn't really enough to give the user an idea about the current status of a build. It sounds like conda output is currently only available on successful builds, but let's make it accessible for failed builds as well; that way users can at least start to track down build issues.
  2. Improve clarity of the conda config output. Need ideas about how to address this.
  3. I'm also wondering why a build which took 3m initially took 47m on a subsequent build. Is this some network issue, or something else? If it's anything other than a network issue I'd be interested in getting that fixed - 47m is way too long to wait for an environment to build. @kcpevey would you be able to share the environment you copied over?

If we can't make progress with (1) above, we'll probably need to make upstream changes to conda and conda-lock. Later on we can follow up about changes we can make to conda-store-ui.

kcpevey commented 3 months ago

I'm also wondering why a build which took 3m initially took 47m on a subsequent build. Is this some network issue, or something else? If it's anything other than a network issue I'd be interested in getting that fixed - 47m is way too long to wait for an environment to build. @kcpevey would you be able to share the environment you copied over?

I actually think this one has to do with a bug that had been recently introduced where "queued" builds were showing up as "building". So this can be disregarded.

As for reasonable output given what I know now, I propose the logs simply output 2 additional pieces of information:

  1. The name of the step in the build process (e.g. "Solving conda-lock", "Building env from lockfile", "Conda export", etc)
  2. The command that's being run (e.g. conda-lock create ...)
nkaretnikov commented 2 months ago

Peyton, Chuck, and I discussed this today. It was agreed that I'll focus on log-flushing since I have the most context.

We'll split this into self-contained issues and close this one to avoid getting sidetracked: