Open gzt5142 opened 5 months ago
@gzt5142 Thank you for opening this issue, and welcome to the repository. :)
I'm not able to reproduce this issue on a regular Nebari deployment. Could you please share some more details about the whitelist of connections and a YAML spec that fails for you, so that we can try reproducing and isolating the issue?
> ... Could you please share some more details about the whitelist of connections and a YAML spec that fails for you, so that we can try reproducing and isolating the issue?
I have been able to reproduce this problem with a minimal YAML. Start with just `xarray`, for example.
channels:
- conda-forge
dependencies:
- python=3.10.*
- xarray
Then modify this environment to add any new dependency. When I rebuild the environment, I get the same failure after a network timeout.
channels:
- conda-forge
dependencies:
- python=3.10.*
- xarray
- cf_xarray
Furthermore, if I build an environment (call it "Test" in the "global" namespace) with that first YAML containing only `xarray`, then delete that environment using the conda-store UI, I can no longer build a new environment (regardless of YAML content) with the same name ("Test" in "global"), even if I use the same initial YAML with just `xarray`.
As to network whitelist, our security people don't really like it when we disclose configuration... so I'm reluctant to give details. We have whitelisted one or two conda channels (including conda-forge). Most anything else will be blocked.
But the whitelist shouldn't matter in this case -- I can build a new environment without any issue; the connection to the conda channel works perfectly. If I try to make any **updates** to an existing environment, that's when I get this error. The same happens if I try to re-create an environment with the same name.
So the question I'm stuck on is: what additional network requests are made for a re-build that are not made for an initial build?
--gt
@pavithraes Could this be related to `conda-lock`?
I notice that `conda-lock` makes a sneaky network connection to https://raw.githubusercontent.com/regro/cf-graph-countyfair/master/mappings/pypi/grayskull_pypi_mapping.yaml for a lookup function. If this is not done for initial builds (but is done for rebuilds), then that may cause this timeout -- assuming that `raw.githubusercontent.com` is not on the whitelist for network connections.
Would `conda-lock` be invoked for a re-creation of an environment with an old name?
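One way to test this hypothesis from inside the restricted network is to probe each host directly and see which one times out. Below is a minimal Python sketch. The mapping URL is the one quoted above; the channel URL is an assumption (the public conda-forge location) and should be replaced with whatever channel is actually whitelisted. `tcp_probe` and `check_hosts` are hypothetical helper names, not part of conda-store or conda-lock.

```python
import socket
from urllib.parse import urlparse

# URLs whose hosts we want to probe. The first entry is an assumed channel
# location; the second is the mapping file mentioned in this thread.
URLS = [
    "https://conda.anaconda.org/conda-forge/",
    "https://raw.githubusercontent.com/regro/cf-graph-countyfair/master/mappings/pypi/grayskull_pypi_mapping.yaml",
]


def tcp_probe(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_hosts(urls, probe=tcp_probe):
    """Map each URL's hostname to the result of probing it."""
    results = {}
    for url in urls:
        parts = urlparse(url)
        # Fall back to the scheme's default port when none is given.
        port = parts.port or (443 if parts.scheme == "https" else 80)
        results[parts.hostname] = probe(parts.hostname, port)
    return results
```

Running `check_hosts(URLS)` on the machine (or container) where the build runs returns a dict such as `{host: True/False}`. A `False` for `raw.githubusercontent.com` alongside a `True` for the channel host would be consistent with the timeout described here, though it would not by itself prove that `conda-lock` is the caller.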
--gt
Thanks for the detailed report! I'll take a look at what requests are made.
@gzt5142 Do you have access to the terminal on that machine? Could you try building and rebuilding the same environment with pure conda, and a slightly edited environment as well?
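For reference, here is a rough Python sketch of that pure-conda check, using the two YAML specs posted earlier in this thread. The environment name `rebuild-test`, the file names, and the helper names are placeholders, and this only approximates the create / edit / delete / re-create sequence by hand; whether conda-store performs the same steps internally is not established here.

```python
import subprocess
from pathlib import Path

# The minimal spec from this thread, and the edited variant with one more package.
BASE_SPEC = """\
channels:
  - conda-forge
dependencies:
  - python=3.10.*
  - xarray
"""
EDITED_SPEC = BASE_SPEC + "  - cf_xarray\n"


def conda_commands(name, base_file, edited_file):
    """Return the conda invocations, in order: create, update, remove, re-create."""
    return [
        ["conda", "env", "create", "-n", name, "-f", str(base_file)],
        ["conda", "env", "update", "-n", name, "-f", str(edited_file)],
        ["conda", "env", "remove", "-n", name, "-y"],
        ["conda", "env", "create", "-n", name, "-f", str(base_file)],
    ]


def run_all(name, workdir):
    """Write both specs into `workdir` and run each command, stopping on failure."""
    workdir = Path(workdir)
    base = workdir / "base.yaml"
    edited = workdir / "edited.yaml"
    base.write_text(BASE_SPEC)
    edited.write_text(EDITED_SPEC)
    for cmd in conda_commands(name, base, edited):
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `run_all("rebuild-test", ".")` runs the four steps in sequence. If all of them succeed with plain conda, the extra network request is more likely coming from something conda-store adds on top.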
In conda-store, I don't think there's any difference server-side in terms of whether you're building from scratch or after editing.
I wonder if it's due to caching that conda does. Because you have a proxy configured on your system, as mentioned here: #767, so maybe these urls get stored in the package cache and conda cannot resolve them.
https://redacted@nexus.internal.host/repository/conda-forge/linux-64/python_abi-3.11-4_cp311.conda
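To check whether stale URLs like this one are actually present in the cache, a small Python sketch along these lines could list the hosts recorded there. It assumes the package cache directory contains a `urls.txt` file with one URL per line; the path to that file varies by installation and has to be supplied, and `hosts_in_url_list` is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path
from urllib.parse import urlparse


def hosts_in_url_list(path):
    """Count how many URLs in a one-URL-per-line text file point at each host."""
    counts = Counter()
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        # .hostname drops any "user@" prefix and port from the URL.
        host = urlparse(line).hostname
        if host:
            counts[host] += 1
    return counts
```

Any host in the result that is not on the whitelist would be a candidate for the request that times out.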
> Because you have a proxy configured on your system, as mentioned here: #767, so maybe these urls get stored in the package cache and conda cannot resolve them.
I think there may be some misunderstanding here. We tried the proxy, and abandoned it in favor of whitelisting a conda channel (as described above) and allowing direct connections. No proxy is involved in this issue.
Describe the bug
Attempting to change the composition of an existing conda environment built with conda-store as part of a `nebari` deployment.
Expected behavior
The environment should build with the updated YAML and produce the `lock` and `log` artifacts, just as if it were a new environment.
How to Reproduce the problem?
1. Create an environment in the `global` namespace with name `pangeo` (this succeeds).
2. Edit the `pangeo` environment to add a new package dependency in the YAML description.
3. The build stays in `building` mode for about 15 minutes.
4. Delete the `pangeo` environment.
5. Create a new environment in `global` with name `pangeo` (same as above).
6. The build stays in `building` mode for about 15 minutes.
Output
The conda-store log includes the stack trace for the failure:
traceback.txt
Versions and dependencies used.
`conda-store` installed as part of Nebari 2023.11.1
`quansight/conda-store-server:2023.10.1`
Anything else?
This installation is on a sequestered network with strict network policy -- most traffic is blocked except specific whitelisted connections. We have opened ports/hosts/etc. to allow conda builds from specific channels (these connections and builds work well for new environments), but builds fail when attempting to edit existing environments.