mattkram opened 2 years ago
Nice! I've been thinking about adding some optional arguments to `init`. Right now it would create an empty `dependencies` list. We could do
`conda project init [-c CHANNEL] [package_spec [package_spec ...]]`
so it feels a little more like `conda create`.
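This sketch shows how the proposed `init` signature could be parsed with Python's standard `argparse`. It is not conda's or anaconda-project's actual CLI code. The `initial_spec` helper and the plain-dict spec it returns are hypothetical. With no arguments it yields the empty `dependencies` list mentioned above. `-c` can be repeated, as it can for `conda create`.

```python
import argparse


def build_init_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for `conda project init [-c CHANNEL] [package_spec ...]`."""
    parser = argparse.ArgumentParser(prog="conda project init")
    # -c/--channel may be given several times, mirroring `conda create -c ...`.
    parser.add_argument("-c", "--channel", action="append", dest="channels",
                        metavar="CHANNEL", default=None)
    # Zero or more abstract package specs, e.g. "numpy" or "pandas>=1.5".
    parser.add_argument("package_specs", nargs="*", metavar="package_spec")
    return parser


def initial_spec(argv: list) -> dict:
    """Turn the parsed arguments into a starting project spec (a plain dict here)."""
    args = build_init_parser().parse_args(argv)
    return {"channels": args.channels or [], "dependencies": list(args.package_specs)}
```

For example, `initial_spec(["-c", "conda-forge", "numpy", "pandas>=1.5"])` returns `{"channels": ["conda-forge"], "dependencies": ["numpy", "pandas>=1.5"]}`.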
I really like that idea. For every new project, I now do `conda env create -p ./env -f environment.yml`. Actually, I can't remember if I'm using `conda create` or `conda env create`; I always mess it up.
So I think it'd be awesome if `init` basically did that, and you could pass in a path for your environment and also quickly load a package spec from an existing file.
Another feature would be an `--interactive` flag, which would give a list of questions to fill in the parts of `anaconda-project.yml`. Can't remember if that exists already.
Another stretch goal would be to move config to `pyproject.toml` and somehow register `conda` as a build backend, but that's a lot harder to do, I think. The benefit is that the project spec would be similar to those used by other tools like `poetry` and `flit`.
Why not make `conda project init` more like `conda env create` by renaming it to `conda project create`?
Also, I do think the sub-command proposal above (Alternative 2) would indeed help tidy things up!
With this proposal, I count 18 'top level' `conda-project` commands (11 without subcommands, 7 with).
Here are the ones currently proposed without subcommands, though I have some suggestions for more subcommands which would bring it down to 16 commands total. I have also attempted to order these by how often I use them personally:
- `init` (rename to `create` for consistency with `conda env create`: it would be `conda project create`?)
- `run`
- `prepare` (rename to `install`?)
- `lock`
- `unlock` (how about a `lock remove` subcommand?)
- `archive`
- `unarchive` (how about an `archive extract` subcommand?)
- `upload`
- `download` (any way to combine with `upload`?)
- `update`
- `clean` (I've never used this!)
Proposed subcommands (also roughly ordered by my frequency of usage):
- `command X`
- `data X`
- `env X`
- `variable X`
- `package X`
- `platform X`
- `service X` (I've never used this either!)
Of these, I could do away with `upload` and `download` (unless we integrate with Nucleus?) and `clean`. I also have never seen services used either...
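This `argparse` sketch makes the suggested folding concrete. `unlock` and `unarchive` become `lock remove` and `archive extract`, which takes the count from 18 to 16. Only the command tree is modeled. The real commands' options and arguments are omitted, and the structure is an assumption, not anaconda-project's implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical `conda project` command tree with nested actions."""
    parser = argparse.ArgumentParser(prog="conda project")
    commands = parser.add_subparsers(dest="command", required=True)

    # Flat commands stay as they are (only two shown here).
    commands.add_parser("run")
    commands.add_parser("prepare")

    # `lock` on its own locks; `lock remove` replaces the separate `unlock`.
    lock = commands.add_parser("lock")
    lock.add_subparsers(dest="action").add_parser("remove")

    # `archive` on its own archives; `archive extract` replaces `unarchive`.
    archive = commands.add_parser("archive")
    archive.add_subparsers(dest="action").add_parser("extract")

    return parser
```

Parsing `["lock"]` gives `action=None`, while `["lock", "remove"]` gives `action="remove"`. One handler per top-level command can then branch on `action`.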
Lastly, I never use `add` commands, as I don't like commands that mutate the `project.yml` (though I know some people do use them, e.g. @AlbertDeFusco when giving tutorials).
Interesting perspective re: `add`. I have precisely the opposite perspective, where I avoid changing the `project.yml` manually and try to rely on CLI commands (when I use `poetry`).
I think `clean` is actually nice. One thing that IMO is missing from `poetry` is an easy way to tear down the created environment (especially since that venv gets stored centrally).
`upload`/`download` feel like Nucleus/AE5 features. I'm unsure whether those would be better included in a separate CLI for Nucleus (which we've talked about before).
`clean` also removes downloaded data (defined in the `downloads:` key) and any locally set variables (`anaconda-project-local.yml`).
The service feature was never fully developed, and maybe it doesn't really need to be there. For example, the conda package `redis` no longer provides the redis server, only the Python bindings.
https://anaconda-project.readthedocs.io/en/latest/user-guide/reference.html#services
And don't forget `dockerize`.
I concur; I think there's value in following the conventions and usage of `npm` & `poetry` to lower the conversion barrier from those tools, so I vote to keep the `install` and `add` commands. That said, I also agree with being consistent with other `conda` commands, so aliasing `create` to `install` would also be valuable. Docs/tutorials teaching `npm`/`poetry` -> `conda project` could use the `install` alias, and docs/tutorials teaching `conda env` -> `conda project` could use the `create` alias.
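Here is a minimal sketch of the aliasing idea using `argparse`, whose `add_parser` accepts an `aliases` list. `create` and `install` resolve to the same handler, so a tutorial can use whichever name its audience expects. The handlers are hypothetical placeholders.

```python
import argparse


def run_install(args: argparse.Namespace) -> str:
    # Placeholder: a real implementation would solve and create the project environment.
    return f"creating environment (invoked as {args.command!r})"


def run_add(args: argparse.Namespace) -> str:
    # Placeholder: a real implementation would update the spec, then the environment.
    return "adding " + ", ".join(args.packages)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="conda project")
    commands = parser.add_subparsers(dest="command", required=True)

    # One implementation, two spellings: `install` (npm/poetry habit), `create` (conda env habit).
    install = commands.add_parser("install", aliases=["create"])
    install.set_defaults(handler=run_install)

    add = commands.add_parser("add")
    add.add_argument("packages", nargs="+")
    add.set_defaults(handler=run_add)
    return parser
```

`argparse` records the name the user actually typed in `args.command`. Parsing `["create"]` therefore dispatches to `run_install` with `command == "create"`, which is handy if help text or logs should echo the user's spelling.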
Given the plugin efforts in `conda`, it wouldn't be crazy to move `upload`/`download` (and other "adjacent" features) into separate plugins to help keep each plugin "pure".
Originally I wanted to vote for the proposed alt 2 CLI (bc it followed the subcommand logic used by `git`), but tucking the `add`, `remove`, and `list` subcommands that far down seems highly undesirable. I'd expect to be able to run `conda project add PACKAGE`, so my vote is for alt 1 CLI.
I would also like to see us include a `cpm` alias for `conda project`, even less typing!
+1 for `cpm`! I was already trying to think of how I'd add an alias to my `.bashrc`, but `cp` is obviously ripe w/ problems :P
What does the "m" stand for?
I don't use `npm` or `poetry`, and am instead coming from a background of expecting to edit an environment.yml or requirements.txt file in a text editor. I'm quite distressed when I run some anaconda-project commands and find that the .yml file has been modified for no apparent reason, e.g. due to a recent bug that is somehow duplicating the contents of the `user-fields` section every time I run some command that has nothing to do with it. (Not sure what command that is, but it's definitely not one that I'd expect to modify the file, and the fact that it modifies it at all, even apart from the buggy way it modifies it, is deeply distressing to me.)
So it sounds like I want the opposite of what some people here want -- I'd either want no circumstance where [ana]conda-project modifies the file, or for all such commands to be organized under an `edit` submenu or a fully separate command that I could then safely avoid. I know I'm more text-file-centric than most people, so maybe I'm an outlier, but to me such modification violates a contract with the user that the user is in charge of the text file, not the program; it's up to me to write it and the program to respect it! Once the program itself is modifying it in any way, I'm immediately suspicious and ill at ease -- what happens to my edits? Will comments be dropped? Will the file be reformatted? If I make a syntax error, will it silently delete all my work? How can I be sure? I deeply do not like the feeling that I'm not fully in charge of this config file. Again, maybe I'm an outlier. :-/
I think it's "node package manager".
"conda project manager"?
"Project management" is an unfortunate pun, then, since this software is about projects but not about project management as the term is usually used! :-)
@jbednar that is a fair perspective. I wonder if we could do something like a `--track` flag.
For me, I like being able to do `conda project add numpy` so I don't need to remember how to format the file. I also generally don't worry about modifying the file, because I `git` everything, but it is definitely a fair perspective that maybe it should be an opt-in behavior.
When I start a new project using `poetry`, this is my workflow, which I repeat all the time:
poetry init
poetry add numpy pandas matplotlib ...
git add pyproject.toml poetry.lock
git commit -m "Initialize project"
That was my workflow every time I set up a new analysis in the past when I wasn't using `conda`. My hope here is that I can have something that simple, but with all of the benefits of the `conda` ecosystem.
P.S. In that case, `poetry` runs a `lock` step any time the dependencies change.
Personally (as I've said already), I don't like the commands that edit the yaml (and sometimes the yaml gets completely reformatted, e.g. when platforms are added after locking!), but I do know they have an audience.
One proposal I remembered is an `anaconda project edit` command, which could be a single entry point offering all the existing editing functionality, as suggested here: https://github.com/Anaconda-Platform/anaconda-project/issues/319.
I think this could have a bunch of advantages:
- … `edit` command.
- … `project.yml`, hopefully in a way that makes everyone happy.
Right, @mattkram, how you're using poetry represents the opposite user contract: the program provides ways to handle the typical modifications you'd want, and then you could edit the file, but you are always suspicious that if you do so, you'll mess up the automatic editing somehow. That's an example of the program owning the config file, not the user, and of the fundamental interface being a command-line rather than a text-file interface. The current anaconda-project is in a muddy middle ground, where many of the things you'd want to express aren't expressible with commands, and so my reaction is to consider the commands a distraction and not useful. But the opposite is also defensible, if the command interface is sufficiently expressive. I'm dubious about whether it could be made sufficiently expressive to cover how commands work in projects, though.
I'll also add one more comment about alternative 1 versus alternative 2 described at the top of this issue: I think the most important thing is consistency within the conda/anaconda ecosystem. There may be counterexamples, but commands such as `conda env list` and `conda env remove` suggest alternative 2 is already in use in conda, and I also note that ae5-tools uses this style too (e.g. `ae5 project list` or `ae5 deployment stop`).
I was thinking "conda package manager", but "conda project manager" works too 😄
I personally feel pretty strongly about following in the footsteps of existing tools. Not necessarily because they do it best but rather to avoid having to learn yet another set of commands that effectively do the same thing just with a different backend.
If we rely on `ruamel.yaml` (`conda` does), we can round-trip comments and key ordering. But I hear you; I think we can satisfy both desires here with relative ease. If we follow `poetry`'s convention, then only the `add` and `remove` subcommands will modify `project.yml`. Users who prefer to manually maintain their `project.yml` would almost exclusively use the `install`/`uninstall` subcommands after they modify their config.
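One way to make the "only `add` and `remove` touch the file" contract explicit is to have every subcommand report whether it needs to write the spec at all. In this sketch a plain dependency list stands in for `project.yml`; a real tool would round-trip the YAML, for instance with `ruamel.yaml`. `apply_command`, `package_name` and the return convention are hypothetical. The name parsing is deliberately naive and ignores channel prefixes such as `conda-forge::numpy`.

```python
import re

# Hypothetical policy: only these subcommands may ever write project.yml.
MUTATING_COMMANDS = frozenset({"add", "remove"})
READ_ONLY_COMMANDS = frozenset({"install", "uninstall"})


def package_name(spec: str) -> str:
    """Naively extract the package name from a spec such as 'pandas>=1.5'."""
    match = re.match(r"[A-Za-z0-9_.\-]+", spec)
    if match is None:
        raise ValueError(f"cannot parse package spec: {spec!r}")
    return match.group(0).lower()


def apply_command(command, dependencies, packages=()):
    """Return (new_dependencies, needs_write) for a subcommand."""
    deps = list(dependencies)
    if command == "add":
        present = {package_name(d) for d in deps}
        for spec in packages:
            name = package_name(spec)
            if name not in present:
                deps.append(spec)
                present.add(name)
    elif command == "remove":
        doomed = {package_name(p) for p in packages}
        deps = [d for d in deps if package_name(d) not in doomed]
    elif command not in READ_ONLY_COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    # Read-only commands never write; mutating ones write only if something changed.
    needs_write = command in MUTATING_COMMANDS and deps != list(dependencies)
    return deps, needs_write
```

Re-adding a package that is already listed reports `needs_write=False`. The file would then not be rewritten or reformatted, which speaks to the concern raised earlier about silent modifications.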
I do agree that commands that modify the config should make it very clear that they do so, and there should be a strictly limited number of commands that make config modifications. But programmatically modifying the config should still be a first-class citizen. Config files are easy for us devs to look at and update, but I'd strongly resist claiming that for all of our users.
As a side note, I've also found that tutorials/docs that can tell you to run a command that modifies the config are (1) more concise and (2) easier to follow.
Editing .yml files is a convention that follows existing tools, just different tools (notably conda). :-)
I think anaconda-project does preserve comments, ordering, etc., in the cases I've seen; the issue is just that a user is never quite sure whether that will really be true or if there are limits to it. In any case, yes, if modification were limited to `add` and `remove` and we could very clearly state that fact (along with a statement that the file won't silently be reformatted), then I'm happy!
Overall, yes: executing a simple command is more predictable and easier to convey but less expressive, and thus more suited to a tutorial, while editing a file is more expressive but harder to convey and predict, and thus more suited to advanced or daily usage. So they both have their uses if they can be kept distinct.
One more point, which I think fundamentally drives my support for an "automatic dependency tracking" feature.
Nearly every time I am in a Jupyter notebook, I realize I forgot to add a package. I can do `!conda install numpy` (whichever the correct way is), but I am very much in search of a way to do that where the abstract dependency is also added to `environment.yml` or `project.yml` for me. As-is, it is quite possible for my actual conda environment and my `environment.yml` file to be out of sync, which makes reproducibility less likely (unless I'm very careful).
And the reason I really like `anaconda-project` is that it draws a distinct line between abstract and concrete dependencies, which I haven't found to be possible with `conda env export`.
Are you arguing that `add` and `remove` ought to be features at the conda level, independently of project support? If so, that is a very different question, i.e. whether to provide a CLI add/remove interface to conda that fundamentally alters a config file (whether that's environment.yml or project.yml) and only then an existing environment, so that the text file is the master reference, not the installed environment. That's a valid request, but different from my conception of `conda project`, where the core features are having a record of the commands to run in that environment plus locking to ensure that the environment+commands combo is fully reproducible. It sounds to me like you're using features of anaconda-project to compensate for features missing from conda that aren't necessarily to do with projects, whereas to me a project requires a command to be specified.
So if I'm understanding you correctly, there are actually three (or more?) independent and orthogonal requests here for functionality already in anaconda-project that people want to have added to conda:
- … `conda project run` and fully reproduce whatever it is this project does. (To me, that's the fundamental affordance of anaconda-project.)
I'm hearing @kenodegard and @mattkram focusing on 3, while @jlstevens and I focus on 1 and 2, and conda-lock focuses on 2 (but without the notion of a command that runs in the environment). Personally, I think 3 is a valid request and a useful concept, but not one related to projects, because to me a project has a specific stored and curated command that is meant to be run in the specified environment. So I'd argue that all three can be implemented separately in some sense, with CLI add/remove supporting either project or environment files, locking available identically for both bare environments and projects, and projects simply adding the notion that a command is being captured.
If people agree with all the above, I'm still slightly unclear on how to achieve 3 without the notion of capturing a command, because to me the fact that an environment is tied to a specific command is what lets it be separate and not get polluted by my daily need to install stuff I need in my working session. I know how to curate environment+command as a unit, because to me that has a very clear remit: the envt contains only what that command needs, and no more or less. Each project is a separate environment from my daily working environment, which gets filled up with all sorts of non-reproducible who knows what as I do my work, and keeping projects fully separate like that is what makes anything reproducible for me. So, I'm not sure how to achieve 3 (curating a separate environment declaratively) without tying the environment to a specific command, but maybe I'm just showing my ignorance of poetry and other approaches. Probably something worth discussing orally instead of typing more here, though! Maybe it's a usage of `conda run` that I'm not imagining...
Thank you for the write-up, @jbednar; I think it is pretty close to my perspective as well. However, I tend to believe 2 & 3 should be integrated into a single solution, and then 1 is a follow-on from that.
For me, I create a new isolated environment for every project (analysis, repo, application, library, etc.). I prefer to keep them in a local path rather than a global one, and I barely ever use `base`. So for me, items 2 & 3 above are of highest priority, and I think intertwined. What tools like `npm`, `poetry`, and `cargo` do is allow specification of abstract dependencies in one file and a locking mechanism to output the exact version of every package that is installed into the environment. They also have a CLI for doing that. Arguably, the ideal solution would be to have that as a feature of `conda env`. I started looking into existing work on this, which is how I found `anaconda-project` and what sparked my initial interest.
Just to clarify my thinking, my understanding of abstract and concrete dependencies is this:
With the CLI, I can update/edit the abstract dependencies file, automatically generate the locked dependency file, and keep the two in sync. I can use a pre-commit hook to ensure I never commit an unlocked or mismatched pair of files, such that I can completely tear down and reproduce the environment on every platform it is said to support.
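The "keep them in sync" check can be sketched by storing a digest of the abstract spec inside the lock file, similar in spirit to the `content-hash` that poetry records in `poetry.lock`. A pre-commit hook then only needs to recompute the digest and confirm that every supported platform has a solution. The lock layout and function names here are hypothetical. Solving is out of scope, so the pinned packages are simply passed in.

```python
import hashlib
import json


def spec_digest(abstract: dict) -> str:
    """Hash the abstract spec in canonical JSON form so key order and spacing don't matter."""
    canonical = json.dumps(abstract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_lock(abstract: dict, solved: dict) -> dict:
    """Hypothetical lock layout: pinned packages per platform plus the digest of the input spec."""
    return {"spec_sha256": spec_digest(abstract), "platforms": solved}


def check_in_sync(abstract: dict, lock: dict, platforms) -> list:
    """Return a list of problems; an empty list means the pair is safe to commit."""
    problems = []
    if lock.get("spec_sha256") != spec_digest(abstract):
        problems.append("lock is stale: abstract dependencies changed since the last lock")
    missing = sorted(set(platforms) - set(lock.get("platforms", {})))
    if missing:
        problems.append("no locked packages for: " + ", ".join(missing))
    return problems
```

A hook script would load both files, call `check_in_sync`, print any problems and exit non-zero if the list is not empty. Because dependency lists are hashed in order, reordering them also triggers a relock, which is a conservative but simple choice.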
`anaconda-project` currently does that (or most of it). I also think there is tremendous value in `conda project run` being able to serve as a means for specifying different commands that one may want to save. In some sense, this may be an all-in-one replacement for what `make` is often used for (`make test`, `make build`, etc.).
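This rough sketch shows the `make`-style use: a named command is looked up in a table and executed. The table format and `run_named` are hypothetical. anaconda-project's real `commands:` section has more structure, for example separate `unix` and `windows` entries. This version runs in the current interpreter's environment. A real implementation would run the command inside the project environment, for instance by prefixing the arguments with `conda run -p ./env`.

```python
import shlex
import subprocess
import sys

# Illustrative command table (name -> command line); not taken from a real project.
EXAMPLE_COMMANDS = {
    "hello": f"{shlex.quote(sys.executable)} -c \"print('hello from a saved command')\"",
}


def run_named(commands: dict, name: str, extra_args=()) -> int:
    """Run a saved command by name, like `make test`, and return its exit code."""
    if name not in commands:
        available = ", ".join(sorted(commands)) or "none"
        raise KeyError(f"unknown command {name!r}; available: {available}")
    argv = shlex.split(commands[name]) + list(extra_args)
    # Runs in the current environment; a real tool would run it inside the project's env.
    return subprocess.run(argv, check=False).returncode
```

Returning the exit code, rather than raising, lets a caller such as CI decide how to react. `extra_args` mirrors how saved commands usually accept additional arguments at run time.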
Whether these should be two separate efforts is an interesting question. I wouldn't be opposed to that, but I would say they should be designed to work together. My initial sense was that changing `conda env` would be harder and it makes sense to do it all together, but I'm not wedded to that idea.
Wherever we end up, I think there are some amazing things related to environment management in `anaconda-project` right now that other tools don't have (e.g. the ability to specify different dependencies for dev/prod/testing/etc.).
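Per-stage dependencies are easiest to reason about as specs that inherit from one another. This sketch resolves a spec by merging its parents' packages first and dropping exact-duplicate entries. The `inherit_from`/`packages` layout is loosely modeled on anaconda-project's env specs but should be treated as an assumption, and the package lists are made up.

```python
def resolve_packages(env_specs: dict, name: str, _path: tuple = ()) -> list:
    """Collect packages for `name`, parents first, without exact duplicates."""
    if name in _path:
        raise ValueError("inheritance cycle: " + " -> ".join(_path + (name,)))
    spec = env_specs[name]
    packages = []
    # Parents come first so a child's additions land after what it inherits.
    for parent in spec.get("inherit_from", []):
        for pkg in resolve_packages(env_specs, parent, _path + (name,)):
            if pkg not in packages:
                packages.append(pkg)
    for pkg in spec.get("packages", []):
        if pkg not in packages:
            packages.append(pkg)
    return packages


# Made-up example: prod has the runtime, test adds pytest, dev adds tooling on top.
ENV_SPECS = {
    "prod": {"packages": ["python=3.11", "pandas"]},
    "test": {"inherit_from": ["prod"], "packages": ["pytest"]},
    "dev": {"inherit_from": ["test"], "packages": ["jupyterlab", "black"]},
}
```

Here `resolve_packages(ENV_SPECS, "dev")` gives `["python=3.11", "pandas", "pytest", "jupyterlab", "black"]`. The cycle check follows the current inheritance path only, so two specs sharing a common parent is allowed.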
Ok, sounds like we're on the same page, i.e. about having separate but mostly overlapping goals!
I agree with @mattkram and @jbednar.
I see `conda project` as a viable alternative approach to environment management (`conda env`). I also don't see a huge difference between projects and packages; projects just feel like general purpose packages.
I absolutely love the commands feature. This is the part that makes a project a general purpose package to me. `npm` has this; I don't think `poetry` has this (?). I see it as an abstraction of the build/test commands in `conda` recipes. I think there's great value in being able to define any number of commands for a project/package (e.g. `start-server`, `test`, `lint`, `build`, etc.).
I see how lock files and environment management via `add`/`remove` feel separate, but if neither is part of `conda project`, then I feel this plugin is poorly named and should rather be renamed to `conda command` (we could then add a separate plugin exposing the `add`/`remove` environment management logic).
Another thing to keep in mind: the `project.yml` file contains many of the same kinds of information present in `meta.yaml`, which will be annoying to maintain if a project is also a package.
> I see `conda project` as a viable alternative approach to environment management (`conda env`). I also don't see a huge difference between projects and packages; projects just feel like general purpose packages.

True; both package recipes and project .ymls are specs for building artifacts, with projects being more general in that they encompass activities (like launching a live notebook) and not simply static outputs. But locked projects are also less general in a different sense, in that (unlike packages) they cannot be combined together.

> I absolutely love the commands feature. This is the part that makes a project a general purpose package to me. `npm` has this; I don't think `poetry` has this (?). I see it as an abstraction of the build/test commands in `conda` recipes. I think there's great value in being able to define any number of commands for a project/package (e.g. `start-server`, `test`, `lint`, `build`, etc.).

Yep; having multiple commands that all share the same envt is crucial; we use that for all of the things you mention.

> I see how lock files and environment management via `add`/`remove` feel separate, but if neither is part of `conda project`, then I feel this plugin is poorly named and should rather be renamed to `conda command` (we could then add a separate plugin exposing the `add`/`remove` environment management logic).

I think the independence only goes one way, i.e. locking is useful for conda in general, while capturing commands without locking is only of limited use since it will soon break as dependencies change. So I would only argue that locking is valid on its own. As for add/remove, that's a separate story. :-)

> Another thing to keep in mind: the `project.yml` file contains many of the same kinds of information present in `meta.yaml`, which will be annoying to maintain if a project is also a package.

I'm not sure how often that will come up, and getting a shared file format between conda and projects has proven to be a years-long endeavor, but sure; aim to unify all the ymls! :-)
In preparation for upcoming hackdays, I'm collecting some ideas around how the CLI could be refined when converting `anaconda-project` to a potential `conda project` extension.
This table summarizes potential renamings of sub-commands, which would all be prefixed by `conda project` or `conda-project`.
After typing this, my personal preference is Alternative 2, where essentially many of the hyphenated subcommands are further broken down into higher-level sub-commands such as `service` with operations like `add`, `remove`, `list`, etc.
It would also be great if the `conda install PACKAGE` command could somehow be intercepted from within an activated environment to add to `anaconda-project.yaml` using the current behavior of `anaconda-project add-packages`.
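Here is a rough sketch of the interception idea. It first decides whether the active environment belongs to the project, using the `CONDA_PREFIX` variable that `conda activate` sets. It then pulls the package specs out of the `conda install` arguments so they could be handed to the existing `add-packages` logic. The function names and the small set of value-taking options are hypothetical. Real `conda install` accepts many more options, and this sketch does not check whether `-n`/`-p` point at a different environment.

```python
import os
from pathlib import Path

# Hypothetical subset of `conda install` options that consume the following argument.
OPTIONS_WITH_VALUES = frozenset({"-c", "--channel", "-n", "--name", "-p", "--prefix"})


def specs_from_install_args(args) -> list:
    """Pick package specs out of `conda install` arguments, skipping options and their values."""
    specs, skip_next = [], False
    for arg in args:
        if skip_next:
            skip_next = False
        elif arg in OPTIONS_WITH_VALUES:
            skip_next = True
        elif not arg.startswith("-"):
            specs.append(arg)
    return specs


def should_record(project_dir, environ=None) -> bool:
    """True if the active env (CONDA_PREFIX) is the project directory or lives inside it."""
    environ = os.environ if environ is None else environ
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        return False
    project = Path(project_dir).resolve()
    active = Path(prefix).resolve()
    return active == project or project in active.parents
```

For example, `conda install -c conda-forge numpy -y` run with an environment under the project directory active (such as a project-local `envs/default`) would yield `["numpy"]` to record. The same command run from `base` would record nothing.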