Ideas for updated CLI in `conda-project`

mattkram commented 2 years ago

In preparation for the upcoming hackdays, I'm collecting some ideas about how the CLI could be refined when converting anaconda-project to a potential conda project extension.

This table summarizes potential renamings of sub-commands, which would all be prefixed by conda project or `conda-project`.

After typing this, my personal preference is Alternative 2, where essentially many of the hyphenated subcommands are further broken down into higher level sub-commands such as service with operations like add, remove, list, etc.

It would also be great if the conda install PACKAGE command could somehow be intercepted from within an activated environment to add to anaconda-project.yaml using the current behavior of anaconda-project add-packages.

Existing	New Alt 1	New Alt 2	Description
init			Initialize a directory with default project configuration
run			Run the project, setting up requirements first
prepare	install	install	Set up the project requirements, but does not run the project
clean			Removes generated state (stops services, deletes environment files, etc)
archive			Create a .zip, .tar.gz, or .tar.bz2 archive with project files in it
unarchive			Unpack a .zip, .tar.gz, or .tar.bz2 archive with project files in it
upload			Upload the project to Anaconda Cloud
download			Download the project from Anaconda Cloud
add-variable	add --variable	variable add	Add a required environment variable to the project
remove-variable	remove --variable	variable remove	Remove an environment variable from the project
list-variables	list --variable	variable list	List all variables on the project
set-variable	set --variable	variable set	Set an environment variable value in anaconda-project-local.yml
unset-variable	unset --variable	variable unset	Unset an environment variable value from anaconda-project-local.yml
add-download	add --data	data add	Add a URL to be downloaded before running commands
remove-download	remove --data	data remove	Remove a download from the project and from the filesystem
list-downloads	list --data	data list	List all downloads on the project
add-service	add --service	service add	Add a service to be available before running commands
remove-service	remove --service	service remove	Remove a service from the project
list-services	list --service	service list	List services present in the project
add-env-spec	?	env add	Add a new environment spec to the project
remove-env-spec	?	env remove	Remove an environment spec from the project
list-env-specs	?	env list	List all environment specs for the project
export-env-spec	?	env export	Save an environment spec as a conda environment file
lock			Lock all packages at their current versions
unlock			Remove locked package versions
update			Update all packages to their latest versions
add-packages	add	package add	Add packages to one or all project environments
remove-packages	remove	package remove	Remove packages from one or all project environments
list-packages	list	package list	List packages for an environment on the project
add-platforms	add --platform	platform add	Add platforms to one or all project environments
remove-platforms	remove --platform	platform remove	Remove platforms from one or all project environments
list-platforms	list --platform	platform list	List platforms for an environment on the project
add-command	add --command	command add	Add a new command to the project
remove-command	remove --command	command remove	Remove a command from the project
list-default-command	list --default-command	command list --default	List only the default command on the project
list-commands	list --command	command list	List the commands on the project
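
To make the shape of Alternative 2 concrete, here is a minimal sketch using nested `argparse` subparsers. The group and operation names come from the "New Alt 2" column above; the `build_parser` helper and the catch-all `names` argument are assumptions made for illustration, not an existing conda-project API.

```python
import argparse

# Resource groups and their operations, copied from the "New Alt 2" column.
GROUPS = {
    "variable": ["add", "remove", "list", "set", "unset"],
    "data": ["add", "remove", "list"],
    "service": ["add", "remove", "list"],
    "env": ["add", "remove", "list", "export"],
    "package": ["add", "remove", "list"],
    "platform": ["add", "remove", "list"],
    "command": ["add", "remove", "list"],
}


def build_parser() -> argparse.ArgumentParser:
    """Build a two-level parser: `conda-project <group> <operation> [names...]`."""
    parser = argparse.ArgumentParser(prog="conda-project")
    groups = parser.add_subparsers(dest="group", required=True)
    for group, operations in GROUPS.items():
        group_parser = groups.add_parser(group)
        ops = group_parser.add_subparsers(dest="operation", required=True)
        for operation in operations:
            op_parser = ops.add_parser(operation)
            # Hypothetical: every operation accepts zero or more names.
            op_parser.add_argument("names", nargs="*")
    return parser


args = build_parser().parse_args(["variable", "add", "DB_URL"])
print(args.group, args.operation, args.names)
```

Commands without a resource group (init, run, lock, and so on) would sit next to these groups as ordinary first-level subparsers.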

AlbertDeFusco commented 2 years ago

Nice! I've been thinking about adding some optional arguments to init. Right now it would create an empty dependencies list. We could do

conda project init [-c CHANNEL] [package_spec [package_spec ...]]

so it feels a little more like conda create
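
A rough sketch of how that signature might be parsed with `argparse`. The helper names and the keys of the returned dictionary are illustrative assumptions, not the actual project file schema.

```python
import argparse


def build_init_parser() -> argparse.ArgumentParser:
    """Parser for `conda project init [-c CHANNEL] [package_spec ...]`."""
    parser = argparse.ArgumentParser(prog="conda project init")
    # -c may be repeated, mirroring how channels are passed to `conda create`.
    parser.add_argument("-c", "--channel", action="append", dest="channels")
    parser.add_argument("package_specs", nargs="*", metavar="package_spec")
    return parser


def initial_config(argv):
    """Turn command-line arguments into a starting project configuration."""
    args = build_init_parser().parse_args(argv)
    # Hypothetical keys; with no arguments the dependency list stays empty,
    # which matches the current behavior described above.
    return {
        "channels": args.channels or [],
        "dependencies": args.package_specs,
    }


print(initial_config(["-c", "conda-forge", "python=3.10", "numpy"]))
```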

mattkram commented 2 years ago

I really like that idea. Every new project now I do conda env create -p ./env -f environment.yml. Actually can't remember if I'm using conda create or conda env create. I always mess it up.

So I think it'd be awesome if init basically did that, and you could pass in a path for your environment and also load a package spec quickly from an existing file.

Another feature would be an --interactive flag, which would give a list of questions to fill in the parts of anaconda-project.yml. Can't remember if that exists already.

mattkram commented 2 years ago

Another stretch goal would be to move config to pyproject.toml and somehow register conda as a build backend, but that's a lot harder to do I think. The benefit is the project spec would be similar to those by other tools like poetry and flit.

jlstevens commented 2 years ago

Why not make conda project init env more like conda env create by renaming it to conda project create?

jlstevens commented 2 years ago

Also, I do think the sub-command proposal above (Alternative 2) would indeed help tidy things up!

jlstevens commented 2 years ago

With this proposal, I count 18 'top level' conda-project commands (11 without subcommands, 7 with).

Here are the ones currently proposed without subcommands, though I have some suggestions for more subcommands which would bring it down to 16 commands total. I have also attempted to order these by how often I use them personally:

init (rename to create for consistency with conda env create: it would be conda project create?)
run
prepare (rename to install?)
lock
unlock (how about a lock remove subcommand?)
archive
unarchive (how about an archive extract subcommand?)
upload
download (any way to combine with upload?)
update
clean (I've never used this!)

Proposed subcommands (also roughly ordered by my frequency of usage):

command X
data X
env X
variable X
package X
platform X
service X (I've never used this either!)

Of these, I could do away with upload and download (unless we integrate with Nucleus?) and clean. I have also never seen services used...

Lastly, I never use add commands as I don't like commands that mutate the project.yml (though I know some people do use them, e.g. @AlbertDeFusco when giving tutorials).

mattkram commented 2 years ago

Interesting perspective re: add. I have precisely the opposite perspective, where I avoid changing the project.yml manually and try to rely on CLI commands (when I use poetry).

I think clean is actually nice. One thing that is missing IMO from poetry is an easy way to tear down the created environment (especially since that venv gets stored centrally).

upload/download feel like Nucleus/AE5 features. Unsure whether those would better be included in a separate CLI for Nucleus (which we've talked about before).

AlbertDeFusco commented 2 years ago

Clean also removes downloaded data (defined in the downloads: key) and any locally set variables (anaconda-project-local.yml)

AlbertDeFusco commented 2 years ago

The service feature was never fully developed and maybe it doesn't really need to be there. For example the conda package redis no longer provides the redis server, but only the python bindings.

https://anaconda-project.readthedocs.io/en/latest/user-guide/reference.html#services

AlbertDeFusco commented 2 years ago

And don't forget dockerize

kenodegard commented 2 years ago

I concur, I think there's value in following conventions and usage from npm & poetry to lower the conversion barrier from those tools, so I vote to keep the install and add commands. That said I also agree with being consistent with other conda commands, so aliasing create to install would also be valuable. Docs/tutorials teaching npm/poetry -> conda project could use the install alias and docs/tutorials teaching conda env -> conda project could use the create alias.

Given the plugin efforts in conda it wouldn't be crazy to move upload/download (and other "adjacent" features) into separate plugins, to help keep each plugin "pure".

Originally I wanted to vote for the proposed alt 2 CLI (bc it followed the subcommand logic used by git) but tucking the add, remove, and list subcommands that far down seems highly undesirable. I'd expect to be able to run conda project add PACKAGE so my vote is for alt 1 CLI.
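
For comparison, a minimal sketch of the alt 1 shape, where `add` targets packages by default and a flag switches the resource type. Treating `--variable`, `--data`, etc. as value-less flags is an assumption (the table does not say whether they take a value), and `build_add_parser` is a hypothetical helper.

```python
import argparse

# Non-package resource types from the "New Alt 1" column.
KINDS = ["variable", "data", "service", "platform", "command"]


def build_add_parser() -> argparse.ArgumentParser:
    """Parser for `conda-project add [--KIND] NAME [NAME ...]`."""
    parser = argparse.ArgumentParser(prog="conda-project add")
    # At most one resource flag may be given per invocation.
    kind = parser.add_mutually_exclusive_group()
    for name in KINDS:
        kind.add_argument(
            f"--{name}", dest="kind", action="store_const", const=name
        )
    # Without a flag, `add` operates on packages.
    parser.set_defaults(kind="package")
    parser.add_argument("names", nargs="+")
    return parser


parser = build_add_parser()
print(parser.parse_args(["numpy", "pandas"]))
print(parser.parse_args(["--variable", "DB_URL"]))
```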

I would also like to see us including a cpm alias for conda project, even less typing!

mattkram commented 2 years ago

+1 for cpm! I was already trying to think of how I'd add to my .bashrc, but cp is obviously ripe w/ problems :P

jbednar commented 2 years ago

What's the "m" stand for?

I don't use npm or poetry, and am instead coming from a background of expecting to edit an environment.yml or requirements.txt file in a text editor. I'm quite distressed when I run some anaconda-project commands and find that the .yml file has been modified for no apparent reason, e.g. due to a recent bug that is somehow duplicating the contents of the user-fields section every time I run some command that has nothing to do with it. (Not sure what command that is, but it's definitely not one that I'd expect to modify the file, and the fact that it modifies it at all, even apart from the buggy way it modifies it, is deeply distressing to me.)

So it sounds like I want the opposite of what some people here want -- I'd either want no circumstance where [ana]conda-project modifies the file, or for all such commands to be organized under an edit submenu or a fully separate command that I could then safely avoid. I know I'm more text-file-centric than most people, so maybe I'm an outlier, but to me such modification violates a contract with the user that the user is in charge of the text file, not the program; it's up to me to write it and the program to respect it! Once the program itself is modifying it in any way, I'm immediately suspicious and ill at ease -- what happens to my edits? Will comments be dropped? Will the file be reformatted? If I make a syntax error, will it silently delete all my work? How can I be sure? I deeply do not like the feeling that I'm not fully in charge of this config file. Again, maybe I'm an outlier. :-/

mattkram commented 2 years ago

I think it's node package manager.

conda project manager?

jbednar commented 2 years ago

"Project management" is an unfortunate pun, then, since this software is about projects but not about project management as the term is usually used! :-)

mattkram commented 2 years ago

@jbednar that is a fair perspective. I wonder if we could do something like a --track flag.

For me, I like being able to do conda project add numpy so I don't need to remember how to format the file. I also generally don't worry about modifying the file, because I git everything, but it is definitely a fair perspective that maybe it should be an opt-in behavior.

When I start a new project using poetry, this is my workflow, which I repeat all the time:

poetry init
poetry add numpy pandas matplotlib ...
git add pyproject.toml poetry.lock
git commit -m "Initialize project"

That was my workflow every time I set up a new analysis in the past when I wasn't using conda. My hope here is that I can have something that simple, but with all of the benefits of the conda ecosystem.

P.S. in that case, poetry runs a lock step any time the dependencies change.

jlstevens commented 2 years ago

Personally (as I've said already), I don't like the commands that edit the yaml (and sometimes the yaml gets completely reformatted, e.g. when platforms are added after locking!) but I do know they have an audience.

One proposal I remembered is an anaconda-project edit command, which could be a single entry point that offers all the existing editing functionality, as suggested here: https://github.com/Anaconda-Platform/anaconda-project/issues/319.

I think this could have a bunch of advantages:

Cleans up the CLI API surface a whole lot.
It could be a really nice, interactive interface that is more discoverable and intuitive than the current one. Nowadays you can craft some really great interactive CLI experiences with Python.
It could have subcommands that are equivalent to the current commands, all under the edit command.
We could guarantee that it is the one command that can actually edit the project.yml, hopefully in a way that makes everyone happy.
Could be a fun, self-contained project for someone to work on!

jbednar commented 2 years ago

Right, @mattkram, how you're using poetry represents the opposite user contract: The program provides ways to handle the typical modifications you'd want, and then you could edit the file, but are always suspicious that if you do so, you'll mess up the automatic editing somehow. That's an example of the program owning the config file, not the user, and the fundamental interface being a command-line rather than a text-file interface. The current anaconda-project is in a muddy middle ground, where many of the things you'd want to express aren't expressible with commands, and so my reaction to that is to consider the commands a distraction and not useful. But the opposite is also defensible, if the command interface is sufficiently expressive. I'm dubious about whether it could be made sufficiently expressive to cover how commands work in projects, though.

jlstevens commented 2 years ago

I'll also add one more comment about alternative 1 versus alternative 2 described at the top of this issue: I think the most important thing is consistency within the conda/anaconda ecosystem. There may be counterexamples but commands such as conda env list and conda env remove suggest alternative 2 is already in use in conda and I also note that ae5-tools also uses this style (e.g. ae5 project list or ae5 deployment stop).

kenodegard commented 2 years ago

I was thinking conda package manager but conda project manager works too 😄

I personally feel pretty strongly about following in the footsteps of existing tools. Not necessarily because they do it best but rather to avoid having to learn yet another set of commands that effectively do the same thing just with a different backend.

If we rely on ruamel.yaml (conda does) we can roundtrip comments and key ordering. But I hear you, I think we can satisfy both desires here with relative ease. If we follow poetry's convention then only the add and remove subcommands will modify project.yml. Users who prefer to manually maintain their project.yml would almost exclusively use the install/uninstall subcommands after they modify their config.

I do agree that commands that modify the config should be very clear in that they do so and there should be a strict limited number of commands that make config modifications. But programmatically modifying the config should still be a first-class citizen. Config files are easy for us devs to look at and update but I'd strongly resist claiming that for all of our users.

As a side note I've also found that any tutorial/docs that can specify running a command modifying the config to be (1) more concise and (2) easier to follow.

jbednar commented 2 years ago

Editing .yml files is a convention that follows existing tools, just different tools (notably conda). :-)

I think anaconda-project does preserve comments, ordering etc., in the cases I've seen; the issue is just that a user is never quite sure whether that will really be true or if there are limits to it. In any case, yes, if modification were limited to add and remove and we could very clearly state that fact (along with a statement that the file won't silently be reformatted), then I'm happy!

Overall, yes, executing a simple command is more predictable and conveyable while being less expressive, thus more suited to a tutorial, while editing a file is more expressive and less conveyable and predictable, and thus more suited to advanced or daily usage, so they both have their uses if they can be kept distinct.

mattkram commented 2 years ago

One more point, which I think is one that fundamentally drives my support for an "automatic dependency tracking" feature.

Nearly every time I am in a Jupyter notebook, I realize I forgot to add a package. I can do !conda install numpy (whichever the correct way is), but I very much am in search of a way to do that where my intention for that abstract dependency to be added to environment.yml or project.yml is done for me. As-is, it is quite possible for my actual conda environment and my environment.yml file to be out-of-sync, which reduces the likelihood of reproducibility (unless I'm very careful).
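
One way to picture the out-of-sync problem is as a comparison between the specs declared in the file and the specs that were explicitly requested in the live environment (for example, what `conda env export --from-history` reports). The sketch below assumes both lists have already been read into Python; `package_name` and `find_drift` are hypothetical helpers, and channel-qualified specs such as `conda-forge::numpy` are not handled.

```python
import re


def package_name(spec: str) -> str:
    """Extract the bare package name from a spec like 'pandas>=1.5'."""
    return re.split(r"[\s=<>!~]", spec.strip(), maxsplit=1)[0].lower()


def find_drift(declared, requested):
    """Report packages that differ between the spec file and the environment.

    declared:  specs listed in environment.yml or the project file
    requested: specs explicitly installed into the environment
    """
    declared_names = {package_name(s) for s in declared}
    requested_names = {package_name(s) for s in requested}
    return {
        # Installed by hand but never written to the file.
        "undeclared": sorted(requested_names - declared_names),
        # Written to the file but not installed.
        "not_installed": sorted(declared_names - requested_names),
    }


print(find_drift(["python=3.10", "pandas>=1.5"], ["python=3.10", "pandas", "numpy"]))
```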

And why I really like anaconda-project is that there is a distinct difference between abstract and concrete dependencies, which I haven't found to be possible with conda env export.

jbednar commented 2 years ago

Are you arguing that add and remove ought to be features at the conda level, independently of project support? If so that is a very different question, i.e. whether to provide a CLI add/remove interface to conda that fundamentally alters a config file (whether that's environment.yml or project.yml) and only then an existing environment, so that the text file is the master reference, not the installed environment? That's a valid request but different from my conception of conda project, where the core features are having a record of the commands to run in that environment plus locking to ensure that the environment+commands combo is fully reproducible. It sounds to me like you're using features of anaconda-project to compensate for features missing from conda that aren't necessarily to do with projects, which to me requires a command to be specified.

So if I'm understanding you correctly, there are actually then three (or more?) independent and orthogonal requests here for functionality already in anaconda-project that people want to have added to conda:

1. recording a command along with an environment so that a recipient of a .zip file for a project can do conda project run and fully reproduce whatever it is this project does. (To me, that's the fundamental affordance of anaconda-project.)
2. locking to ensure that this currently tested version of the project, where the given command has been verified to work with the given environment, is preserved and reproducible. (To me, this is important in practice, but not fundamental in theory.)
3. offering a CLI interface for adding and removing packages declaratively in a way that leaves an environment fully reproducible, unlike the usual conda install commands that end up with unreproducible environments. (To me, this is separate and mostly unrelated.)

I'm hearing @kenodegard and @mattkram focusing on 3 while @jlstevens and I focus on 1 and 2, while conda-lock focuses on 2 (but without the notion of a command that runs in the environment). Personally, I think 3 is a valid request and a useful concept but not one related to projects, because to me a project has a specific stored and curated command that is meant to be run in the specified environment. So I'd argue that all three can be implemented separately in some sense, with CLI add/remove supporting either project or environment files, locking available identically for both bare environments and projects, and projects simply adding the notion that a command is being captured.

If people agree with all the above, I'm still slightly unclear on how to achieve 3 without the notion of capturing a command, because to me the fact that an environment is tied to a specific command is what lets it be separate and not get polluted by my daily need to install stuff I need in my working session. I know how to curate environment+command as a unit, because to me that has a very clear remit: the envt contains only what that command needs, and no more or less. Each project is a separate environment from my daily working environment, which gets filled up with all sorts of non-reproducible who knows what as I do my work, and keeping projects fully separate like that is what makes anything reproducible for me. So, I'm not sure how to achieve 3 (curating a separate environment declaratively) without tying the environment to a specific command, but maybe I'm just showing my ignorance of poetry and other approaches. Probably something worth discussing orally instead of typing more here, though! Maybe it's a usage of conda run that I'm not imagining...

mattkram commented 2 years ago

Thank you for the write-up @jbednar, I think it is pretty close to my perspective as well. However, I tend to believe 2 & 3 should be integrated into a single solution, and then 1 is a follow-on from that.

For me, I create a new isolated environment for every project (analysis, repo, application, library, etc.). I prefer to keep them in a local path rather than global, and I barely ever use base. So for me, items 2 & 3 above are of highest priority, and I think intertwined. What tools like npm, poetry, and cargo do is allow specification of abstract dependencies in one file and a locking mechanism to output the exact version of every package that is installed into the environment. They also have a CLI for doing that. Arguably, the most ideal solution would be to have that as a feature of conda env. I started looking into existing work on this, and that is how I found anaconda-project and is what sparked my initial interest.

Just to clarify my thinking, my understanding of abstract and concrete dependencies is this:

Abstract dependencies: I want to define just the minimal list of dependencies for my project. This gives me less overhead when I need to upgrade a dependency, or remove one.
Concrete (locked) dependencies: I want an explicit list of exactly every version of every package and its subdependencies in a file that I can commit to version control.

With the CLI, I can update/edit the abstract dependencies file, and automatically generate the locked dependency file and keep them in sync. I can use a pre-commit hook to ensure I never commit an unlocked or mismatched pair of files, such that I can completely tear down and reproduce the environment from every platform it is said to be supported on.
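
A minimal sketch of the kind of check a pre-commit hook could run, assuming the lock file records a hash of the abstract dependency list it was generated from. The `spec_hash` field and all function names are hypothetical, and this is not a description of how anaconda-project or poetry actually store this information.

```python
import hashlib
import json


def spec_hash(abstract_deps):
    """Hash the abstract dependency list in an order-independent way."""
    canonical = json.dumps(sorted(abstract_deps), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_lock(abstract_deps, resolved):
    """Build a lock record that remembers which spec it was resolved from."""
    return {"spec_hash": spec_hash(abstract_deps), "packages": sorted(resolved)}


def lock_is_current(abstract_deps, lock):
    """Return True if the lock was generated from these abstract dependencies."""
    return lock.get("spec_hash") == spec_hash(abstract_deps)


deps = ["numpy", "pandas"]
# The resolved list would come from the solver; these entries are placeholders.
lock = make_lock(deps, ["numpy=1.26.4", "pandas=2.2.2", "python=3.11.9"])
print(lock_is_current(deps, lock))
print(lock_is_current(deps + ["scipy"], lock))
```

A hook would exit non-zero when `lock_is_current` returns False, which blocks the commit until the lock step has been rerun.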

anaconda-project currently does that (or most of it). I also think there is tremendous value in conda project run being able to serve as a means for specifying different commands that one may want to save. In some sense, this may be an all-in-one replacement for what make is often used for (make test, make build, etc.).
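
To illustrate the make-like usage, here is a small sketch of resolving a named command from a mapping of saved commands. The mapping, the `resolve_command` helper, and the rule that the first entry is the default are assumptions for illustration only.

```python
import shlex


def resolve_command(commands, name=None):
    """Look up a saved command and split it into an argument list.

    commands: mapping of command name to shell-style command string
    name:     command to run; falls back to the first entry when omitted
    """
    if not commands:
        raise KeyError("no commands are defined for this project")
    if name is None:
        # Assumption: the first defined command acts as the default.
        name = next(iter(commands))
    if name not in commands:
        raise KeyError(f"unknown command: {name}")
    return shlex.split(commands[name])


commands = {"test": "pytest -q tests", "build": "python -m build"}
print(resolve_command(commands, "build"))
print(resolve_command(commands))
```

The returned list could then be handed to something like `subprocess.run` inside the project's environment.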

Whether these should be two separate efforts is an interesting question. I wouldn't be opposed to that, but I would say they should be designed to work together. My initial sense was that changing conda env will be harder and it makes sense to do it altogether, but I'm not wedded to that idea.

Wherever we end up, I think there are some amazing things related to environment management in anaconda-project right now, which other tools don't have (i.e. the ability to specify different dependencies for dev/prod/testing/etc.).

jbednar commented 2 years ago

Ok, sounds like we're on the same page, i.e. about having separate but mostly overlapping goals!

kenodegard commented 2 years ago

I agree with @mattkram and @jbednar

I see conda project as a viable alternative approach to environment management (conda env). I also don't see a huge difference between projects and packages, projects just feel like general purpose packages.

I absolutely love the commands feature. This is the part that makes a project a general purpose package to me. npm has this, I don't think poetry has this (?). I see it as an abstraction of the build/test commands in conda recipes. I think there's great value in being able to define any number of commands for a project/package (e.g. start-server, test, lint, build, etc.).

I see how lock files and environment management via add/remove feels separate but if neither are part of conda project then I feel this plugin is poorly named and should rather be renamed to conda command (we could then add a separate plugin exposing the add/remove environment management logic).

Another thing to keep in mind, the project.yml file contains much of the same kinds of information present in meta.yml which will be annoying to maintain if a project is also a package.

jbednar commented 2 years ago

> I see conda project as a viable alternative approach to environment management (conda env). I also don't see a huge difference between projects and packages, projects just feel like general purpose packages.

True; both package recipes and project .ymls are specs for building artifacts, with projects being more general in that they encompass activities (like launching a live notebook) and not simply static outputs. But locked projects are also less general in a different sense, in that (unlike packages) they cannot be combined together.

> I absolutely love the commands feature. This is the part that makes a project a general purpose package to me. npm has this, I don't think poetry has this (?). I see it as an abstraction of the build/test commands in conda recipes. I think there's great value in being able to define any number of commands for a project/package (e.g. start-server, test, lint, build, etc.).

Yep; having multiple commands that all share the same envt is crucial; we use that for all of the things you mention.

> how lock files and environment management via add/remove feels separate but if neither are part of conda project then I feel this plugin is poorly named and should rather be renamed to conda command (we could then add a separate plugin exposing the add/remove environment management logic).

I think the independence only goes one way, i.e. that locking is useful for conda in general, while capturing commands without locking is only of limited use since it will soon break as dependencies change. So I would only argue that locking is valid on its own. add/remove, that's a separate story. :-)

> Another thing to keep in mind, the project.yml file contains much of the same kinds of information present in meta.yml which will be annoying to maintain if a project is also a package.

I'm not sure how often that will come up, and getting a shared file format between conda and projects has proven to be a years-long endeavor, but sure; aim to unify all the ymls! :-)
