anaconda / anaconda-project

Tool for encapsulating, running, and reproducing data science projects
https://anaconda-project.readthedocs.io/en/latest/

[PROPOSAL] conda project cli #284

Closed AlbertDeFusco closed 2 years ago

AlbertDeFusco commented 4 years ago

This is meant to demonstrate and seek feedback for conda project with minimal changes.

jbednar commented 4 years ago

Is there any way we could mark specific functionality as not being available when called as conda project, but still available (for legacy purposes) if called as anaconda project? That way we could work out precisely which of the many features of anaconda project we would want to be considered part of conda.

AlbertDeFusco commented 4 years ago

Yeah, I think we can do that by splitting apart the internal/cli/main.py argparse code into two entry points with a shared module of common commands.
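A minimal sketch of that split, assuming a shared helper that registers subcommands and two thin entry points that differ only in which commands they include. The function names and the two command lists are hypothetical (the command names are ones mentioned in this thread) and are not taken from the actual internal/cli/main.py.

```python
import argparse

# Illustrative lists only; the real include list is still under discussion.
COMMON_COMMANDS = ("prepare", "run", "lock")
LEGACY_ONLY_COMMANDS = ("upload", "download")


def add_commands(subparsers, names):
    """Register one subparser per command name (shared by both entry points)."""
    for name in names:
        subparsers.add_parser(name, help=f"{name} command")


def build_parser(prog, include_legacy):
    """Build a parser whose usage text shows `prog` rather than a fixed name."""
    parser = argparse.ArgumentParser(prog=prog)
    subparsers = parser.add_subparsers(dest="command")
    add_commands(subparsers, COMMON_COMMANDS)
    if include_legacy:
        add_commands(subparsers, LEGACY_ONLY_COMMANDS)
    return parser


def conda_project_main(argv):
    """Hypothetical entry point exposing only the common subset."""
    return build_parser("conda project", include_legacy=False).parse_args(argv)


def anaconda_project_main(argv):
    """Hypothetical legacy entry point exposing every command."""
    return build_parser("anaconda-project", include_legacy=True).parse_args(argv)
```

With this layout, `conda_project_main(["upload"])` exits with an argparse usage error, while `anaconda_project_main(["upload"])` parses normally.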

For discussion: what commands would be good to have in the include list for conda project? Here are a few that I would include:

Other commands that I'm not yet convinced would fit are

All packages and env-spec commands could be dropped since that can be handled by the environment.yml file as provided in #275.

The init command could be useful if redesigned for this usage, exactly what that would be is TBD.

Considering how we could handle lock and platforms features has some overlap with #275 as well.

AlbertDeFusco commented 4 years ago

From @jlstevens

As much as I love being able to type conda project there are some ways that make it obvious that anaconda-project is trying to integrate with conda rather than the other way around (possibly unavoidable unless things are also updated at the conda end?):

  1. If you just run conda, the help lists the available subcommands but this doesn't list project.
  2. If you run conda project, help output mentions anaconda-project again:
 conda project
 Must specify a subcommand.
 usage: anaconda-project [-h] [-v] [--verbose] ....

On one hand, I really want to use the conda project command but on the other hand I am slightly worried that issues like these might cause confusion.

You'll notice that points 1 and 2 are addressed in this PR.

mcg1969 commented 4 years ago

I am 100% behind defining a strict subset of functionality for conda project. I'd even go so far as to suggest that we issue deprecation warnings whenever someone uses a command in conda project that is not on this list.
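One way such a warning could look, as a sketch only. The `SUPPORTED_COMMANDS` set and the function name are hypothetical. `FutureWarning` is used here because Python hides `DeprecationWarning` by default outside of `__main__`, so end users of a CLI would often not see it.

```python
import warnings

# Illustrative subset; the actual list is what this thread is trying to settle.
SUPPORTED_COMMANDS = frozenset({"prepare", "run", "lock"})


def warn_if_unsupported(command, prog="conda project"):
    """Return True if `command` is in the supported subset.

    Otherwise emit a user-visible warning and return False, leaving it to
    the caller to decide whether the command should still run.
    """
    if command in SUPPORTED_COMMANDS:
        return True
    warnings.warn(
        f"'{prog} {command}' is not part of the supported command subset "
        "and may be removed in the future.",
        FutureWarning,
        stacklevel=2,
    )
    return False
```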

jbednar commented 4 years ago

The archive/unarchive features seem core to me; they help define what it means to be a complete project (e.g. omitting the envs/ subdirectory, and ideally also omitting nonsense like the .ipynb_checkpoints files that always end up in them). In fact, the .zip file is what I would call the "project"; that's what I'd want to back up, email to people, put onto removable media, etc.
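To make the exclusion idea concrete, here is a rough sketch of an archive step that skips the top-level envs/ directory and any .ipynb_checkpoints directory. It uses only the standard library and is not anaconda-project's own archive implementation; the function name is hypothetical.

```python
import os
import zipfile


def archive_project(project_dir, archive_path):
    """Zip a project directory, leaving out environments and notebook checkpoints.

    Returns the archived paths (relative, '/'-separated) in the order written.
    """
    project_dir = os.path.abspath(project_dir)
    archive_abs = os.path.abspath(archive_path)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(project_dir):
            at_top = os.path.abspath(root) == project_dir
            # Prune in place so os.walk never descends into excluded directories.
            dirs[:] = sorted(
                d for d in dirs
                if d != ".ipynb_checkpoints" and not (at_top and d == "envs")
            )
            for name in sorted(files):
                full = os.path.join(root, name)
                if os.path.abspath(full) == archive_abs:
                    continue  # do not add the archive to itself
                rel = os.path.relpath(full, project_dir)
                zf.write(full, rel)
                added.append(rel.replace(os.sep, "/"))
    return added
```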

I agree that upload and download are specific to anaconda.org and thus not currently suitable for a conda project command. "dock" is a big discussion to have; happy to leave that off the list until it's clear. There are also other aspects of the functionality not covered by console/module commands, such as anaconda-project.yml support for environment variables and datasets, that are worth discussing. Just as for conda project vs. anaconda project, maybe some features of anaconda-project.yml would be disabled for a file named conda-project.yml or project.yml. I think we should be intentional about anything that is exposed via conda project because it needs to be something we can stand behind as part of conda indefinitely.

AlbertDeFusco commented 4 years ago

@jbednar could you explain more about what you mean with "support for environment variables and datasets"? Is there something you'd like to change with the way they are currently supported?

mcg1969 commented 4 years ago

cc: @edennuriel

jbednar commented 4 years ago

> something you'd like to change with the way they are currently supported

I think @mcg1969 would be the one with opinions here; I haven't used them much and I think I'm happy with how they work in anaconda-project as far as I've encountered them. However, I'm no expert there and I don't know how much that support applies or should apply to conda more generally, so I'd want someone who does know that aspect of anaconda-project better to evaluate whether such behavior is appropriate for a conda project command.

mcg1969 commented 4 years ago

I haven't used anaconda-project's dataset functionality much if at all. The environment variable functionality seems immensely useful, enough so that I'd even want to evolve conda env to support them as well as conda project.

mcg1969 commented 4 years ago

One general point: I am frankly not that big of a fan of most of the CLI commands for anaconda-project. YMMV I know, but for me, most of the commands for adding/removing packages, variables etc. are just more naturally accomplished by editing the file. conda env doesn't have an equivalent set of commands.

The commands I do like are the ones that effect global change, such as prepare, run, lock, etc.

jbednar commented 4 years ago

> The commands I do like are the ones that effect global change, such as prepare, run, lock, etc.

Same here. I have never even tried the ones that simply edit the file, because for personal use I will always just edit the file myself. I assume those commands are useful mainly when implementing a non-Python GUI, i.e. as commands that can be invoked to take some data entered in a form and make it end up in the .yml. So I'm happy for them not to be available in a conda project command; I think people who can issue a CLI command are also people who can use a text editor.

> I haven't used anaconda-project's dataset functionality much if at all.

Sounds like we have complementary experience here. I haven't used the environment variable functionality much if at all, but we use the download functionality extensively in our public examples, e.g.:

https://github.com/pyviz-topics/examples/blob/master/bay_trimesh/anaconda-project.yml#L40
https://github.com/pyviz-topics/examples/blob/master/uk_researchers/anaconda-project.yml#L41
https://github.com/pyviz-topics/examples/blob/master/opensky/anaconda-project.yml#L41

It's great for making a runnable example project that's small enough to be practical to archive, put in version control, put on a website, send in email, etc., without polluting an example notebook with code to fetch and cache an external dataset. The dataset simply needs to be listed in the .yml, and it's then fetched and ready when the command is run. It's very useful functionality, but in my opinion is project specific, i.e., should only be relevant to conda project and not conda itself.
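The fetch-once behavior described above can be sketched roughly as follows. This is not anaconda-project's implementation: the function name is hypothetical, and the `downloads` mapping is an illustrative stand-in for what would be read from the .yml (a variable name mapped to a URL and a target filename).

```python
import os
import urllib.request


def ensure_downloads(downloads, project_dir, fetch=urllib.request.urlretrieve):
    """Fetch each listed dataset unless its target file already exists.

    `downloads` maps a variable name to {"url": ..., "filename": ...}.
    Returns a mapping of variable name to local path, which a runner could
    expose to the project's commands (for example as environment variables).
    """
    paths = {}
    for name, spec in downloads.items():
        target = os.path.join(project_dir, spec["filename"])
        if not os.path.exists(target):
            os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
            fetch(spec["url"], target)  # skipped on later runs: file is cached
        paths[name] = target
    return paths
```

The `fetch` parameter exists only so the download step can be swapped out, for instance in tests that should not touch the network.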

AlbertDeFusco commented 2 years ago

See #363 for the reboot of this PR. All comments provided here will be considered.