TUM-DAML / seml

SEML: Slurm Experiment Management Library
Other
165 stars 29 forks source link

CLI Argument Completion #115

Closed dfuchsgruber closed 1 year ago

dfuchsgruber commented 1 year ago

What does this implement/fix?

This PR adds (rudimentary) argument completion using argcomplete. The main appeal should be, that collections can be tab-completed (and thus you can see available connections when pressing tab).

E.g., when pressing tab, you get bash tab-completion for seml commands:

$ seml gal_
gal_citeseer_30             gal_cora_ml_init2           gal_pubmed_init             gal_sbm_layers
gal_citeseer_init           gal_gcn_initial_broken_rng  gal_sbm_gcn                 gal_test
gal_cora_35                 gal_gpn_init_pool           gal_sbm_grid
gal_cora_ml_init            gal_pubmed_15               gal_sbm_grid2

This comes with two caveats:

This PR also changes the behaviour of seml configure: It adds flags --mongodb, which replicates the previous behaviour and configures MongoDB credentials as well as --argcomplete, which registers the argcomplete script such that it is automatically sourced whenever a new shell session is started (this is needed for tab completion to work). Additionally, --all will query the user for both.

n-gao commented 1 year ago

Thanks for the great work :) Here are a few comments before I properly review:

n-gao commented 1 year ago

I played around a bit and the autocomplete seems really useful, but the time for competition is too long imho. After some quick tests, it is definitely related to the imports. Here are some simple timings: time seml -> 1.2s time python -c "print('hello world')" -> 0.04s time python -c "import seml;" -> 1.2s time python -c "import pymongo;" -> 0.4s

Can we reduce the time to something like 0.4s, i.e., only the pymongo time?

dfuchsgruber commented 1 year ago

Thanks for the feedback. Here a few points of discussion:

  • Do we need click as an extra library? The other prompts we solved via input.

Click provides an easy way for y/n inputs, we can remove it and implement it ourselves, the question is if this minimal requirement is worth additional code.

  • Discussion: should the CLI be:

    • seml configure doing both (prompting if you want to do the CLI completion)
    • seml configure mongodb doing the previous thing
    • seml configure argcomplete doing the new stuff

I am not against this, it may even be more in line with seml's current CLI. One concern would be that if in the future more configurable elements are added, it may be more convenient to use multiple CLI flags at once to decide which components to configure.

  • If we change the default behavior we must update the readme. Also adding documentation for this feature in general to the readme would be nice.

I agree, depending on which route we go down in the future (see point above) I will adapt the readme.

  • Question: How would a user uninstall the argcomplete or does it leave artifacts if one uninstalls seml?

Uninstalling seml will not directly leave any artefacts. However, once the user uninstalls argcomplete, the bash-completion scripts written by it (depending on the activation mode) will remain. Uninstalling that is not handled by argcomplete. However, I believe this is not within the scope of seml.

  • We should give some additional information about the pros and cons of the installation paths, right? Like use user if you're not sudo, etc.

I'd suggest adding this to the README

  • Can we reduce the time to something like 0.4s, i.e., only the pymongo time?

In theory, yes. This would however require restructuring most if not all imports in seml and may make other commands slower. One alternative would be the use of lazy_imports. What do you think?

n-gao commented 1 year ago

the question is if this minimal requirement is worth additional code.

Not sure whether it's actually shorter than the solution we have elsewhere. But, it's better since it also reprompts after incorrect inputs. Could we adapt it then everywhere (also probably put a upper limit on the version so we don't run into compatibility issues?)?

One concern would be that if in the future more configurable elements are added

That's a good one, I haven't thought about. But we can also implement this via seml configure mongodb argcomplete?

However, I believe this is not within the scope of seml.

Agree, a note in the readme is probably nice.

One alternative would be the use of lazy_imports.

That looks reasonable, though doesn't have the nicest syntax? Other options are:

dfuchsgruber commented 1 year ago

Could we adapt it then everywhere

Not against that, even though I am not sure if it really is that straightforward.

That's a good one, I haven't thought about. But we can also implement this via seml configure mongodb argcomplete?

Of course, even though to me this is less natural compared to "standard" CLIs (plus implementation-wise chaining these like commands is more effort than just adding a bunch of flags)

Other options are

  • I am not a big fan of moving import into wrapper functions as it clutters the code a lot.
  • The second alternative I am not sure what exactly you mean by that
  • I thought those were the same package, but seems like they aren't. Fine with either, the one you propose seems to have the most clean syntax
n-gao commented 1 year ago

Not against that, even though I am not sure if it really is that straightforward.

From what I know, should be fairly simple :) I can also do that after this PR has been merged.

Of course, even though to me this is less natural compared to "standard" CLIs

Let's just go with your version. Can anyway be changed if we find it unfit. But, the default behavior should remain? Or prompt the user for both?

The second alternative I am not sure what exactly you mean by that

Something like:

parser_start.set_defaults(func='seml.start.start_experiments', set_to_pending=True)
# instead of
parser_start.set_defaults(func=start_experiments, set_to_pending=True)

# to load these functions
def run_method(func, *args, **kwargs):
    *module, function = func.split('.')
    module = importlib.import_module('.'.join(module))
    return getattr(module, function)(*args, **kwargs)
n-gao commented 1 year ago

I did some quick investigation, when python loads the entry point, it also always loads the whole package, i.e., it is insufficient to adapt the import structure in main.py. One would have to redo the whole import structure to accelerate this (seems suboptimal tbh).

n-gao commented 1 year ago

In https://github.com/TUM-DAML/seml/commit/d8501bdb69d5c91ee6edaff7d2573de542d8ef43, I updated our import structure in seml. Essentially, all "large" modules are now imported lazy. This already improved start times quite a bit. The biggest remaining issue is munch at the moment. Their import times got better in 3.0.0 which is not supported by sacred at the moment. Once https://github.com/IDSIA/sacred/pull/917 is merged and published, we should directly benefit from that.

dfuchsgruber commented 1 year ago

In d8501bd, I updated our import structure in seml. Essentially, all "large" modules are now imported lazy. This already improved start times quite a bit. The biggest remaining issue is munch at the moment. Their import times got better in 3.0.0 which is not supported by sacred at the moment. Once IDSIA/sacred#917 is merged and published, we should directly benefit from that.

So I guess importing in methods is now the way we go and not any of the lazy import modules discussed above?

n-gao commented 1 year ago

Since we would anyway have to adapt the whole codebase, this is the easiest (especially because it has already been done in some parts of the codebase).