Descriptions - Githubissues

dfuchsgruber commented 1 year ago

Adds per-experiment descriptions to SEML

What does this implement/fix?

This allows the user to store per-experiment descriptions in MongoDB. The preferred way to do so is to add a description key to the seml section of the experiment's YAML file, as in:

...
seml:
   description: My first experiment.
...

This description is available when running the following commands:

seml list: Will add a "Description(s)"-column that displays all distinct descriptions of experiments in each collection.

Coll… Stag… Pendi… Runn… Failed Kill… Inter… Comp… Total Description(s)                                           
────────────────────────────────────────────────────────────────────────────────────────────────────────────────  
test      0     67     1      0     0      3     1    72 Test Description                                         
test3   160      0     0      0     0      0     0   160 "Other Test Description", "Single Experiment Descripti…  
────────────────────────────────────────────────────────────────────────────────────────────────────────────────  
Total   160     67     1      0     0      3     1   232

seml {collection} status displays similar information:

        Collection Staged Pending Running Failed Killed Interrupted Completed Total Description(s)              
        ────────────────────────────────────────────────────────────────────────────────────────────            
        test            0      67       1      0      0           3         1    72 Test Description

seml {collection} print-fail-trace will also display per-experiment descriptions in the footer of the panel.

╭───────────────── Experiment ID 95, Batch ID 5, Status: "KILLED", Slurm Array-Task id: 8169698-5 ─────────────────╮
│ [18:23:56][__main__][INFO] Dataset split 4, Model initialization 4                                               │
│         [19:50:58][__main__][INFO] Dataset split 4, Model initialization 4                                       │
│         [20:12:40][__main__][INFO] Dataset split 4, Model initialization 1                                       │
│         slurmstepd: error: *** JOB 8169760 ON gpu12 CANCELLED AT 2023-05-07T20:17:00 DUE TO TIME LIMIT ***       │
╰─────────────────────────────────── Description : GAL on CoraML with 35 nodes ────────────────────────────────────╯

Both seml list and seml {collection} status support the --full-descriptions flag, which prints descriptions in full instead of truncating them to fit on a single line, e.g.:

  Collection Staged Pending Running Failed Killed Interrupted Completed Total Description(s)                        
  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────  
  test            0      67       1      0      0           3         1    72 Test Description                      
  test3         160       0       0      0      0           0         0   160 "Single Experiment Description",      
                                                                              "Other Test Description"              
  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────  
  Total         160      67       1      0      0           3         1   232
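
The column content shown above can be approximated as follows. This is only a minimal sketch, not the code from this PR: the function name format_descriptions and the max_width parameter are hypothetical, and in practice the table renderer may perform the truncation itself.

```python
from typing import Iterable, List, Optional


def format_descriptions(descriptions: Iterable[Optional[str]],
                        max_width: Optional[int] = None) -> str:
    """Join the distinct descriptions of a collection into one cell.

    max_width=None mimics --full-descriptions (no truncation).
    """
    # Keep first-seen order while dropping duplicates and missing values.
    distinct: List[str] = []
    for description in descriptions:
        if description and description not in distinct:
            distinct.append(description)

    if len(distinct) == 1:
        # A single description is shown without quotes.
        text = distinct[0]
    else:
        text = ", ".join(f'"{d}"' for d in distinct)

    if max_width is not None and max_width >= 1 and len(text) > max_width:
        # Reserve one character for the ellipsis marker.
        text = text[: max_width - 1] + "…"
    return text


if __name__ == "__main__":
    docs = ["Other Test Description", "Single Experiment Description",
            "Other Test Description", None]
    print(format_descriptions(docs, max_width=40))  # truncated variant
    print(format_descriptions(docs))                # full variant
```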

The descriptions can also retroactively be set (=updated) or deleted via the new commands:

seml {collection} description set {description} sets the description of all matching documents to the given string. This command supports the common filters on sacred ID, batch ID and states, as well as custom user filters.
seml {collection} description delete deletes the descriptions of all matching documents. The same common filters are supported.
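
To illustrate the intended semantics of the two commands, here is a minimal in-memory sketch that works on plain dictionaries instead of a real MongoDB collection. The helper names and the document layout (seml.description, batch_id, status) are assumptions made for illustration and are not taken from the PR's code.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Filter = Callable[[Document], bool]


def match_all(doc: Document) -> bool:
    """Default filter: every document matches."""
    return True


def set_description(docs: List[Document], description: str,
                    matches: Filter = match_all) -> int:
    """Set (or overwrite) the description of every matching document."""
    updated = 0
    for doc in docs:
        if matches(doc):
            # Assumed storage location: the 'seml' sub-document.
            doc.setdefault("seml", {})["description"] = description
            updated += 1
    return updated


def delete_description(docs: List[Document],
                       matches: Filter = match_all) -> int:
    """Remove the description from every matching document that has one."""
    removed = 0
    for doc in docs:
        if matches(doc) and "description" in doc.get("seml", {}):
            del doc["seml"]["description"]
            removed += 1
    return removed


if __name__ == "__main__":
    docs = [
        {"_id": 1, "batch_id": 1, "status": "FAILED"},
        {"_id": 2, "batch_id": 2, "status": "COMPLETED"},
    ]
    # Roughly what a batch-ID filter would select.
    set_description(docs, "My first experiment.", lambda d: d["batch_id"] == 1)
    print(docs)
    delete_description(docs)
    print(docs)
```

With a real database the same effect would typically be achieved with update operations using $set and $unset on the filtered documents.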

Projections to fail-trace-printing

Somewhat orthogonal to experiment descriptions: the seml {collection} print-fail-trace command was extended to also display additional config fields requested by the user via the -p/--projection flag, e.g.:

seml gal_cora_35 print-fail-trace -p '["config.data.num_splits", "config.model.num_inits"]'
╭───────────────── Experiment ID 95, Batch ID 5, Status: "KILLED", Slurm Array-Task id: 8169698-5 ─────────────────╮
│ [18:23:56][__main__][INFO] Dataset split 4, Model initialization 4                                               │
│         [19:50:58][__main__][INFO] Dataset split 4, Model initialization 4                                       │
│         [20:12:40][__main__][INFO] Dataset split 4, Model initialization 1                                       │
│         slurmstepd: error: *** JOB 8169760 ON gpu12 CANCELLED AT 2023-05-07T20:17:00 DUE TO TIME LIMIT ***       │
╰─────── Description : GAL on CoraML with 35 nodes, config.data.num_splits : 5, config.model.num_inits : 5 ────────╯

Since I repeatedly found myself digging through MongoDB to find out which parameters might have triggered an experiment failure, this is IMHO another very convenient feature.
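
How the projected fields could end up in the panel footer can be sketched as below. This is a minimal sketch under stated assumptions, not the PR's implementation: resolve_path and build_footer are hypothetical helpers, and the experiment document is assumed to be a nested dictionary addressed by dotted keys.

```python
from typing import Any, Dict, Iterable, List, Optional


def resolve_path(doc: Dict[str, Any], dotted_key: str, default: Any = None) -> Any:
    """Look up a dotted key such as 'config.data.num_splits' in a nested dict."""
    current: Any = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            # Missing keys fall back to the default instead of raising.
            return default
        current = current[part]
    return current


def build_footer(doc: Dict[str, Any], projection: Iterable[str],
                 description: Optional[str] = None) -> str:
    """Assemble 'Description : ..., key : value, ...' for the panel footer."""
    parts: List[str] = []
    if description is not None:
        parts.append(f"Description : {description}")
    for key in projection:
        parts.append(f"{key} : {resolve_path(doc, key)}")
    return ", ".join(parts)


if __name__ == "__main__":
    experiment = {"config": {"data": {"num_splits": 5},
                             "model": {"num_inits": 5}}}
    print(build_footer(
        experiment,
        ["config.data.num_splits", "config.model.num_inits"],
        description="GAL on CoraML with 35 nodes",
    ))
```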

Additional information

[ ] I updated the docs via typer-cli with _SEML_COMPLETE=1 typer seml.__main__ utils docs --name seml --output docs.md or did not change the CLI. Note: typer is not a valid command after installing the requirements. Is this an issue with the typer version? How do I build the docs otherwise? python -m typer does not work either.

n-gao commented 1 year ago

Thanks a lot! Great contribution. A few more comments:

- Command chaining is not working correctly; we need to rewrite the splitting to account for sub-typers.
  - The issue is that we split by command names, but sub-typers are groups: we have to split by groups while keeping each command associated with its group.
  - A decision has to be made on how we handle certain behaviors. Do we want to allow chaining in the first place? Is seml collection description set ... delete ... the same as seml collection description set ...; seml collection delete or as seml collection description set ...; seml collection description delete ...? I.e., do we reset back to the top level after every command? If not, one can never return to chaining top-level commands.
- Should we add seml collection description list to show which jobs have which description?
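
For discussion, here is a minimal sketch of one possible answer to the splitting question: reset to the top level after every command, and keep a group name together with the command that follows it. The function split_chained and the command and group sets passed to it are hypothetical; this is not how the current splitting works.

```python
from typing import Dict, List, Sequence, Set


def split_chained(args: Sequence[str], commands: Set[str],
                  groups: Dict[str, Set[str]]) -> List[List[str]]:
    """Split chained CLI arguments into one chunk per invoked command.

    A group name followed by one of its commands starts a chunk of the
    form [group, command, ...]; a top-level command starts [command, ...].
    All other tokens are treated as arguments of the current chunk.
    """
    chunks: List[List[str]] = []
    i = 0
    while i < len(args):
        token = args[i]
        next_token = args[i + 1] if i + 1 < len(args) else None
        if token in groups and next_token in groups[token]:
            # Keep the group and its command together.
            chunks.append([token, next_token])
            i += 2
        elif token in commands:
            chunks.append([token])
            i += 1
        elif chunks:
            chunks[-1].append(token)
            i += 1
        else:
            raise ValueError(f"Expected a command or group, got {token!r}")
    return chunks


if __name__ == "__main__":
    print(split_chained(
        ["description", "set", "my text", "delete"],
        commands={"delete", "status"},
        groups={"description": {"set", "delete", "list"}},
    ))
```

Under this interpretation, description set ... delete ... is read as description set ... followed by the top-level delete. A known limitation of such token-based splitting is that an argument value equal to a command name would be mistaken for a new command.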

Running seml list gives

/nfs/staff-ssd/gaoni/miniconda3/envs/jax4/lib/python3.11/typing.py:1275 in __getattr__           │
│                                                                                                  │
│   1272 │   │   # Also for simplicity we don't relay any dunder names                             │
│   1273 │   │   if '__origin__' in self.__dict__ and not _is_dunder(attr):                        │
│   1274 │   │   │   return getattr(self.__origin__, attr)                                         │
│ ❱ 1275 │   │   raise AttributeError(attr)                                                        │
│   1276 │                                                                                         │
│   1277 │   def __setattr__(self, attr, val):                                                     │
│   1278 │   │   if _is_dunder(attr) or attr in {'_name', '_inst', '_nparams',                     │
│                                                                                                  │
│ ╭────── locals ──────╮                                                                           │
│ │ attr = '__args__'  │                                                                           │
│ │ self = typing.List │                                                                           │
│ ╰────────────────────╯                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: __args__
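
The locals panel shows __args__ being looked up on a bare typing.List. Which annotation in the CLI triggers this is not visible in the truncated trace, so the following is only a generic sketch of the behaviour, with a hypothetical helper name, showing that typing.get_args handles unparameterized generics without raising.

```python
import typing


def type_args(tp: object) -> tuple:
    """Return the type arguments of an annotation, or () if it has none."""
    # typing.get_args does not raise for unparameterized generics such as
    # typing.List, unlike the direct tp.__args__ access seen in the
    # traceback above (Python 3.11).
    return typing.get_args(tp)


if __name__ == "__main__":
    print(type_args(typing.List))       # bare generic
    print(type_args(typing.List[str]))  # parameterized generic
```

Annotating list-valued options with a parameterized type such as typing.List[str] instead of a bare typing.List would likely avoid the error as well.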

n-gao commented 1 year ago

I additionally implemented seml xyz description list.

n-gao commented 1 year ago

LGTM

TUM-DAML / seml

Descriptions #118
