NVIDIA / NeMo-Run

A tool to configure, launch and manage your machine learning experiments.
Apache License 2.0

Adding some Experiment improvements #13

Closed: marcromeyn closed this 2 months ago

marcromeyn commented 3 months ago

This PR improves the UX around inspecting Experiments. I propose to be clearer about the naming we use:

hemildesai commented 3 months ago

Can you also update and test https://github.com/NVIDIA/NeMo-Run/blob/main/examples/hello-world/hello_experiments.ipynb?

hemildesai commented 2 months ago

The hello_experiments.ipynb notebook is broken. I'm getting the following error:

TypeError                                 Traceback (most recent call last)
Cell In[5], line 2
      1 with run.Experiment("add_object", executor=run.LocalExecutor()) as exp:
----> 2     exp.add(fn_1, tail_logs=True)
      3     exp.add(fn_2, tail_logs=True)
      4     exp.run()

File ~/dev/NeMo-Run/src/nemo_run/run/experiment.py:469, in Experiment.add(self, task, executor, name, plugins, tail_logs)
    466     assert name, "name is required for task group."
    467     self._add_task_group(task, executor, name, plugins=plugins, tail_logs=tail_logs)
--> 469 self._save_jobs()

File ~/dev/NeMo-Run/src/nemo_run/run/experiment.py:334, in Experiment._save_jobs(self)
    332 main_module = sys.modules["__main__"]
    333 with open(os.path.join(self._exp_dir, "__main__.py"), "w+") as f:
--> 334     f.write(inspect.getsource(main_module))

File ~/.rye/py/cpython@3.11.9/lib/python3.11/inspect.py:1258, in getsource(object)
   1252 def getsource(object):
   1253     """Return the text of the source code for an object.
   1254 
   1255     The argument may be a module, class, method, function, traceback, frame,
   1256     or code object.  The source code is returned as a single string.  An
   1257     OSError is raised if the source code cannot be retrieved."""
...
--> 897     raise TypeError('{!r} is a built-in module'.format(object))
    898 if isclass(object):
    899     if hasattr(object, '__module__'):

TypeError: <module '__main__'> is a built-in module
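The traceback shows `inspect.getsource` raising `TypeError` because the `__main__` module has no source file when the code runs inside a notebook. One possible guard is sketched below. It is only an illustration, not the fix adopted in this PR: `get_module_source` is a hypothetical helper, and the file-less module object is a stand-in for a notebook's `__main__`.

```python
import inspect
import types
from typing import Optional


def get_module_source(module: types.ModuleType) -> Optional[str]:
    """Return the module's source text, or None when it cannot be retrieved.

    Hypothetical helper for illustration; not part of NeMo-Run.
    """
    try:
        return inspect.getsource(module)
    except (TypeError, OSError):
        # TypeError: the module has no __file__, which is the case reported
        # in the traceback for __main__ inside a notebook.
        # OSError: a file is recorded but its source cannot be read.
        return None


# A module object without __file__ stands in for a notebook's __main__.
notebook_like_main = types.ModuleType("__main__")
print(get_module_source(notebook_like_main))
```

A caller such as `_save_jobs` could then skip writing `__main__.py` when the helper returns `None`, assuming the saved copy is optional for the experiment.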

I would also recommend removing examples/experiment and integrating it into hello_experiments.ipynb.