NVIDIA / NeMo-Run

A tool to configure, launch and manage your machine learning experiments.
Apache License 2.0

Error when launching sequential jobs with task groups #35

Closed: Kipok closed this issue 2 months ago

Kipok commented 2 months ago
import nemo_run as run
from nemo_run import SSHTunnel, GitArchivePackager, SlurmExecutor

if __name__ == "__main__":
    # A trivial inline shell script used as the body of every task.
    inline_script = run.Script(
        inline="""
echo "Hello 1"
echo "Hello 2"
"""
    )
    # Cluster-specific arguments are elided.
    ssh_tunnel = SSHTunnel(...)
    packager = GitArchivePackager()
    executor = SlurmExecutor(...)
    with run.Experiment("nemo-skills-exps", executor=executor) as exp:
        # Passing a list to exp.add creates a task group; two groups are added.
        exp.add([inline_script, inline_script], name='my-test-run')
        exp.add([inline_script, inline_script], name='my-test-run')
        # sequential=True asks for the second group to run after the first.
        exp.run(detach=True, sequential=True)

Running this script results in the following output:

──────── Entering Experiment nemo-skills-exps with id: nemo-skills-exps_1725755478 ────────
[17:31:18] Tasks will be scheduled all at once but executed sequentially.  experiment.py:575
           Launching task my-test-run for experiment nemo-skills-exps  experiment.py:601
           Error running task my-test-run: 'JobGroup' object has no attribute 'executor'  experiment.py:622
           Traceback (most recent call last):  experiment.py:623
              File "/home/igitman/workspace/NeMo-Run/src/nemo_run/run/experiment.py", line 615, in run
               job.executor.dependencies = deps  # type: ignore
               ^^^^^^^^^^^^
            AttributeError: 'JobGroup' object has no attribute 'executor'

           Launching task my-test-run for experiment nemo-skills-exps  experiment.py:601
           Error running task my-test-run: list index out of range  experiment.py:622
           Traceback (most recent call last):  experiment.py:623
              File "/home/igitman/workspace/NeMo-Run/src/nemo_run/run/experiment.py", line 609, in run
               handle = dep.handle if isinstance(dep, Job) else dep.handles[0]
                                                                ~~~~~~~~~~~^^^
            IndexError: list index out of range

──────── Detaching from Experiment nemo-skills-exps_1725755478. ────────
           Task specific cleanup won't be run.  experiment.py:922
           Ephemeral logs and artifacts may be lost.
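
Both tracebacks point at the step that wires dependencies between sequential tasks: the first shows that a task group has no single `executor` attribute, and the second presumably follows from it, since a group that never launched would have an empty `handles` list. A minimal sketch of wiring that tolerates both cases is below. `Executor`, `Job`, `JobGroup`, `dependency_handles` and `wire_dependencies` are hypothetical stand-ins written for illustration only; they are not NeMo-Run's classes or its actual fix, and the handle strings are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Executor:
    # Hypothetical stand-in: only records the handles this executor waits on.
    dependencies: List[str] = field(default_factory=list)


@dataclass
class Job:
    # A single task with one executor and, once launched, one handle.
    executor: Executor
    handle: Optional[str] = None


@dataclass
class JobGroup:
    # A group of tasks: several executors and, once launched, several handles.
    executors: List[Executor]
    handles: List[str] = field(default_factory=list)


def dependency_handles(dep: Union[Job, JobGroup]) -> List[str]:
    """Return the launch handles of a dependency, or fail with a clear message."""
    handles = [dep.handle] if isinstance(dep, Job) else list(dep.handles)
    handles = [h for h in handles if h]
    if not handles:
        # Avoids an opaque IndexError when the dependency never launched.
        raise RuntimeError("dependency was not launched, so it has no handle")
    return handles


def wire_dependencies(task: Union[Job, JobGroup], deps: List[Union[Job, JobGroup]]) -> List[str]:
    """Attach the handles of all deps to every executor owned by task."""
    handles = [h for dep in deps for h in dependency_handles(dep)]
    # A Job owns one executor; a JobGroup owns a list of them.
    executors = [task.executor] if isinstance(task, Job) else task.executors
    for executor in executors:
        executor.dependencies = list(handles)
    return handles


# Two groups mirroring the two exp.add calls; the first has already launched.
first = JobGroup(executors=[Executor(), Executor()], handles=["made-up-handle-1"])
second = JobGroup(executors=[Executor(), Executor()])
wire_dependencies(second, [first])
```

In this sketch every executor in the second group ends up depending on the first group's handle, and a dependency without handles raises a descriptive `RuntimeError` instead of an `IndexError`.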