environmental-forecasting / model-ensembler

Model Ensemble tool for batch workflows on HPCs
https://pypi.org/project/model-ensembler/
MIT License

Overwritten nodes in configurations #43

Open JimCircadian opened 2 months ago

JimCircadian commented 2 months ago

Because the Batch defaults are set via a namedtuple in config.py, run-specific values are overwritten when they are specified as defaults in the root vars of a configuration. We should identify when these values are being replaced by None from the batch context and ensure they aren't overwritten, as a stopgap until some clearer work is done in the future.

17:20:29                 DEBUG            TASK CTX: {'batch': 4,
 'cluster': 'dummy',
 'dataset': 'test_net_ds',
 '**email**': 'someone@example.com',
 'epochs': 100,
 'filter_factor': 1,
 'gpus': 1,
 '**length**': '1-00:00:00',
 'mem': '128gb',
 '**nodes**': 1,
 '**ntasks**': 2,
 'preload': 'test_net_ds',
 'strategy': 'default',
 'symlinks': ['../../../data',
              '../../../ENVS*',
              '../../../loader.test.json',
              '../../../test_net_ds',
              '../../../network_datasets',
              '../../../processed',
              '../../../results']}
17:20:29                 DEBUG            TASK FUNC: Task(name='execute', args={'cmd': 'mkdir -p ./results/networks'}, value=None)
17:20:29                 DEBUG            Calling with cmd = mkdir -p ./results/networks
17:20:29                 INFO             Running command: mkdir -p ./results/networks
17:20:29                 DEBUG            Executing command mkdir -p ./results/networks, cwd unset
17:20:29                 DEBUG            Command successful
17:20:29                 INFO             Start batch: 2024-08-21 16:20:29.854505
17:20:29                 DEBUG            Batch(name='test_ensemble', templates=['icenet_train.sh.j2'], templatedir='../template', job_file='icenet_train.sh', basedir='./ensemble/test_ensemble', runs=[{'seed': 42}, {'seed': 46}], maxruns=2, maxjobs=2, repeat=False, cluster=None, **email**=None, **nodes**=None, **ntasks**=0, **length**=None, pre_batch=[], pre_run=[], post_run=[], post_batch=[{'name': 'execute', 'args': {'cmd': '/usr/bin/echo "No postprocessing in place for training ensemble"'}}])
17:20:29                 INFO             Running cycle 1
17:20:29                 INFO             Start run test_ensemble-0 at 2024-08-21 16:20:29.859752
17:20:29                 DEBUG            Run(batch='4', cluster='gpu', dataset='test_net_ds', **email**='me@anotherdomain.ac.uk', epochs='10', filter_factor='1.44', gpus='1', **length**=None, mem='128gb', **nodes**=None, **ntasks**='8', preload='test_net_ds', strategy='default', symlinks=['../../../data', '../../../ENVS*', '../../../loader.test.json', '../../../test_net_ds', '../../../network_datasets', '../../../processed', '../../../results'], name='test_ensemble', templates=['icenet_train.sh.j2'], templatedir='../template', job_file='icenet_train.sh', basedir='./ensemble/test_ensemble', maxruns=2, maxjobs=2, repeat=False, seed=42, idx=0, id='test_ensemble-0', dir='/data/hpcdata/users/jambyr/icenet4/pipeline.monthly/ensemble/test_ensemble/test_ensemble-0')
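
A rough sketch of the kind of stopgap described above, for discussion only. It is not the actual model-ensembler code: the `Batch` namedtuple here is a cut-down, hypothetical stand-in for the one in config.py, and `merge_contexts` and the example values are made up for illustration. The idea is simply to drop batch fields that are still None before layering them over the root vars, so that `nodes` and `length` from the root vars survive into the run context.

```python
from collections import namedtuple

# Hypothetical, reduced stand-in for the Batch namedtuple in config.py.
# The defaults apply to the rightmost fields: cluster, email, nodes and
# length default to None, ntasks defaults to 0 (as seen in the log above).
Batch = namedtuple(
    "Batch",
    ["name", "cluster", "email", "nodes", "ntasks", "length"],
    defaults=[None, None, None, 0, None],
)


def merge_contexts(root_vars, batch, run_vars):
    """Build a run context without letting unset (None) batch fields
    overwrite values already supplied by the root vars.

    Hypothetical helper: the name and signature are assumptions.
    """
    ctx = dict(root_vars)
    # Only batch fields that were actually set are allowed to override.
    ctx.update({k: v for k, v in batch._asdict().items() if v is not None})
    # Run-specific values take precedence over everything else.
    ctx.update(run_vars)
    return ctx


# Illustrative values loosely based on the log above.
root_vars = {
    "email": "someone@example.com",
    "length": "1-00:00:00",
    "nodes": 1,
    "ntasks": 2,
}
batch = Batch(name="test_ensemble")
run_vars = {"seed": 42, "email": "me@anotherdomain.ac.uk", "ntasks": "8"}

ctx = merge_contexts(root_vars, batch, run_vars)
print(ctx["nodes"], ctx["length"])
```

Note that a None check alone would not protect `ntasks`: its batch default is 0 rather than None, so it would still replace the root value whenever the run does not set it. That case probably needs handling in the clearer follow-up work.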