InstituteforDiseaseModeling / idmtools

https://docs.idmod.org/projects/idmtools/en/latest/

Random failure with a large number of simulations: 'NoneType' object has no attribute 'platform' #2391

Closed shchen-idmod closed 1 week ago

shchen-idmod commented 1 week ago

Repro steps: run this example: https://github.com/shchen-idmod/emodpy-malaria-hub/blob/fix_examples_slurm/examples/burnin_create_infections/example_mpirun_create_and_use_burnin.py

Note: the failure shows up in the serialized simulation case (set `serialize = True`). You need to run the burn-in first with `serialize = False` to get the burn-in experiment id.

Also create many more simulations by adding `builder.add_sweep_definition(partial(set_param, param='Run_Number'), range(num_seeds))` at https://github.com/shchen-idmod/emodpy-malaria-hub/blob/a5955decdc4f179a5b75bbc62671a9e860ed81a0/examples/burnin_create_infections/example_mpirun_create_and_use_burnin.py#L168:

    if serialize:  # Use burnin simulations
        burnin_df = build_burnin_df(burnin_exp_id, platform, sim_years * 365)
        builder.add_sweep_definition(partial(sweep_burnin_simulations, df=burnin_df), burnin_df.index)
        num_seeds = 100
        builder.add_sweep_definition(partial(set_param, param='Run_Number'), range(num_seeds))
        experiment_name = f"Use_burnin_simulations {os.path.split(sys.argv[0])[1]}"

With both sweeps, the experiment has 1000 simulations in total.
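For reference, here is a rough sketch of where the 1000 comes from, assuming the builder combines its sweep definitions as a cross product. The 10 burn-in rows are an assumption inferred from 1000 / 100 seeds; `burnin_rows` and `combinations` are hypothetical names, not taken from the example script.

```python
from itertools import product

# Assumption: 10 burn-in rows, inferred from the reported total of 1000
# simulations divided by num_seeds = 100. Not stated in the example script.
burnin_rows = range(10)
num_seeds = 100

# Each burn-in row is paired with every Run_Number value.
combinations = list(product(burnin_rows, range(num_seeds)))

print(len(combinations))
```

If the burn-in experiment has a different number of simulations, the total scales accordingly.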

The run then fails intermittently with an error like this:
WARNING: During schema-based param purge, Num_Cores not in schema.
Traceback (most recent call last):
  File "/home/scj6369/github/emodpy-malaria/examples/burnin_create_infections/example_mpirun_create_and_use_burnin.py", line 231, in <module>
    general_sim(selected_platform)
  File "/home/scj6369/github/emodpy-malaria/examples/burnin_create_infections/example_mpirun_create_and_use_burnin.py", line 176, in general_sim
    experiment.run(wait_until_done=True, platform=platform)
WARNING: During schema-based param purge, Num_Cores not in schema.
  File "/home/scj6369/venv/malaria_idmtools_201/lib/python3.10/site-packages/idmtools/entities/experiment.py", line 560, in run
    p.run_items(self, **run_opts)
  File "/home/scj6369/venv/malaria_idmtools_201/lib/python3.10/site-packages/idmtools/entities/iplatform.py", line 507, in run_items
    getattr(self, interface).run_item(item, **kwargs)
  File "/home/scj6369/venv/malaria_idmtools_201/lib/python3.10/site-packages/idmtools/entities/iplatform_ops/iplatform_experiment_operations.py", line 268, in run_item
    self.pre_run_item(experiment, **kwargs)
  File "/home/scj6369/venv/malaria_idmtools_201/lib/python3.10/site-packages/idmtools/entities/iplatform_ops/iplatform_experiment_operations.py", line 228, in pre_run_item
    experiment.simulations = self.platform._create_items_of_type(experiment.simulations, ItemType.SIMULATION,
  File "/home/scj6369/venv/malaria_idmtools_201/lib/python3.10/site-packages/idmtools/entities/iplatform.py", line 469, in _create_items_of_type
    ni = getattr(self, interface).batch_create(items, **kwargs)
  File "/home/scj6369/venv/malaria_idmtools_201/lib/python3.10/site-packages/idmtools/entities/iplatform_ops/iplatform_simulation_operations.py", line 136, in batch_create
    return batch_create_items(sims, create_func=self.create, display_progress=display_progress,
  File "/home/scj6369/venv/malaria_idmtools_201/lib/python3.10/site-packages/idmtools/entities/iplatform_ops/utils.py", line 161, in batch_create_items
    results = show_progress_of_batch(prog, futures)
  File "/home/scj6369/venv/malaria_idmtools_201/lib/python3.10/site-packages/idmtools/entities/iplatform_ops/utils.py", line 182, in show_progress_of_batch
    result = future.result()
  File "/software/python/3.10.1/lib/python3.10/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "/software/python/3.10.1/lib/python3.10/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/software/python/3.10.1/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/scj6369/venv/malaria_idmtools_201/lib/python3.10/site-packages/idmtools/entities/iplatform_ops/utils.py", line 61, in item_batch_worker_thread
    ret.append(create_func(item, **kwargs))
  File "/home/scj6369/venv/malaria_idmtools_201/lib/python3.10/site-packages/idmtools/entities/iplatform_ops/iplatform_simulation_operations.py", line 101, in create
    simulation._platform_object = self.platform_create(simulation, **kwargs)
  File "/home/scj6369/github/idmtools/idmtools_platform_slurm/idmtools_platform_slurm/platform_operations/simulation_operations.py", line 62, in platform_create
    meta = self.platform._metas.dump(simulation)
  File "/home/scj6369/github/idmtools/idmtools_platform_slurm/idmtools_platform_slurm/platform_operations/json_metadata_operations.py", line 109, in dump
    meta = self.get(item)
  File "/home/scj6369/github/idmtools/idmtools_platform_slurm/idmtools_platform_slurm/platform_operations/json_metadata_operations.py", line 89, in get
    meta['dir'] = os.path.abspath(self.platform.get_directory(item))
  File "/home/scj6369/github/idmtools/idmtools_platform_slurm/idmtools_platform_slurm/slurm_platform.py", line 300, in get_directory
    return self._op_client.get_directory(item)
  File "/home/scj6369/github/idmtools/idmtools_platform_slurm/idmtools_platform_slurm/slurm_operations/local_operations.py", line 85, in get_directory
    exp_dir = self.get_directory(exp)
  File "/home/scj6369/github/idmtools/idmtools_platform_slurm/idmtools_platform_slurm/slurm_operations/local_operations.py", line 74, in get_directory
    suite = self.platform.get_item(suite_id, ItemType.SUITE)
  File "/home/scj6369/venv/malaria_idmtools_201/lib/python3.10/site-packages/idmtools/entities/iplatform.py", line 245, in get_item
    return_object.platform = self
AttributeError: 'NoneType' object has no attribute 'platform'
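The last frame fails because the object looked up for the suite came back as `None` before `.platform` was assigned, and the error then resurfaces in the main thread through `future.result()`. Below is a minimal sketch of that failure shape only. It is not the idmtools code, and it does not explain why the lookup returns `None` in the first place. `FakePlatform`, `Item`, `lookup`, `get_item_unguarded`, and `get_item_guarded` are all hypothetical names made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class Item:
    """Hypothetical stand-in for a suite/experiment/simulation object."""

    def __init__(self, item_id):
        self.id = item_id
        self.platform = None


class FakePlatform:
    """Hypothetical stand-in for a platform; not the idmtools API."""

    def __init__(self, store):
        self._store = store

    def lookup(self, item_id):
        # Returns None when the item cannot be found.
        return self._store.get(item_id)

    def get_item_unguarded(self, item_id):
        obj = self.lookup(item_id)
        obj.platform = self  # raises AttributeError when obj is None
        return obj

    def get_item_guarded(self, item_id):
        obj = self.lookup(item_id)
        if obj is None:
            # Fail with a message that names the missing item.
            raise LookupError(f"item {item_id!r} not found")
        obj.platform = self
        return obj


platform = FakePlatform({"suite-1": Item("suite-1")})

with ThreadPoolExecutor(max_workers=4) as pool:
    # The worker's exception is re-raised when result() is called.
    try:
        pool.submit(platform.get_item_unguarded, "missing-suite").result()
        unguarded_error = None
    except AttributeError as exc:
        unguarded_error = exc

    try:
        pool.submit(platform.get_item_guarded, "missing-suite").result()
        guarded_error = None
    except LookupError as exc:
        guarded_error = exc

    found = pool.submit(platform.get_item_guarded, "suite-1").result()

print(type(unguarded_error).__name__, type(guarded_error).__name__, found.id)
```

A guard like the second method would not fix the random failure, but it would turn the `AttributeError` into a message that says which item id could not be resolved, which may help narrow down the cause.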