RayZhhh / funsearch

Implementation for "Mathematical discoveries from program search with large language models".
Apache License 2.0

Help on Parallelising #7

Open jack-powers opened 1 month ago

jack-powers commented 1 month ago

Hi there,

Thank you for this repository. I have a question about the best way to parallelise this code for use on multi-core CPUs.

So far, I believe parallelising multiple samplers is as trivial as the following in funsearch.py:

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(lambda s: s.sample(profiler=profiler), samplers)
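One detail worth checking in that snippet: `executor.map` returns a lazy iterator, and an exception raised inside a worker only surfaces when the results are iterated. Below is a minimal sketch of the same idea that consumes the results. `DummySampler` and `run_samplers` are hypothetical names used only for illustration; they stand in for `sampler.Sampler` and are not part of the repository.

```python
import concurrent.futures


class DummySampler:
    """Hypothetical stand-in for sampler.Sampler; only the sample() call shape matters."""

    def __init__(self, name):
        self.name = name

    def sample(self, profiler=None):
        # A real sampler would query the LLM and evaluate programs here.
        return f"{self.name} done"


def run_samplers(samplers, profiler=None):
    """Run every sampler's sample() in its own thread and wait for all of them."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(samplers)) as executor:
        # list() consumes the iterator, so an exception raised in any worker
        # is re-raised here instead of being silently dropped.
        return list(executor.map(lambda s: s.sample(profiler=profiler), samplers))


results = run_samplers([DummySampler("s0"), DummySampler("s1")])
```

Threads suit work that mostly waits on I/O, such as LLM requests. If the work inside `sample` is CPU-bound pure Python, threads in standard CPython will generally not spread it across cores, and processes may be a better fit.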

However, to parallelise the evaluators, is this loop in sampler.py the only part that needs to be modified?

# This loop can be executed in parallel on remote evaluator machines.
for sample in samples:
    self._global_sample_nums_plus_one()  # RZ: add _global_sample_nums
    cur_global_sample_nums = self._get_global_sample_nums()
    chosen_evaluator: evaluator.Evaluator = np.random.choice(self._evaluators)
    chosen_evaluator.analyse(
        sample,
        prompt.island_id,
        prompt.version_generated,
        **kwargs,
        global_sample_nums=cur_global_sample_nums,
        sample_time=sample_time
    )

If so, would some management of the evaluators need to be implemented to ensure that the work is distributed across multiple evaluators?

Or would multiple parallel evaluators instead need to be set up in funsearch.py, where the samplers are created:

samplers = [sampler.Sampler(database, evaluators, config.samples_per_prompt,
                            max_sample_nums=max_sample_nums,
                            llm_class=class_config.llm_class)
            for _ in range(config.num_samplers)]
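To make the first option concrete, here is a rough sketch of how the evaluation loop from sampler.py could be fanned out over a thread pool. It is only an illustration under stated assumptions: `DummyEvaluator`, `SampleCounter` and `analyse_in_parallel` are hypothetical names, the evaluators are handed out round-robin rather than with `np.random.choice`, and the global sample number is taken under a lock in the submitting thread so that concurrent samplers cannot produce duplicate numbers. Whether the real `analyse` is safe to call from several threads is not something this sketch verifies.

```python
import concurrent.futures
import itertools
import threading


class DummyEvaluator:
    """Hypothetical stand-in for evaluator.Evaluator."""

    def __init__(self):
        self.seen = []
        self._lock = threading.Lock()

    def analyse(self, sample, island_id, version_generated, **kwargs):
        # A real evaluator would run and score the sampled program here.
        with self._lock:
            self.seen.append((sample, kwargs["global_sample_nums"]))


class SampleCounter:
    """Thread-safe replacement for the increment-then-read counter pair."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            self._value += 1
            return self._value


def analyse_in_parallel(samples, evaluators, counter, island_id,
                        version_generated, **kwargs):
    """Submit one analyse() call per sample, cycling through the evaluators."""
    round_robin = itertools.cycle(evaluators)
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(evaluators)) as executor:
        futures = []
        for sample in samples:
            number = counter.next()  # assigned before submission, so numbering is deterministic
            futures.append(executor.submit(
                next(round_robin).analyse, sample, island_id, version_generated,
                global_sample_nums=number, **kwargs))
        for future in futures:
            future.result()  # re-raise any exception from a worker


evaluators = [DummyEvaluator(), DummyEvaluator()]
analyse_in_parallel(["a", "b", "c", "d"], evaluators, SampleCounter(),
                    island_id=0, version_generated=0)
```

With this shape, the list comprehension that builds the samplers in funsearch.py would stay as it is, and each sampler would spread its own samples over the shared `evaluators` list. If the scoring is CPU-bound Python, swapping in `concurrent.futures.ProcessPoolExecutor` is the usual alternative, at the cost of requiring the evaluator and its arguments to be picklable.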

Thank you