PathologyDataScience / glimr

A simplified wrapper for hyperparameter search with Ray Tune.
Apache License 2.0

Allocated resources not scaling trials #26

Open cooperlab opened 1 year ago

cooperlab commented 1 year ago

For both GPU and CPU, increasing the resources does not increase the number of concurrently running trials.

lawrence-chillrud commented 1 year ago

This page from Ray Tune's documentation could prove helpful. It seems like wrapping the trainable in a call to tune.with_resources could be worth trying, e.g.,

tune.Tuner(tune.with_resources(trainable, {"cpu": 2, "gpu": 1}), tune_config=tune.TuneConfig(num_samples=8))

I think this should happen around here in Search.experiment.
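
For reference, a minimal self-contained version of that pattern (the trainable below is just a placeholder for glimr's actual training function, and the resource numbers are illustrative):

```python
from ray import tune

def trainable(config):
    # Placeholder standing in for glimr's real training function.
    return {"score": config["lr"]}

tuner = tune.Tuner(
    # Reserve 2 CPUs and 1 GPU for each trial; Ray then runs as many
    # trials concurrently as the available cluster resources allow.
    tune.with_resources(trainable, {"cpu": 2, "gpu": 1}),
    param_space={"lr": tune.loguniform(1e-4, 1e-1)},
    tune_config=tune.TuneConfig(num_samples=8),
)
results = tuner.fit()
```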

create-issue-branch[bot] commented 1 year ago

Branch issue-26-Allocated_resources_not_scaling_trials created!

RaminNateghi commented 1 year ago

The number of concurrently running trials depends directly on the number of GPU/CPU cores allocated to each trial, so giving each trial more CPU and GPU cores does not increase the number of running trials; it actually decreases it. In my experiments, tuning speed depends far more on the number of concurrently running trials than on the amount of resources allocated to each trial.
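
To make that concrete, a rough back-of-the-envelope sketch (the node size is assumed purely for illustration):

```python
import math

# Assume a single node with 4 GPUs (illustrative).
total_gpus = 4

# More GPUs per trial -> fewer trials can run at once, and vice versa.
print(math.floor(total_gpus / 2.0))  # 2 concurrent trials at 2.0 GPUs/trial
print(math.floor(total_gpus / 1.0))  # 4 concurrent trials at 1.0 GPU/trial
print(math.floor(total_gpus / 0.5))  # 8 concurrent trials at 0.5 GPUs/trial
```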

Please take a look at PR https://github.com/PathologyDataScience/glimr/pull/56, where I added support for multi-GPU distributed tuning.

cooperlab commented 1 year ago

@RaminNateghi please see my comments on the PR.

Since it is hard to saturate the GPUs during MIL training, please investigate if it is possible to allocate fractional GPU resources. For example, perhaps we can run 16 trials by allocating 0.5 GPUs / trial. This might increase utilization.
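
One possible way to express that with the tune.with_resources pattern discussed above (a sketch only; the numbers are illustrative and untested):

```python
from ray import tune

def trainable(config):
    return {"loss": 0.0}  # stand-in for glimr's MIL training loop

tuner = tune.Tuner(
    # 0.5 GPUs per trial: an 8-GPU node could co-schedule up to 16 trials,
    # provided each trial actually fits within half a card's memory.
    tune.with_resources(trainable, {"cpu": 2, "gpu": 0.5}),
    tune_config=tune.TuneConfig(num_samples=16),
)
```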

We will also need to edit documentation and notebooks once the Search class updates are final.

RaminNateghi commented 1 year ago

Yes, it's technically possible to allocate fractional GPU resources. For example, I just set resources_per_worker={"GPU": 0.25}, and it enabled the tuner to run 4x as many concurrent trials.
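
Presumably this refers to Ray Train's ScalingConfig used by the distributed setup in PR #56; a minimal sketch of the setting, assuming that API (the exact wiring in the PR may differ):

```python
from ray.train import ScalingConfig  # ray.air.config.ScalingConfig in older Ray releases

# One training worker per trial, each claiming a quarter of a GPU, so roughly
# 4x as many trials fit on each card as with a whole-GPU allocation.
scaling_config = ScalingConfig(
    num_workers=1,
    use_gpu=True,
    resources_per_worker={"GPU": 0.25},
)
```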

cooperlab commented 1 year ago

> Yes, it's technically possible to allocate fractional GPU resources. For example, I just set resources_per_worker={"GPU": 0.25}, and it enabled the tuner to run 4x as many concurrent trials.

Can you check if this increases utilization from nvidia-smi?

RaminNateghi commented 1 year ago

Yes, it increases utilization, but when we use fractional GPU resources some trials fail with errors such as "worker/replica:0/task:0/device:GPU:0}} failed to allocate memory [Op:Cast]" or "failed copying input tensor from /job:worker/replica:0/task:0/device:CPU:0 to /job:worker/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized". For example, in my experiment, 13 out of 64 trials failed when I allocated 0.5 GPUs per trial.

https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues/1469

cooperlab commented 1 year ago

OK - let's save that for another day. For now we can recommend integer allocations.

cooperlab commented 1 year ago

Perhaps this is related to the tendency of TensorFlow to allocate all GPU memory even for a small job.

https://docs.ray.io/en/latest/ray-core/tasks/using-ray-with-gpus.html#fractional-gpus

> Note: It is the user’s responsibility to make sure that the individual tasks don’t use more than their share of the GPU memory. TensorFlow can be configured to limit its memory usage.

I'm not sure if this is the best solution, but it's one solution: https://discuss.ray.io/t/tensorflow-allocates-all-available-memory-on-the-gpu-in-the-first-trial-leading-to-no-space-left-for-running-additional-trials-in-parallel/7585/2
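
For reference, a hedged sketch of the TensorFlow-side configuration those links describe; this has to run inside each trial before TensorFlow initializes its GPUs, and the memory_limit value is purely illustrative:

```python
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    # Allocate GPU memory on demand instead of reserving the whole card,
    # so trials sharing a GPU via fractional allocation can coexist.
    tf.config.experimental.set_memory_growth(gpu, True)
    # Alternative: hard-cap the memory visible to this trial (value in MB).
    # tf.config.set_logical_device_configuration(
    #     gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=8192)]
    # )
```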