carmocca opened this issue 1 year ago
I think this is a good improvement. For Apple silicon, we would either have to show something generic or find a robust way to determine the name.

Found this: https://stackoverflow.com/a/69997851 but it uses a PyPI package.
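For reference, here is a dependency-free sketch of the same idea, querying the chip name through `sysctl` on macOS. The helper name `mps_device_name` and the `"MPS"` fallback are my own choices, not Lightning API:

```python
import platform
import subprocess


def mps_device_name() -> str:
    """Best-effort chip name on Apple silicon, e.g. "Apple M1 Pro"."""
    if platform.system() != "Darwin":
        return "MPS"  # generic fallback off macOS
    try:
        # `sysctl -n machdep.cpu.brand_string` prints the marketing name of the chip
        return subprocess.check_output(
            ["sysctl", "-n", "machdep.cpu.brand_string"], text=True
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        return "MPS"
```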
@carmocca here is a minimal implementation for this; is this what is required?
```python
# Modified `_log_device_info` from src/lightning/pytorch/trainer/setup.py.
# Names such as CUDAAccelerator, rank_zero_info, _IPU_AVAILABLE, etc. are
# already imported in that module.
def _log_device_info(trainer: "pl.Trainer") -> None:
    def get_device_name(accelerator, device=None) -> str:
        if isinstance(accelerator, CUDAAccelerator):
            return torch.cuda.get_device_name(device)
        elif isinstance(accelerator, TPUAccelerator):
            try:
                import torch_xla.core.xla_env_vars as xenv
                from torch_xla.experimental import tpu

                # e.g. "v3-8"
                return tpu.get_tpu_env()[xenv.ACCELERATOR_TYPE]
            except (ImportError, KeyError):  # was a bare `except:`
                pass
        # Fallback when no richer name is available
        return "True"

    gpu_name = ""
    if isinstance(trainer.accelerator, (CUDAAccelerator, MPSAccelerator)):
        # `Trainer` has no `.device` attribute; use the strategy's root device
        gpu_name = get_device_name(trainer.accelerator, trainer.strategy.root_device)
        gpu_name = f", using {trainer.num_devices} devices: {gpu_name}"
    rank_zero_info(
        f"GPU available: {CUDAAccelerator.is_available() or MPSAccelerator.is_available()}{gpu_name}"
    )

    tpu_name = ""
    if isinstance(trainer.accelerator, TPUAccelerator):
        tpu_name = get_device_name(trainer.accelerator)
        tpu_name = f", using {trainer.num_devices} devices: {tpu_name}"
    rank_zero_info(f"TPU available: {TPUAccelerator.is_available()}{tpu_name}")

    num_ipus = trainer.num_devices if isinstance(trainer.accelerator, IPUAccelerator) else 0
    rank_zero_info(f"IPU available: {_IPU_AVAILABLE}, using: {num_ipus} IPUs")

    if _LIGHTNING_HABANA_AVAILABLE:
        from lightning_habana import HPUAccelerator

        num_hpus = trainer.num_devices if isinstance(trainer.accelerator, HPUAccelerator) else 0
    else:
        num_hpus = 0
    rank_zero_info(f"HPU available: {_HPU_AVAILABLE}, using: {num_hpus} HPUs")

    # TODO: Integrate MPS Accelerator here, once gpu maps to both
    if CUDAAccelerator.is_available() and not isinstance(trainer.accelerator, CUDAAccelerator):
        rank_zero_warn(
            "GPU available but not used. Set `accelerator` and `devices` using"
            f" `Trainer(accelerator='gpu', devices={CUDAAccelerator.auto_device_count()})`.",
            category=PossibleUserWarning,
        )

    if TPUAccelerator.is_available() and not isinstance(trainer.accelerator, TPUAccelerator):
        rank_zero_warn(
            "TPU available but not used. Set `accelerator` and `devices` using"
            f" `Trainer(accelerator='tpu', devices={TPUAccelerator.auto_device_count()})`."
        )

    if _IPU_AVAILABLE and not isinstance(trainer.accelerator, IPUAccelerator):
        rank_zero_warn(
            "IPU available but not used. Set `accelerator` and `devices` using"
            f" `Trainer(accelerator='ipu', devices={IPUAccelerator.auto_device_count()})`."
        )

    if _HPU_AVAILABLE:
        if not _LIGHTNING_HABANA_AVAILABLE:
            raise ModuleNotFoundError(
                "You are running on an HPU machine, but the `lightning-habana` extension is not installed."
            )
        from lightning_habana import HPUAccelerator

        if not isinstance(trainer.accelerator, HPUAccelerator):
            rank_zero_warn(
                "HPU available but not used. Set `accelerator` and `devices` using"
                f" `Trainer(accelerator='hpu', devices={HPUAccelerator.auto_device_count()})`."
            )

    if MPSAccelerator.is_available() and not isinstance(trainer.accelerator, MPSAccelerator):
        rank_zero_warn(
            "MPS available but not used. Set `accelerator` and `devices` using"
            f" `Trainer(accelerator='mps', devices={MPSAccelerator.auto_device_count()})`."
        )
```
I tested this in a Kaggle notebook; the output looks like this for the different cases:

GPU not available:

```
GPU available: False
```

GPU available:

```
GPU available: True, using 1 devices: Tesla P100-PCIE-16GB
```

TPU available:

```
TPU available: True, using 8 devices: v3-8
```
@ishandutta0098 That's the idea, but I suggest doing this through an `Accelerator.device_name` staticmethod instead of adding the logic to get the name directly there.

Also, I suggest adhering to the original proposal, where the device name goes first and is separate from the number of devices.
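A minimal sketch of that shape, assuming each accelerator overrides a `device_name` staticmethod; the base-class fallback and the call site shown here are my assumptions, not the actual Lightning API:

```python
import torch


class Accelerator:  # simplified stand-in for lightning.pytorch.accelerators.Accelerator
    @staticmethod
    def device_name(device=None) -> str:
        # Generic fallback for accelerators that cannot report a richer name
        return "True"


class CUDAAccelerator(Accelerator):
    @staticmethod
    def device_name(device=None) -> str:
        # e.g. "Tesla P100-PCIE-16GB"
        return torch.cuda.get_device_name(device)


# The logging code then stays accelerator-agnostic, something like:
# name = trainer.accelerator.device_name(trainer.strategy.root_device)
```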
Description & Motivation
Revamp the current output of `_log_device_info`, which looks like this:

```
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
```

to something along these lines (the device name here is illustrative):

```
GPU available: Tesla P100-PCIE-16GB, using: 1 device
TPU available: False
IPU available: False
HPU available: False
```

The relevant code is: https://github.com/Lightning-AI/lightning/blob/f14ee9edbc8269054e12daf30b8681d530e73369/src/lightning/pytorch/trainer/setup.py#L145-L171
Pitch
If the accelerator is available, `True` changes to the actual name of the accelerator used. If it's unavailable, we still show `False`.

For GPUs, the `cuda|mps` field is gone, as it should be clear from the device name. I also propose that the GPU field shows the number of devices, instead of a used boolean.

We can get this info via `torch.cuda.get_device_name()` (see the snippet below). For MPS, HPU, and IPU we would need to find out if we can get this information. In the meantime, we can still fall back to `"True"` for them.
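For illustration, querying the name of a CUDA device:

```python
import torch

if torch.cuda.is_available():
    # Index 0 refers to the first visible CUDA device, e.g. "Tesla P100-PCIE-16GB"
    print(torch.cuda.get_device_name(0))
```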
This could be done by introducing an `Accelerator.device_name(device)` staticmethod.

Alternatives
One caveat is that this might be misleading with heterogeneous devices, as only rank zero prints this information.
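If that ever becomes a problem, one possible mitigation (my own sketch, not part of this proposal) would be to gather every rank's device name before logging:

```python
import torch.distributed as dist


def all_device_names(local_name: str) -> list:
    """Collect the device name reported by each rank, e.g. to detect heterogeneous setups."""
    if not (dist.is_available() and dist.is_initialized()):
        return [local_name]
    names = [None] * dist.get_world_size()
    # Every rank contributes its own name; all ranks receive the full list
    dist.all_gather_object(names, local_name)
    return names
```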
Additional context
No response
cc @borda @justusschock @awaelchli