Add pin_memory=True when using a CUDA device to increase performance, as suggested by the Torch documentation.
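A minimal sketch of the pin_memory pattern this entry refers to; the dataset and loader below are illustrative placeholders, not the library's actual code. Pinned (page-locked) host memory speeds up host-to-GPU copies and enables asynchronous transfers.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative toy dataset; names here are hypothetical.
dataset = TensorDataset(torch.randn(8, 4))

# Pin host memory only when a CUDA device is available; pinning has no
# benefit (and emits a warning) on CPU-only machines.
loader = DataLoader(dataset, batch_size=4, pin_memory=torch.cuda.is_available())

for (batch,) in loader:
    if torch.cuda.is_available():
        # non_blocking=True lets the copy overlap with compute
        # when the source memory is pinned.
        batch = batch.to("cuda", non_blocking=True)
```

The loader works unchanged on CPU-only machines; the flag simply becomes a no-op there.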
Add torch.no_grad() context manager in __call__() to increase performance.
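A sketch of wrapping inference in torch.no_grad(), assuming a callable-object design like the one this entry describes; the class and model below are placeholders, not the library's actual parser.

```python
import torch

class Parser:
    """Hypothetical inference wrapper illustrating the change."""

    def __init__(self):
        self.model = torch.nn.Linear(4, 2)

    def __call__(self, x):
        # Disabling gradient tracking skips autograd bookkeeping,
        # reducing memory use and speeding up pure inference.
        with torch.no_grad():
            return self.model(x)

out = Parser()(torch.randn(3, 4))
```

Tensors produced inside the context do not require gradients, so no autograd graph is kept alive between calls.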
Reduce memory transfers between CPU and GPU by instantiating tensors directly on the GPU device.
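The idea in this entry can be sketched as follows: passing device= at creation time avoids allocating on the CPU and then copying to the GPU.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Created directly on the target device: one allocation, no copy.
t = torch.zeros(3, 4, device=device)

# The pattern it replaces: a CPU allocation followed by a
# host-to-device transfer.
# t = torch.zeros(3, 4).to(device)
```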
Improve the clarity of some warnings (i.e. their category and message).
Fix a macOS multiprocessing bug. Multiprocessing was unusable because we did not verify whether the torch multiprocessing start method was set properly. We now set it properly and raise a warning instead of an error.
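A hypothetical helper illustrating the warn-instead-of-error behaviour; torch.multiprocessing mirrors the stdlib multiprocessing API used here, and the function name and default are assumptions, not the library's actual code. On macOS the default start method is "spawn" rather than "fork".

```python
import multiprocessing
import warnings

def ensure_start_method(expected="spawn"):
    """Hypothetical sketch: set the start method if needed and warn,
    rather than raising an error when it was not set as expected."""
    current = multiprocessing.get_start_method(allow_none=True)
    if current != expected:
        # force=True allows resetting a method that was already chosen.
        multiprocessing.set_start_method(expected, force=True)
        warnings.warn(
            f"Multiprocessing start method was {current!r}; set to {expected!r}.",
            category=RuntimeWarning,
        )
```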
Improve error handling when a wrong checkpoint is loaded through the AddressParser retrain_path argument.
Add torch.compile integration with mode="reduce-overhead", as suggested in the documentation, to improve performance (Torch 1.x is still supported). It improves performance by about 1%.
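A sketch of a version-guarded torch.compile integration, assuming the fallback strategy implied by "Torch 1.x still supported"; the function below is a placeholder, not the library's actual model.

```python
import torch

def step(x):
    # Placeholder workload standing in for a model forward pass.
    return torch.relu(x) + 1.0

if hasattr(torch, "compile"):
    # Torch 2.x: "reduce-overhead" targets cases where Python and kernel
    # launch overhead dominate, which is why the end-to-end gain is
    # modest (about 1%). Compilation itself happens lazily on first call.
    step_fn = torch.compile(step, mode="reduce-overhead")
else:
    # Torch 1.x: torch.compile does not exist; fall back to eager mode.
    step_fn = step
```

Because compilation is lazy, constructing step_fn is cheap on both versions; the first real call pays the compilation cost on Torch 2.x.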