Closed · rayandrew closed 1 month ago
Along with this PR, I just want to point out the prefetch-factor computation below, which came up in a recent discussion with @hariharan-devarajan:
```python
# torch_data_loader.py
if self._args.read_threads >= 1:
    prefetch_factor = math.ceil(self._args.prefetch_size / self._args.read_threads)
else:
    prefetch_factor = self._args.prefetch_size
```
Here, we scale down the prefetch factor based on the number of read threads. Is this intended? From a user's perspective, it is probably better to pass the configured value through as the prefetch factor directly, without assuming it should be divided by the number of workers.
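To make the difference concrete, here is a minimal sketch (plain Python; the helper names are hypothetical, not from the DLIO codebase) comparing the current divide-by-workers policy with a pass-through policy. Note that PyTorch's `DataLoader` prefetches `prefetch_factor` batches *per worker*, so the effective number of in-flight batches differs between the two:

```python
import math

def prefetch_factor_divided(prefetch_size: int, read_threads: int) -> int:
    # Current behavior: split the requested prefetch budget across workers.
    if read_threads >= 1:
        return math.ceil(prefetch_size / read_threads)
    return prefetch_size

def prefetch_factor_passthrough(prefetch_size: int, read_threads: int) -> int:
    # Proposed behavior: hand the user's value to the DataLoader unchanged.
    return prefetch_size

# Effective batches in flight = prefetch_factor * num_workers.
for workers in (1, 2, 4):
    divided = prefetch_factor_divided(8, workers) * workers
    passthrough = prefetch_factor_passthrough(8, workers) * workers
    print(f"workers={workers}: divided total={divided}, pass-through total={passthrough}")
```

Under the current policy the total stays roughly constant (here, 8 batches regardless of worker count), while pass-through scales the total with the number of workers, which is what a user setting `prefetch_size` likely expects per worker.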
@zhenghh04 I agree with Ray that this is confusing for users of the benchmark. I would recommend not doing this.
Hi @zhenghh04 and @hariharan-devarajan,
This PR gives us the option to disable pinned memory in the PyTorch dataloader. I encountered an error while emulating Stormer when we increased the prefetch factor, since the pinned memory is reserved and cannot be released until the dataloader finishes. Also, pin_memory is only beneficial for transfers to GPU memory, which we do not use at all in DLIO [1, 2].
I set the default to True for backward compatibility; let me know if I should make `pin_memory=False` the default in the configuration.

References:
[1] https://pytorch.org/tutorials/intermediate/pinmem_nonblock.html
[2] https://discuss.pytorch.org/t/when-to-set-pin-memory-to-true/19723/19
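For illustration, a sketch of how the new flag might be threaded into the `DataLoader` arguments (the helper name and parameter names are hypothetical, chosen to mirror the config options discussed above; the actual PR may wire this differently):

```python
def build_dataloader_kwargs(prefetch_size: int,
                            read_threads: int,
                            pin_memory: bool = True) -> dict:
    """Hypothetical helper collecting DataLoader keyword arguments."""
    kwargs = {
        "num_workers": read_threads,
        # pin_memory=False avoids reserving page-locked host memory,
        # which stays allocated until the dataloader is torn down.
        "pin_memory": pin_memory,
    }
    if read_threads >= 1:
        # In PyTorch, prefetch_factor is only valid with num_workers > 0.
        kwargs["prefetch_factor"] = prefetch_size
    return kwargs

# Usage: pass the result straight through, e.g.
#   DataLoader(dataset, **build_dataloader_kwargs(4, 2, pin_memory=False))
print(build_dataloader_kwargs(4, 2, pin_memory=False))
```

Defaulting the parameter to `True` preserves the current behavior, matching the backward-compatibility choice described above.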