Closed Borda closed 2 years ago
A few stack traces from the run above.
I think we should use @torch.no_grad()
in some places. Trying this in #531
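A minimal sketch of the idea (model and input are hypothetical, not from the failing tests): running inference under `torch.no_grad()` stops autograd from recording the graph, so intermediate activations are freed immediately and peak memory drops.

```python
import torch

# Hypothetical helper: @torch.no_grad() disables graph construction
# for everything inside the function, cutting memory during eval.
@torch.no_grad()
def predict(model, batch):
    model.eval()
    return model(batch)

model = torch.nn.Linear(4, 2)
x = torch.randn(3, 4)
y = predict(model, x)
print(y.requires_grad)  # False: no gradient graph was built
```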
E RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:73] data. DefaultCPUAllocator: not enough memory: you tried to allocate 51380224 bytes. Buy new RAM!
> for chunk in iter(lambda: f.read(chunk_size), b''):
E MemoryError
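For reference, the failing `for chunk in iter(...)` line is the standard chunked-read idiom, which holds only one chunk in RAM at a time; the MemoryError therefore suggests that whatever consumes the chunks accumulates them, not the read itself. A self-contained sketch of the idiom (helper name hypothetical):

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks; memory use stays O(chunk_size)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read(chunk_size)
        # until it returns the sentinel b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage: hash a temporary file and compare with an in-memory hash.
data = os.urandom(3 * 1024 * 1024 + 17)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
print(sha256_of_file(tmp.name) == hashlib.sha256(data).hexdigest())  # True
```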
This is either a Python multiprocessing issue or a pickle issue.
c:\hostedtoolcache\windows\python\3.6.8\x64\lib\multiprocessing\popen_spawn_win32.py:65: in __init__
reduction.dump(process_obj, to_child)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
obj = <Process(Process-19, initial daemon)>, file = <_io.BufferedWriter name=11>
protocol = None
def dump(obj, file, protocol=None):
'''Replacement for pickle.dump() using ForkingPickler.'''
> ForkingPickler(file, protocol).dump(obj)
E BrokenPipeError: [Errno 32] Broken pipe
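Some context on the BrokenPipeError (an assumption about the mechanism, not a confirmed diagnosis): on Windows, multiprocessing uses the "spawn" start method, so every object handed to a child process must survive a pickle round-trip, and the child re-imports the main module. A minimal sketch of the spawn-safe pattern:

```python
import multiprocessing as mp

def square(x):
    # Top-level function: picklable by name, unlike lambdas/closures,
    # which break under the Windows "spawn" start method.
    return x * x

if __name__ == "__main__":
    # The __main__ guard prevents the child's re-import of this module
    # from recursively spawning more workers on Windows.
    with mp.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```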
Another possible cause: we are trying to save a checkpoint on Windows, which requires writing the file object from RAM to disk, hence these errors. Maybe disabling checkpoints, or model saving wherever it is unnecessary, could avoid them?
🐛 Bug
In #522 we revealed an issue in CI (on the GHA side rather than ours): the Windows tests were marked as a passing check even though the tests were failing almost all of the time... These tests shall be fixed, or skipped per test with a TODO...
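Skipping per test with a TODO can be done with a platform guard; a sketch (the test name here is hypothetical):

```python
import platform

import pytest

# Hypothetical test: skipped only on Windows, with a TODO in the reason
# so the skip is easy to find and remove once the CI issue is fixed.
@pytest.mark.skipif(
    platform.system() == "Windows",
    reason="TODO: failing on Windows CI, see the GHA checks issue",
)
def test_example_step():
    assert 1 + 1 == 2
```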
To Reproduce
https://github.com/PyTorchLightning/pytorch-lightning-bolts/runs/1718502565
Additional context