Trying to run this on Windows 10 under WSL2 with an RTX 3080 Ti (12 GB VRAM). Setting `offload_per_layer = 7` does not seem to help: VRAM usage never goes above 6.5 GB, so there appears to be plenty of room available.
```
/home/mrnova/.conda/envs/mixtral/lib/python3.10/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
Traceback (most recent call last):
  File "/home/mrnova/mixtral-offloading/main.py", line 54, in <module>
    model = build_model(
  File "/home/mrnova/mixtral-offloading/src/build_model.py", line 204, in build_model
    expert_cache = ExpertCache(
  File "/home/mrnova/mixtral-offloading/src/expert_cache.py", line 67, in __init__
    self.offloaded_storages = [
  File "/home/mrnova/mixtral-offloading/src/expert_cache.py", line 68, in <listcomp>
    torch.UntypedStorage(self.module_size).pin_memory(self.device) for _ in range(offload_size)]
  File "/home/mrnova/.conda/envs/mixtral/lib/python3.10/site-packages/torch/storage.py", line 226, in pin_memory
    cast(Storage, self)).pin_memory(device)
RuntimeError: CUDA error: out of memory
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
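For what it's worth, the failing call is `pin_memory`, which allocates page-locked *host* RAM through the CUDA driver rather than VRAM, so free VRAM would not prevent this error. Below is a rough sketch of the total pinned allocation the `ExpertCache` would attempt; the per-expert size and layer count here are illustrative guesses, not the real figures from the repo:

```python
# Hypothetical numbers -- only the arithmetic and the API call are real.
offload_per_layer = 7          # experts offloaded per layer (as in my config)
num_layers = 32                # illustrative layer count
module_size = 180 * 2**20      # ~180 MiB per expert buffer (assumed)

# Total page-locked host RAM the offloaded storages would try to pin:
total_pinned = offload_per_layer * num_layers * module_size
print(f"~{total_pinned / 2**30:.1f} GiB of pinned host memory")  # → ~39.4 GiB

# The call that raises in expert_cache.py, shown in isolation
# (only runs when torch and a CUDA device are present):
try:
    import torch
    if torch.cuda.is_available():
        storage = torch.UntypedStorage(module_size).pin_memory("cuda:0")
        assert storage.is_pinned()
except ImportError:
    pass
```

If the real total is anywhere near that ballpark, it could easily exceed what WSL2 lets a guest pin, since WSL2 by default only exposes a fraction of host RAM to the Linux VM.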