Readme Example not working (MemoryError: std::bad_alloc)

Hello,

I hope you are doing well.

Great repository! I just started looking into it, but unfortunately I am unable to run it on my server (where I do not have sudo access).

I am running the basic readme_example:

(moe-infinity) ya255@abdelfattah-compute-02:~/projects/MoE-Infinity/examples$ python readme_example.py 
Do not detect pre-installed ops, use JIT mode
/home/ya255/.conda/envs/moe-infinity/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Fetching 10 files: 100%|██████████| 10/10 [00:00<00:00, 219597.07it/s]
[WARNING] FlashAttention is not available in the current environment. Using default attention.
Using /home/ya255/.cache/torch_extensions/py39_cu121 as PyTorch extensions root...
Emitting ninja build file /home/ya255/.cache/torch_extensions/py39_cu121/prefetch/build.ninja...
Building extension module prefetch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module prefetch...
Time to load prefetch op: 2.4551808834075928 seconds
SPDLOG_LEVEL : (null)
2024-06-24 16:21:10.003 INFO Create ArcherAioThread for thread: , 0
2024-06-24 16:21:10.003 INFO Loading index file from , /scratch/ya255/moe-infinity/archer_index
Traceback (most recent call last):
  File "/home/ya255/projects/MoE-Infinity/examples/readme_example.py", line 17, in <module>
    model = MoE(checkpoint, config)
  File "/home/ya255/projects/MoE-Infinity/moe_infinity/entrypoints/big_modeling.py", line 127, in __init__
    with self.engine.init(cls=model_cls, ar_config=config):
  File "/home/ya255/projects/MoE-Infinity/moe_infinity/runtime/model_offload.py", line 144, in init
    self.archer_engine = self.prefetch_lib.prefetch_handle(
MemoryError: std::bad_alloc

Is there something obvious I am missing? If it helps, I had to create the archer_index folder myself, and it is still empty after this MemoryError, so I am wondering whether this is a permissions issue related to not having sudo access.
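
In case it helps narrow things down, here is a minimal sketch of how the permissions theory could be checked independently of MoE-Infinity. It only tests whether a directory exists, whether the current user can create a file in it, and how much free space it has. The helper name `check_offload_dir` is hypothetical, and the path is simply the parent of the index path shown in the log above. A clean result would not rule out other causes of the `std::bad_alloc`.

```python
import os
import shutil
import tempfile


def check_offload_dir(path):
    """Report whether `path` exists, is writable, and how much space is free.

    Hypothetical diagnostic helper; not part of MoE-Infinity.
    """
    report = {"exists": os.path.isdir(path), "writable": False, "free_gib": None}
    if not report["exists"]:
        return report
    try:
        # Creating (and automatically deleting) a temp file is a more
        # reliable write test than os.access() on network filesystems.
        with tempfile.NamedTemporaryFile(dir=path):
            report["writable"] = True
    except OSError:
        report["writable"] = False
    # Free space on the filesystem that holds `path`, in GiB.
    report["free_gib"] = shutil.disk_usage(path).free / 2**30
    return report


if __name__ == "__main__":
    # Path taken from the log above; adjust to the configured offload path.
    print(check_offload_dir("/scratch/ya255/moe-infinity"))
```

If `writable` comes back `False`, or `free_gib` is very small, that would point toward the directory setup rather than the library itself.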

Thanks for the help!
