HazyResearch / H3

Language Modeling with the H3 State Space Model

Error running benchmarks/benchmark_generation.py #3


BlinkDL commented 1 year ago

Hi there. It's great to see another LM trained on the Pile.

When I run benchmarks/benchmark_generation.py:

[KeOps] Compiling cuda jit compiler engine ... OK
[pyKeOps] Compiling nvrtc binder for python ... OK
Number of parameters: 1326096384
[KeOps] Generating code for formula Sum_Reduction(ComplexMult(Var(0,2,1),ComplexExp(ComplexMult(Var(1,2,1),Var(2,2,0)))),0) ... OK
Segmentation fault

and it exits after "Segmentation fault".

So I uninstalled pykeops, and the new error is:

Traceback (most recent call last):
  File "/fsx/BlinkDL/CODE/_PUBLIC_/H3/benchmarks/benchmark_generation_h3.py", line 68, in <module>
    fn()
  File "/fsx/BlinkDL/CODE/_PUBLIC_/H3/benchmarks/benchmark_generation_h3.py", line 65, in <lambda>
    fn = lambda: model.generate(input_ids=input_ids, max_length=max_length,
  File "/fsx/BlinkDL/conda/lib/python3.9/site-packages/flash_attn-0.2.8-py3.9-linux-x86_64.egg/flash_attn/utils/generation.py", line 150, in generate
    output = decode(input_ids, self, max_length, top_k=top_k, top_p=top_p,
  File "/fsx/BlinkDL/conda/lib/python3.9/site-packages/flash_attn-0.2.8-py3.9-linux-x86_64.egg/flash_attn/utils/generation.py", line 107, in decode
    logits = model(input_ids, inference_params=inference_params).logits[:, -1]
  File "/fsx/BlinkDL/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/fsx/BlinkDL/CODE/_PUBLIC_/H3/src/models/ssm_seq.py", line 186, in forward
    hidden_states = self.backbone(input_ids, position_ids=position_ids,
  File "/fsx/BlinkDL/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/fsx/BlinkDL/CODE/_PUBLIC_/H3/src/models/ssm_seq.py", line 141, in forward
    hidden_states, residual = layer(hidden_states, residual, mixer_kwargs=mixer_kwargs)
  File "/fsx/BlinkDL/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/fsx/BlinkDL/conda/lib/python3.9/site-packages/flash_attn-0.2.8-py3.9-linux-x86_64.egg/flash_attn/modules/block.py", line 126, in forward
    hidden_states = self.mixer(hidden_states, **mixer_kwargs)
  File "/fsx/BlinkDL/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/fsx/BlinkDL/conda/lib/python3.9/site-packages/flash_attn-0.2.8-py3.9-linux-x86_64.egg/flash_attn/modules/mha.py", line 481, in forward
    kv = self._update_kv_cache(qkv[:, :, 1:], inference_params)
  File "/fsx/BlinkDL/conda/lib/python3.9/site-packages/flash_attn-0.2.8-py3.9-linux-x86_64.egg/flash_attn/modules/mha.py", line 419, in _update_kv_cache
    assert self.layer_idx is not None, 'Generation requires layer_idx in the constructor'
AssertionError: Generation requires layer_idx in the constructor
DanFu09 commented 1 year ago

@BlinkDL try with this fix now!
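
For context, the assertion comes from flash_attn's KV-cache bookkeeping: during generation, each attention layer must be constructed with a `layer_idx` so `_update_kv_cache` knows which cache slot belongs to that layer. A minimal sketch of the pattern (the `embed_dim`/`num_heads` values are placeholders; the actual fix is in this repo's model code):

```python
# Minimal sketch of what flash_attn's MHA expects for generation; embed_dim
# and num_heads are placeholder values, not the H3 configuration.
from flash_attn.modules.mha import MHA

n_layer = 24
attn_layers = [
    # Passing layer_idx lets _update_kv_cache index this layer's slice of the
    # inference-time KV cache; without it, generation hits the AssertionError.
    MHA(embed_dim=2048, num_heads=16, layer_idx=i)
    for i in range(n_layer)
]
```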

For the KeOps issue - can you share details about your environment? PyTorch, CUDA, and KeOps versions would all be helpful.

kashif commented 1 year ago

I believe the issue happens when the GPU runs out of memory for the larger benchmarks. For example, on my setup (pykeops 2.1.1, driver version 525.60.13, CUDA 12.0, torch 2.0.0a0+git81b5eff) with a 24 GB card, it crashes:

Number of parameters: 1326096384

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffe4e0c556b in range_preprocess_from_device(int&, int, int, int, int**, int, int*&, int*&, int*&, int*&, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, int*) () from .cache/keops2.1.1/build/pykeops_nvrtc.cpython-310-x86_64-linux-gnu.so

but it works if I run a smaller test:

Number of parameters: 12102144
[KeOps] Generating code for formula Sum_Reduction(ComplexMult(ComplexMult(Var(1,2,0),Var(0,2,1)),ComplexExp(ComplexMult(Var(2,2,0),Var(3,2,1)))),0) ... OK
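
If you want to confirm it's memory pressure rather than a KeOps bug, a quick check with standard PyTorch calls before launching the big benchmark:

```python
import torch

# Rough sanity check before the 1.3B run: ~1.33e9 fp16 parameters alone take
# about 2.7 GB, and KeOps work buffers plus activations come on top of that.
free, total = torch.cuda.mem_get_info()  # returns (free_bytes, total_bytes)
print(f"GPU memory: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
```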

hope that helps!

bryanhpchiang commented 1 year ago

Getting this same error -- also on a 24 GB card. I see the `--ckpt` option, but is there some way to toggle the model architecture between the different model sizes (e.g., 1.3B vs. 2.7B)?

thanks!

DanFu09 commented 1 year ago

There are examples showing how to switch between the different models for text generation: https://github.com/HazyResearch/H3/tree/main/examples
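
Roughly, switching sizes means changing the backbone width and depth when the model is built. A hypothetical sketch (the `SSMLMHeadModel` constructor arguments and the width/depth values below are assumptions inferred from the traceback; the example scripts have the exact interface):

```python
# Hypothetical sketch: the constructor arguments are assumptions inferred from
# the traceback (src/models/ssm_seq.py); check the example scripts for the
# real argument names and values.
from src.models.ssm_seq import SSMLMHeadModel

SIZES = {  # GPT-style width/depth pairs; the exact values are assumptions
    "125M": dict(d_model=768, n_layer=12),
    "1.3B": dict(d_model=2048, n_layer=24),
    "2.7B": dict(d_model=2560, n_layer=32),
}

cfg = SIZES["1.3B"]
model = SSMLMHeadModel(d_model=cfg["d_model"], n_layer=cfg["n_layer"])
```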