aws-neuron / transformers-neuronx

Apache License 2.0

Issue while compiling Mistral 7B 0.2 Instruct #77

Closed josete89 closed 6 months ago

josete89 commented 8 months ago

I want to compile the latest instruct version of Mistral (mistralai/Mistral-7B-Instruct-v0.2) for serving in the inf2 instances.

Therefore I was following this tutorial to compile the model: https://huggingface.co/aws-neuron/Mistral-neuron. Then I got the following error:

  File "mistral-neuron.py", line 22, in <module>
    model_neuron.to_neuron()
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/base.py", line 67, in to_neuron
    self.load_weights()
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/mistral/model.py", line 120, in load_weights
    self.decoder_lm_head.to_neuron()
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 290, in to_neuron
    self.program = self._build_program()
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 423, in _build_program
    hlo_modules[npos,batch_size] = self._hlo_fully_unrolled(npos, batch_size)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 473, in _hlo_fully_unrolled
    return compiler.compile_py_func(fully_unrolled)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/compiler.py", line 48, in compile_py_func
    return HloScribe(serialize_torch)(py_func).module_proto
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuronx/pyhlo/scribe.py", line 265, in __call__
    scribe.scribe(func)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuronx/pyhlo/scribe.py", line 277, in scribe
    root_shape = compute_def(self)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 463, in fully_unrolled
    hidden, out_caches = self._hlo_layers(hidden, tensors, self.layers, layers_caches, layers_weights)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 526, in _hlo_layers
    hidden, *out_caches = self.layer_builder(hidden, *tensors, *in_caches, *weights)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/mistral/hlo.py", line 73, in layer
    attn_output, out_attn_k_cache, out_attn_v_cache = self.attention(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/mistral/hlo.py", line 178, in attention
    if list(cached_keys.sizes)[0] > self.config.window_size and list(cached_values.sizes)[0] > self.config.window_size and list(mask.sizes)[2] >self.config.window_size:
TypeError: '>' not supported between instances of 'int' and 'NoneType'
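My reading of the traceback (an assumption, not confirmed by the maintainers): `self.config.window_size` is `None`, because the `config.json` shipped with Mistral-7B-Instruct-v0.2 sets `"sliding_window": null` (v0.2 disables sliding-window attention), and Python 3 refuses to order an `int` against `None`. A minimal reproduction of that failure mode:

```python
# Mistral-7B-Instruct-v0.2 ships config.json with "sliding_window": null,
# which JSON-decodes to None in Python. Python 3 cannot compare int > None.
window_size = None  # what the v0.2 config provides

try:
    4096 > window_size  # mirrors the comparison in mistral/hlo.py line 178
except TypeError as e:
    print(e)  # '>' not supported between instances of 'int' and 'NoneType'
```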

I didn't use the Hugging Face AMI; I used the standard Neuron DLAMI with Ubuntu 20.04. I made sure to install the latest libraries: transformers-neuronx @ git+https://github.com/aws-neuron/transformers-neuronx.git@8b01968f57ba29b8e3365e5a1424b74382ae8902

Can anyone help to identify the root cause?

awsilya commented 8 months ago

@josete89 this is a known issue; the fix should be available in the next release. Also, Mistral 0.1 should not be affected, so you could try that in the meantime.

jimburtoft commented 7 months ago

@josete89 It looks like it is similar to this issue: https://github.com/aws-neuron/transformers-neuronx/issues/71

You can test it by downloading the model locally and editing the sliding_window value in config.json to 4096 before you compile.
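The workaround above can be sketched as a small script. This is a minimal, hypothetical example: it fabricates a v0.2-style `config.json` in the current directory rather than pointing at a real downloaded model, so adjust the path to wherever your local copy lives.

```python
import json
from pathlib import Path

# Stand-in for <local-model-dir>/config.json (hypothetical path).
config_path = Path("config.json")

# For illustration only: a v0.2-style config where sliding_window is null.
config_path.write_text(json.dumps({"model_type": "mistral", "sliding_window": None}))

# The workaround: replace null with the v0.1 value (4096) before compiling.
config = json.loads(config_path.read_text())
config["sliding_window"] = 4096
config_path.write_text(json.dumps(config, indent=2))

print(json.loads(config_path.read_text())["sliding_window"])  # 4096
```

With the patched config in place, the `window_size` comparison in the HLO builder sees an `int` instead of `None`, so compilation can proceed.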

jimburtoft commented 7 months ago

Changing the sliding_window worked. See https://huggingface.co/aws-neuron/Mistral-7B-Instruct-v0.2-seqlen-2048-bs-1-cores-2

aws-donkrets commented 7 months ago

@jimburtoft Great to hear that worked for you. If there is no other issue, can we close this ticket?

mrnikwaws commented 6 months ago

Since we haven't heard back in two weeks, closing this ticket.