aws-neuron / transformers-neuronx


Loading a compiled model fails: `model_type=bert -> transformer` written into the compiled config. #102

Open michaelfeil opened 1 day ago

michaelfeil commented 1 day ago

I am running the following code inside the following container (built by the huggingface-optimum team):

763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference-neuronx:2.1.2-transformers4.43.2-neuronx-py310-sdk2.20.0-ubuntu20.04
import torch
from optimum.neuron import NeuronModelForFeatureExtraction  # type: ignore
from transformers import AutoConfig, AutoTokenizer  # type: ignore[import-untyped]

# Snippet taken from inside infinity_emb's Neuron embedder class (see the
# traceback below): get_nc_count() and self.config are defined there.
compiler_args = {"num_cores": get_nc_count(), "auto_cast_type": "fp16"}
input_shapes = {
    "batch_size": 4,
    "sequence_length": (
        self.config.max_position_embeddings
        if hasattr(self.config, "max_position_embeddings")
        else 512
    ),
}
self.model = NeuronModelForFeatureExtraction.from_pretrained(
    model_id="TaylorAI/bge-micro-v2",  # BERT SMALL
    revision=None,
    trust_remote_code=True,
    export=True,
    **compiler_args,
    **input_shapes,
)

This leads to the following error (compilation itself passes; the failure happens when the freshly exported model is reloaded):

INFO     2024-12-02 08:21:07,125 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer: TaylorAI/bge-micro-v2                                                                                                                     SentenceTransformer.py:218
***** Compiling bge-micro-v2 *****
.
Compiler status PASS
[Compilation Time] 24.19 seconds.
[Total compilation Time] 24.19 seconds.
2024-12-02 08:21:34.000152:  620  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-12-02 08:21:34.000154:  620  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
Model cached in: /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_4aeca57e8a4997651e84.
ERROR:    Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/infinity_server.py", line 96, in lifespan
    app.engine_array = AsyncEngineArray.from_args(engine_args_list)  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/engine.py", line 291, in from_args
    return cls(engines=tuple(engines))
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/engine.py", line 70, in from_args
    engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/engine.py", line 55, in __init__
    self._model_replicas, self._min_inference_t, self._max_inference_t = select_model(
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/inference/select_model.py", line 81, in select_model
    loaded_engine = unloaded_engine.value(engine_args=engine_args_copy)
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/transformer/embedder/neuron.py", line 109, in __init__
    self.model = NeuronModelForFeatureExtraction.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/optimum/modeling_base.py", line 402, in from_pretrained
    return from_pretrained_method(
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_traced.py", line 242, in _from_transformers
    return cls._export(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_traced.py", line 370, in _export
    return cls._from_pretrained(save_dir_path, config, model_save_dir=save_dir)
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_traced.py", line 201, in _from_pretrained
    neuron_config = cls._neuron_config_init(config) if neuron_config is None else neuron_config
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_traced.py", line 468, in _neuron_config_init
    neuron_config_constructor = TasksManager.get_exporter_config_constructor(
  File "/usr/local/lib/python3.10/site-packages/optimum/exporters/tasks.py", line 2033, in get_exporter_config_constructor
    model_tasks = TasksManager.get_supported_tasks_for_model_type(
  File "/usr/local/lib/python3.10/site-packages/optimum/exporters/tasks.py", line 1245, in get_supported_tasks_for_model_type
    raise KeyError(
KeyError: "transformer is not supported yet for transformers. Only ['audio-spectrogram-transformer', 'albert', 'bart', 'beit', 'bert', 'blenderbot', 'blenderbot-small', 'bloom', 'camembert', 'clip', 'codegen', 'convbert', 'convnext', 'convnextv2', 'cvt', 'data2vec-text', 'data2vec-vision', 'data2vec-audio', 'deberta', 'deberta-v2', 'deit', 'detr', 'distilbert', 'donut', 'donut-swin', 'dpt', 'electra', 'encoder-decoder', 'esm', 'falcon', 'flaubert', 'gemma', 'glpn', 'gpt2', 'gpt-bigcode', 'gptj', 'gpt-neo', 'gpt-neox', 'groupvit', 'hubert', 'ibert', 'imagegpt', 'layoutlm', 'layoutlmv3', 'lilt', 'levit', 'longt5', 'marian', 'markuplm', 'mbart', 'mistral', 'mobilebert', 'mobilevit', 'mobilenet-v1', 'mobilenet-v2', 'mpnet', 'mpt', 'mt5', 'musicgen', 'm2m-100', 'nystromformer', 'owlv2', 'owlvit', 'opt', 'qwen2', 'llama', 'pegasus', 'perceiver', 'phi', 'phi3', 'pix2struct', 'poolformer', 'regnet', 'resnet', 'roberta', 'roformer', 'sam', 'segformer', 'sew', 'sew-d', 'speech-to-text', 'speecht5', 'splinter', 'squeezebert', 'swin', 'swin2sr', 't5', 'table-transformer', 'trocr', 'unispeech', 'unispeech-sat', 'vision-encoder-decoder', 'vit', 'vits', 'wavlm', 'wav2vec2', 'wav2vec2-conformer', 'whisper', 'xlm', 'xlm-roberta', 'yolos', 't5-encoder', 't5-decoder', 'mixtral'] are supported for the library transformers. If you want to support transformer please propose a PR or open up an issue."
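
For context, the final KeyError can be triggered in isolation, without going through from_pretrained at all. A minimal sketch; the import used to register the neuron exporter configs is an assumption (importing optimum.neuron normally has the same side effect):

from optimum.exporters.tasks import TasksManager

# Assumed to register the "neuron" exporter's model configs as an import
# side effect; importing optimum.neuron should have the same result.
from optimum.exporters.neuron import model_configs  # noqa: F401

# "transformer" is the model_type the cached config advertises instead of
# "bert" (see the config.json dump below); this is the exact lookup that
# raises the KeyError in the traceback above.
TasksManager.get_supported_tasks_for_model_type(
    "transformer", exporter="neuron", library_name="transformers"
)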

Analysis:

Reproduction: docker run -it --device /dev/neuron0 michaelf34/aws-neuron-base-img:inf-repro

root@c2fd099ea82b:/app# nano /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_79d2cd5b82fe880e7bef/
config.json              model.neuron             special_tokens_map.json  tokenizer.json           tokenizer_config.json    vocab.txt  
# config.json
{
  "_name_or_path": "michaelfeil/bge-small-en-v1.5",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "export_model_type": "transformer",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "neuron": {
    "auto_cast": null,
    "auto_cast_type": null,
    "compiler_type": "neuronx-cc",
    "compiler_version": "2.14.227.0+2d4f85be",
    "disable_fast_relayout": false,
    "dynamic_batch_size": false,
    "inline_weights_to_neff": true,
    "input_names": [
      "input_ids",
      "attention_mask"
    ],
    "model_type": "transformer",
    "optlevel": "2",
    "output_attentions": false,
    "output_hidden_states": false,
    "output_names": [
      "token_embeddings",
      "sentence_embedding"
    ],
    "static_batch_size": 4,
    "static_sequence_length": 512
  },
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "task": "feature-extraction",
  "torch_dtype": "float32",
  "torchscript": true,
  "transformers_version": "4.41.1",

The same command also fails with:

accelerate-0.23.0 optimum-1.18.1 optimum-neuron-0.0.22 tokenizers-0.15.2 transformers-4.36.2

and also with:

optimum-1.23.* + optimum-neuron-0.0.26

It does not fail with:

optimum-1.17.1 + optimum-neuron-0.0.20
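
For anyone reproducing this, a quick way to print the installed versions of the packages above (a convenience snippet, not from the original report):

import importlib.metadata as md

# Print the installed versions of the packages implicated in this issue.
for pkg in ("optimum", "optimum-neuron", "transformers", "accelerate", "tokenizers"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")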
michaelfeil commented 1 day ago

Maybe a better location for this issue: https://github.com/huggingface/optimum-neuron/issues/744