PygmalionAI / aphrodite-engine

Large-scale LLM inference engine
https://aphrodite.pygmalion.chat
GNU Affero General Public License v3.0

[Bug]: Tekken tokenizer fails to load #765

Closed: iamsuperdupercool closed this issue 6 days ago

iamsuperdupercool commented 6 days ago

My current environment

PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.30.3
Libc version: glibc-2.35

Python version: 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.1.85+-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 535.104.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.6
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        46 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               2
On-line CPU(s) list:                  0,1
Vendor ID:                            GenuineIntel
Model name:                           Intel(R) Xeon(R) CPU @ 2.00GHz
CPU family:                           6
Model:                                85
Thread(s) per core:                   2
Core(s) per socket:                   1
Socket(s):                            1
Stepping:                             3
BogoMIPS:                             4000.42
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat md_clear arch_capabilities
Hypervisor vendor:                    KVM
Virtualization type:                  full
L1d cache:                            32 KiB (1 instance)
L1i cache:                            32 KiB (1 instance)
L2 cache:                             1 MiB (1 instance)
L3 cache:                             38.5 MiB (1 instance)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0,1
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Mitigation; PTE Inversion
Vulnerability Mds:                    Vulnerable; SMT Host state unknown
Vulnerability Meltdown:               Vulnerable
Vulnerability Mmio stale data:        Vulnerable
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Vulnerable
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Vulnerable
Vulnerability Spectre v1:             Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:             Vulnerable; IBPB: disabled; STIBP: disabled; PBRSB-eIBRS: Not affected; BHI: Vulnerable (Syscall hardening enabled)
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Vulnerable

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] optree==0.12.1
[pip3] pyzmq==24.0.1
[pip3] sentence-transformers==3.1.1
[pip3] torch==2.3.0
[pip3] torchaudio==2.4.1+cu121
[pip3] torchsummary==1.5.1
[pip3] torchvision==0.19.1+cu121
[pip3] transformers==4.40.0
[pip3] triton==2.3.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
Aphrodite Version: 0.5.3
Aphrodite Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X  0-1     N/A     N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

How to reproduce

aphrodite run karrelin/Lumimaid-v0.2-12B-AWQ --dtype float16 --host 127.0.0.1 --gpu-memory-utilization 0.99 --max-model-len 4096 --max-log-len 0 --revision main -q awq

Traceback

Traceback (most recent call last):
  File "/usr/local/bin/aphrodite", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/cli.py", line 25, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/openai/api_server.py", line 519, in run_server
    engine = AsyncAphrodite.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 358, in from_engine_args
    engine = cls(engine_config.parallel_config.worker_use_ray,
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 323, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 429, in _init_engine
    return engine_class(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/aphrodite_engine.py", line 125, in __init__
    self._init_tokenizer()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/aphrodite_engine.py", line 246, in _init_tokenizer
    self.tokenizer: BaseTokenizerGroup = get_tokenizer_group(
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/transformers_utils/tokenizer_group/__init__.py", line 20, in get_tokenizer_group
    return TokenizerGroup(**init_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/transformers_utils/tokenizer_group/tokenizer_group.py", line 23, in __init__
    self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/transformers_utils/tokenizer.py", line 145, in get_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 862, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2089, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2311, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: data did not match any variant of untagged enum ModelWrapper at line 1217962 column 3

AlpinDale commented 6 days ago

You seem to be on Aphrodite 0.5.3, while the latest version is 0.6.2.

Please upgrade to the latest version and launch the model with --tokenizer-mode mistral.
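
For reference, a minimal sketch of the upgrade and relaunch, assuming a pip install of the aphrodite-engine package (the package name and upgrade path are assumptions; the flags are copied from the reproduce command above):

# upgrade to the latest release; this also refreshes the pinned
# transformers/tokenizers dependencies that parse tokenizer.json
pip install -U aphrodite-engine

# relaunch with the Mistral tokenizer mode enabled
aphrodite run karrelin/Lumimaid-v0.2-12B-AWQ --dtype float16 --host 127.0.0.1 \
    --gpu-memory-utilization 0.99 --max-model-len 4096 --max-log-len 0 \
    --revision main -q awq --tokenizer-mode mistral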

iamsuperdupercool commented 6 days ago

Okay 👍

iamsuperdupercool commented 6 days ago

BTW, https://downloads.pygmalion.chat/whl/ is outdated.

iamsuperdupercool commented 6 days ago

Running with --tokenizer-mode mistral gives me this error:

  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/openai/rpc/server.py", line 209, in run_rpc_server
    server = AsyncEngineRPCServer(async_engine_args, rpc_path)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/openai/rpc/server.py", line 24, in __init__
    self.engine = AsyncAphrodite.from_engine_args(async_engine_args)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 601, in from_engine_args
    engine = cls(
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 510, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 694, in _init_engine
    return engine_class(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/aphrodite_engine.py", line 239, in __init__
    self.tokenizer = self._init_tokenizer()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/aphrodite_engine.py", line 455, in _init_tokenizer
    return init_tokenizer_from_configs(
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/transformers_utils/tokenizer_group/__init__.py", line 29, in init_tokenizer_from_configs
    return get_tokenizer_group(parallel_config.tokenizer_pool_config,
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/transformers_utils/tokenizer_group/__init__.py", line 50, in get_tokenizer_group
    return tokenizer_cls.from_config(tokenizer_pool_config, **init_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/transformers_utils/tokenizer_group/tokenizer_group.py", line 29, in from_config
    return cls(**init_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/transformers_utils/tokenizer_group/tokenizer_group.py", line 22, in __init__
    self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/transformers_utils/tokenizer.py", line 112, in get_tokenizer
    tokenizer = MistralTokenizer.from_pretrained(str(tokenizer_name),
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/transformers_utils/tokenizers/mistral.py", line 77, in from_pretrained
    tokenizer_file = cls._download_mistral_tokenizer_from_hf(
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/transformers_utils/tokenizers/mistral.py", line 97, in _download_mistral_tokenizer_from_hf
    filename = find_tokenizer_file(files)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/transformers_utils/tokenizers/mistral.py", line 36, in find_tokenizer_file
    raise OSError(f"Found {len(matched_files)} files matching the "
OSError: Found 0 files matching the pattern: {matched_files}. Make sure that a Mistral tokenizer is present in {tokenizer_name}.

AlpinDale commented 6 days ago

> BTW, https://downloads.pygmalion.chat/whl/ is outdated.

That index URL was only a workaround for PyPI's wheel size limits. It's been deprecated, and none of our guides reference it anymore. Please don't use it.

As for the error, your model will need the tokenizer.model.v3 file from the original Mistral repo. Finetunes typically won't contain it because axolotl doesn't copy it over. You should grab it from the official Mistral repo and put it in your model directory. Alternatively, you can pass --tokenizer mistralai/Mistral-Large-Instruct-2407 if you're using that model. A hedged sketch of both options follows below.
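
To illustrate, a sketch under a few assumptions: huggingface-cli ships with the huggingface_hub package, the mistralai/Mistral-Large-Instruct-2407 repo id is just the example named above, and /path/to/local/model is a placeholder. Substitute the official repo your finetune is actually based on.

# Option 1: fetch the raw tokenizer file from the official Mistral repo
# and place it next to the model weights
huggingface-cli download mistralai/Mistral-Large-Instruct-2407 tokenizer.model.v3 \
    --local-dir /path/to/local/model

# Option 2: point the engine at the official repo's tokenizer directly
aphrodite run karrelin/Lumimaid-v0.2-12B-AWQ -q awq --tokenizer-mode mistral \
    --tokenizer mistralai/Mistral-Large-Instruct-2407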

iamsuperdupercool commented 6 days ago

Ok, thanks.