bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev

Problem running petals on virtual CPU #421

Status: Open · tijszwinkels opened this issue 11 months ago

tijszwinkels commented 11 months ago

I ran into this when trying to run: https://github.com/petals-infra/chat.petals.dev

$ flask run --host=0.0.0.0 --port=5000
Floating point exception (core dumped)

But I believe this is an issue with the petals library itself. The following minimal example crashes as well:

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM
import torch

model_name = "enoch/llama-65b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Crashes on this line with "Floating point exception (core dumped)".
model = AutoDistributedModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

Running it:

$ python test.py
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
Jul 27 07:53:55.842 [INFO] Make sure you follow the LLaMA's terms of use: https://bit.ly/llama2-license for LLaMA 2, https://bit.ly/llama-license for LLaMA 1
Jul 27 07:53:55.842 [INFO] Using DHT prefix: llama-65b-hf
Floating point exception (core dumped)

It crashes on the last line. Note that it also crashes without the torch_dtype argument.
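For pinpointing where the crash originates, Python's built-in faulthandler module can print the Python-level traceback when the process receives a fatal signal such as SIGFPE. A minimal sketch, adding only standard-library calls at the top of the repro above:

import faulthandler
faulthandler.enable()  # dump the Python traceback on SIGFPE, SIGSEGV, etc.

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "enoch/llama-65b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# With faulthandler enabled, the frame triggering the floating point
# exception is printed before the process dies.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

The same effect is available without editing the script via PYTHONFAULTHANDLER=1 python test.py or python -X faulthandler test.py.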

These are the capabilities of the virtualized CPU I'm running on:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 61
model name      : Intel Core Processor (Broadwell, IBRS)
stepping        : 2
microcode       : 0x1
cpu MHz         : 3408.010
cache size      : 4096 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap xsaveopt arat md_clear
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds mmio_unknown
bogomips        : 6816.02
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
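Since hypervisors can mask instruction-set extensions from the guest, it may also be useful to check programmatically which SIMD features this virtual CPU exposes. A stdlib-only sketch that reads the flags line from the dump above:

with open("/proc/cpuinfo") as f:
    flags = set()
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break

# Extensions commonly used by optimized PyTorch CPU kernels.
for ext in ("sse4_2", "avx", "avx2", "fma", "avx512f"):
    print(f"{ext}: {'present' if ext in flags else 'missing'}")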
mryab commented 11 months ago

Hi, thanks for reporting this! Can you try running some PyTorch code that is independent of Petals in your environment? For instance, any example from the transformers library: https://github.com/huggingface/transformers/tree/main/examples/pytorch
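For an even quicker check, a minimal PyTorch-only workload (a sketch; any small tensor computation would do) can confirm that the vectorized CPU kernels themselves are fine:

import torch

# A basic matmul exercises the optimized CPU kernels.
x = torch.randn(256, 256)
y = x @ x
print("matmul ok:", y.sum().item())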

tijszwinkels commented 11 months ago

I opted for the multiple-choice example; it runs without issue.

emuchogu commented 11 months ago

> torch_dtype=torch.float32

I'm facing the same error when running the following code:

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initial peer of a private (LAN) swarm
INITIAL_PEERS = [
    "/ip4/192.168.100.250/tcp/31337/p2p/QmdCqmPMqgxFHqmMbbUxuU8Hm5KwoY9zRj5s5DbiyJoPbK",
]

model = AutoDistributedModelForCausalLM.from_pretrained(model_name, initial_peers=INITIAL_PEERS)

It fails on the last line with the following error:

Aug 03 22:22:04.246 [INFO] Make sure you follow the LLaMA's terms of use: https://bit.ly/llama2-license for LLaMA 2, https://bit.ly/llama-license for LLaMA 1
Aug 03 22:22:04.246 [INFO] Using DHT prefix: Llama-2-7b-chat-hf
Floating point exception (core dumped)

CPU details (lscpu):

Architecture:         x86_64
CPU op-mode(s):       32-bit, 64-bit
Byte Order:           Little Endian
Address sizes:        44 bits physical, 48 bits virtual
CPU(s):               64
On-line CPU(s) list:  0-63
Thread(s) per core:   1
Core(s) per socket:   1
Socket(s):            64
NUMA node(s):         1
Vendor ID:            GenuineIntel
CPU family:           6
Model:                47
Model name:           Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz
Stepping:             2
CPU MHz:              2394.006
BogoMIPS:             4788.01
Hypervisor vendor:    Xen
Virtualization type:  full
L1d cache:            2 MiB
L1i cache:            2 MiB
L2 cache:             16 MiB
L3 cache:             1.9 GiB
NUMA node0 CPU(s):    0-63
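To rule out a problem with the private swarm itself, the DHT connection can be tested without loading any model weights. A sketch using hivemind (the library Petals is built on), with the same multiaddr as above:

import hivemind

INITIAL_PEERS = [
    "/ip4/192.168.100.250/tcp/31337/p2p/QmdCqmPMqgxFHqmMbbUxuU8Hm5KwoY9zRj5s5DbiyJoPbK",
]

# Join the swarm in client mode; if this step alone dies with a
# floating point exception, the crash is below Petals, in hivemind
# or its p2p daemon, rather than in the model-loading code.
dht = hivemind.DHT(initial_peers=INITIAL_PEERS, client_mode=True, start=True)
print("Connected; visible multiaddrs:", dht.get_visible_maddrs())
dht.shutdown()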

ddqspace-xyz commented 1 week ago

(Screenshot: the section of code identified as causing the core dump.)

I ran into the same problem. After digging in, the code shown in the screenshot above appears to be what causes the core dump; as a temporary workaround, you can comment that section out.