bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

Latest bump "Bump transformers and accelerate versions (#554)" looks to destroy Falcon support. #565

Open STEMBytes opened 3 months ago

STEMBytes commented 3 months ago

Hello, I believe the latest bump, "Bump transformers and accelerate versions (#554)," breaks Falcon support. Falcon is an essential model given its permissive open license (the 40B variant and smaller). I have verified that an older version of Petals runs Falcon fine, while the current build fails with errors.
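
For reference, this is the quick check I ran on each machine to confirm which library versions were in play (plain Python; each of these packages exposes a `__version__` attribute):

```python
# Print the versions of the packages involved in the bump, plus torch.
import accelerate
import petals
import torch
import transformers

print("petals      :", petals.__version__)
print("transformers:", transformers.__version__)
print("accelerate  :", accelerate.__version__)
print("torch       :", torch.__version__)
```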

mryab commented 3 months ago

Hi, thanks for reporting the issue! Can you describe the exact error you observed?

STEMBytes commented 3 months ago

You bet, here is the full error. I get the same message when trying to contribute a server to the 180B swarm listed on petals.dev. Here is the relevant part of the output, with PII such as my IP address removed from the full response:

```
[INFO] Model weights are loaded in bfloat16, quantized to nf4 format
Mar 18 23:21:44.649 [INFO] Server will fill your GPU memory with 60 transformer blocks. If you want to leave some free GPU memory, please specify a lesser --num_blocks manually
Mar 18 23:21:44.650 [INFO] Attention cache for all blocks will consume up to 1.88 GiB
Mar 18 23:21:44.650 [INFO] Loading throughput info
Mar 18 23:21:44.650 [INFO] Measuring network and compute throughput. This takes about a minute and will be cached for future runs
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ai/anaconda3/lib/python3.11/site-packages/petals/cli/run_server.py", line 235, in <module>
    main()
  File "/home/ai/anaconda3/lib/python3.11/site-packages/petals/cli/run_server.py", line 219, in main
    server = Server(
             ^^^^^^^
  File "/home/ai/anaconda3/lib/python3.11/site-packages/petals/server/server.py", line 237, in __init__
    throughput_info = get_server_throughput(
                      ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/anaconda3/lib/python3.11/site-packages/petals/server/throughput.py", line 82, in get_server_throughput
    cache[cache_key] = measure_throughput_info(
                       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/anaconda3/lib/python3.11/site-packages/petals/server/throughput.py", line 122, in measure_throughput_info
    "inference_rps": measure_compute_rps(
                     ^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/anaconda3/lib/python3.11/site-packages/petals/server/throughput.py", line 210, in measure_compute_rps
    _, cache = block.forward(dummy_input, use_cache=True)  # Skip the 1st step to exclude the initialization time
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/anaconda3/lib/python3.11/site-packages/tensor_parallel/tensor_parallel.py", line 99, in forward
    return [self.module_shards[0](*args, **kwargs)][self.output_device_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/anaconda3/lib/python3.11/site-packages/petals/models/falcon/block.py", line 421, in forward
    attention_mask = FalconModel._prepare_attn_mask(attention_mask, (batch_size, seq_length), past_length)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: type object 'FalconModel' has no attribute '_prepare_attn_mask'
```
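
From the last frame, it looks like Petals' Falcon block still calls the private `FalconModel._prepare_attn_mask` helper, which, as far as I can tell, newer transformers releases removed in favor of the shared utilities in `transformers.modeling_attn_mask_utils`. Purely as an untested sketch (the wrapper function and its name are mine, not Petals code), the replacement call might look like this, assuming transformers >= 4.35:

```python
# Untested sketch: rebuild the 4D causal mask with the shared transformers
# utility that replaced the removed FalconModel._prepare_attn_mask helper.
# prepare_falcon_attn_mask is a hypothetical wrapper, not actual Petals code.
import torch
from transformers.modeling_attn_mask_utils import _prepare_4d_causal_attention_mask


def prepare_falcon_attn_mask(
    attention_mask: torch.Tensor,  # 2D padding mask of shape (batch_size, seq_length)
    batch_size: int,
    seq_length: int,
    hidden_states: torch.Tensor,   # used only to pick the mask's dtype and device
    past_length: int,              # number of cached key/value positions
) -> torch.Tensor:
    # Expands the 2D padding mask into the 4D causal mask that the attention
    # layers expect, which is what the removed private helper used to do.
    return _prepare_4d_causal_attention_mask(
        attention_mask, (batch_size, seq_length), hidden_states, past_length
    )
```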

I am using CUDA 11.7 with NVIDIA driver 515.65.01 (per nvidia-smi), torch 2.0.1, on Ubuntu 22.04.

Let me know if you need any further details. If I revert to the prior version, it loads fine; I have tried this on three machines with the same behavior.

justheuristic commented 3 months ago

@STEMBytes Thank you for writing a detailed report.

I'll investigate Falcon support as soon as I'm done with the current sprint (ETA tomorrow night). I'll keep you posted here once I understand how to fix that mask issue, ETA before this Friday AoE.
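
If it is what I think it is, the fix will likely amount to dispatching on whichever mask helper the installed transformers provides instead of hard-coding one. A rough, untested sketch of that direction (`build_attn_mask` is a placeholder name, not existing Petals code):

```python
# Feature-detect the mask helper so the same code runs on transformers versions
# from before and after the bump. build_attn_mask is a hypothetical helper.
from transformers.models.falcon.modeling_falcon import FalconModel

if hasattr(FalconModel, "_prepare_attn_mask"):
    # Older transformers: the private helper Petals currently calls still exists.
    def build_attn_mask(attention_mask, input_shape, hidden_states, past_length):
        return FalconModel._prepare_attn_mask(attention_mask, input_shape, past_length)

else:
    # Newer transformers: the helper was folded into the shared mask utilities.
    from transformers.modeling_attn_mask_utils import _prepare_4d_causal_attention_mask

    def build_attn_mask(attention_mask, input_shape, hidden_states, past_length):
        return _prepare_4d_causal_attention_mask(
            attention_mask, input_shape, hidden_states, past_length
        )
```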

justheuristic commented 2 months ago

I am sorry, I got badly tangled up in ICML duties, and they are taking longer than expected. I am still working my way through my to-do list and will get to fixing Falcon as soon as I can.

STEMBytes commented 2 months ago

No worries at all; I understand the ICML work went well. Thanks for the follow-up, and let me know how I can help.