ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0

add transformers modifications to boost performance on inference #103

Closed: ayoub-louati closed this 2 years ago

ayoub-louati commented 2 years ago

fix #96

ayoub-louati commented 2 years ago

@pommedeterresautee I think it is ready for review

pommedeterresautee commented 2 years ago

I ran the code locally and got this error:

Start test without transformers modifications
batch-0: Process 8 samples in 3.60 seconds
batch-1: Process 8 samples in 3.58 seconds
batch-2: Process 8 samples in 2.72 seconds
batch-3: Process 8 samples in 2.73 seconds
batch-4: Process 8 samples in 2.21 seconds
Finish the processing of 40 samples with the speed 2.32 samples/second. Full time: 17.23916006088257 
Start test with transformers modifications
Traceback (most recent call last):
  File "/home/geantvert/workspace/fast_transformer/demo/generative-model/fastseq_test.py", line 219, in <module>
    modify_transformers = True
  File "/home/geantvert/workspace/fast_transformer/demo/generative-model/fastseq_test.py", line 140, in transformers_modifications_test
    # warmup model:
  File "/home/geantvert/workspace/fast_transformer/demo/generative-model/fastseq_test.py", line 46, in _generate
    # Generate Summary
  File "/home/geantvert/.local/share/virtualenvs/fast_transformer/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/geantvert/.local/share/virtualenvs/fast_transformer/lib/python3.9/site-packages/transformers/generation_utils.py", line 1354, in generate
    return self.beam_search(
  File "/home/geantvert/.local/share/virtualenvs/fast_transformer/lib/python3.9/site-packages/transformers/generation_utils.py", line 2277, in beam_search
    model_kwargs["past"] = self._reorder_cache(model_kwargs["past"], beam_idx)
  File "<string>", line 24, in updated_reorder_cache
AssertionError

Process finished with exit code 1

Any idea?

ayoub-louati commented 2 years ago

@pommedeterresautee I didn't get this error. I'll try to reproduce it. (Are you sure you have the latest version of this branch?)

pommedeterresautee commented 2 years ago

Everything is up to date. Did you try with an up-to-date Transformers (v4.20.1)?

I don't know if it's related, but one thing we may want to add is an assert that checks the code has been correctly patched. My understanding is that right now, if the pattern is not found (because the upstream code has been updated), the patch will silently do nothing.
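To make the concern concrete, here is a minimal sketch of string-based source patching with a guard. It is not the code from this PR: `patch_source`, `compile_function`, and the toy `scale` function are hypothetical names used only for illustration. The point is that `str.replace` returns its input unchanged when the pattern is missing, so without a check the patch becomes a no-op.

```python
def patch_source(function_code: str, old_code: str, new_code: str, name: str) -> str:
    """Replace old_code with new_code in a function's source text (hypothetical helper)."""
    # str.replace returns the input unchanged when old_code is absent,
    # which is exactly the silent no-op we want to detect.
    patched = function_code.replace(old_code, new_code)
    assert new_code in patched, f"Failed to update function {name}"
    return patched


def compile_function(source: str, name: str):
    """Execute the patched source and return the function it defines (hypothetical helper)."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


# Toy stand-in for library code that we want to patch.
ORIGINAL = """
def scale(x):
    return x * 2
"""

# The pattern exists, so the patch applies and the assert passes.
scale = compile_function(patch_source(ORIGINAL, "x * 2", "x * 3", "scale"), "scale")
print(scale(2))

# The pattern is missing (as if upstream had changed): replace alone would
# silently return ORIGINAL, but the assert turns that into a visible failure.
try:
    patch_source(ORIGINAL, "x * 5", "x * 7", "scale")
except AssertionError as err:
    print(err)
```

One caveat with this design: `assert` statements are skipped when Python runs with `-O`, so raising an explicit exception may be preferable if the check must always run.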

ayoub-louati commented 2 years ago

I'm testing with the latest version (4.20.1) and get no error. This is the expected output:

Start test without transformers modifications
batch-0: Process 8 samples in 3.72 seconds
batch-1: Process 8 samples in 3.90 seconds
batch-2: Process 8 samples in 3.84 seconds
batch-3: Process 8 samples in 3.93 seconds
batch-4: Process 8 samples in 3.33 seconds
Finish the processing of 40 samples with the speed 1.69 samples/second. Full time: 23.705780506134033 
Start test with transformers modifications
batch-0: Process 8 samples in 2.47 seconds
batch-1: Process 8 samples in 2.48 seconds
batch-2: Process 8 samples in 2.55 seconds
batch-3: Process 8 samples in 2.65 seconds
batch-4: Process 8 samples in 2.23 seconds
Finish the processing of 40 samples with the speed 2.70 samples/second. Full time: 14.797889947891235

pommedeterresautee commented 2 years ago

OK, it seems I had an issue with Transformers on my side. Cleaning it up made the code work as expected.

Start test without transformers modifications
batch-0: Process 8 samples in 2.98 seconds
batch-1: Process 8 samples in 3.09 seconds
batch-2: Process 8 samples in 3.23 seconds
batch-3: Process 8 samples in 3.24 seconds
batch-4: Process 8 samples in 2.81 seconds
Finish the processing of 40 samples with the speed 2.10 samples/second. Full time: 19.068033456802368 
Start test with transformers modifications
batch-0: Process 8 samples in 2.31 seconds
batch-1: Process 8 samples in 2.37 seconds
batch-2: Process 8 samples in 2.52 seconds
batch-3: Process 8 samples in 2.64 seconds
batch-4: Process 8 samples in 2.21 seconds
Finish the processing of 40 samples with the speed 2.78 samples/second. Full time: 14.371684551239014 
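
As a quick check on the numbers in this run, throughput is the sample count divided by the full time, and the speedup is the ratio of the two full times. The sketch below uses only the values logged above; the variable names are illustrative, and a single run on one machine is not a rigorous benchmark.

```python
# Values copied from the log above (5 batches of 8 samples each).
samples = 40
baseline_seconds = 19.068033456802368  # without transformers modifications
patched_seconds = 14.371684551239014   # with transformers modifications

baseline_throughput = samples / baseline_seconds
patched_throughput = samples / patched_seconds
speedup = baseline_seconds / patched_seconds

print(f"baseline: {baseline_throughput:.2f} samples/second")
print(f"patched:  {patched_throughput:.2f} samples/second")
print(f"speedup:  {speedup:.2f}x")
```

The two throughput values match the 2.10 and 2.78 samples/second reported in the log, and the speedup for this run comes out at roughly 1.33x.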

Can you add the assert? It's still a good idea :-) It could look something like this:

assert new_code in function_code, f"Failed to update function {function.__name__} in module {module_name}"

ayoub-louati commented 2 years ago

@pommedeterresautee: the assertion is added :100: