Speculative Generation e2e

This is the final PR in a stack covering paged attention and speculative decoding:

- [x] Paged Attention KVCache (https://github.com/foundation-model-stack/fms-extras/pull/8)
- [x] Paged Model (https://github.com/foundation-model-stack/fms-extras/pull/9)
- [ ] Speculative generation

The full implementation of the stack above can be found here: https://github.com/foundation-model-stack/fms-extras/pull/7

This PR adds a speculative_generate function, which performs speculative generation on the PagedLLaMA model using an MLPSpeculator. The scripts now accept a speculator_path for users who want to run speculative generation. Finally, two functions were added to handle batch flattening and expansion, and the attend function was updated to handle inputs that have been flattened.
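For reviewers less familiar with speculative decoding, the verification step can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code in this PR: the function name `accept_longest_prefix`, its arguments, and the assumption of greedy (exact-match) acceptance are all hypothetical and chosen only to show the idea. The speculator proposes candidate continuations, the base model scores them in one forward pass, and the longest candidate prefix that agrees with the base model is kept, followed by one token from the base model itself.

```python
from typing import List, Sequence


def accept_longest_prefix(
    candidates: Sequence[Sequence[int]],
    base_predictions: Sequence[Sequence[int]],
) -> List[int]:
    """Hypothetical helper illustrating greedy speculative verification.

    candidates[i] holds n speculated token ids.
    base_predictions[i] holds n + 1 token ids, where entry j is the base
    model's greedy choice given the prompt plus candidates[i][:j].
    Returns the tokens to append to the sequence for this step.
    """
    best: List[int] = []
    for cand, preds in zip(candidates, base_predictions):
        # Count how many speculated tokens match the base model's choices.
        n_ok = 0
        for spec_tok, base_tok in zip(cand, preds):
            if spec_tok != base_tok:
                break
            n_ok += 1
        # Keep the matching prefix, then add the base model's own token at
        # the first mismatch (or the extra token if everything matched).
        accepted = list(cand[:n_ok]) + [preds[n_ok]]
        if len(accepted) > len(best):
            best = accepted
    return best


# Two candidates of length 3; the second agrees with the base model longer.
step = accept_longest_prefix(
    candidates=[[5, 7, 9], [5, 6, 2]],
    base_predictions=[[5, 6, 1, 0], [5, 6, 3, 4]],
)
print(step)  # [5, 6, 3]
```

Because every step appends at least one token chosen by the base model, a scheme like this yields the same output as ordinary greedy decoding; the speculator only affects how many tokens are produced per forward pass.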

Speculative Generation e2e #10