HabanaAI / vllm-fork

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Add forward_hpu to RotaryEmbedding, remove custom module #404

Closed kzawora-intel closed 6 days ago

kzawora-intel commented 1 week ago

This PR removes the custom HPU RotaryEmbedding modules and instead adds a forward_hpu method to the existing RotaryEmbedding class, so the various derived implementations can be reused without adding each of them to the HPU extension. The mark_step calls should not be needed within the test, but for whatever reason the PT bridge crashes if they are removed; this is to be investigated later. It does not affect actual model execution in any way I could test or observe.
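The pattern described above — one RotaryEmbedding class whose forward() dispatches to a platform-specific method, so subclasses inherit the HPU path automatically — can be sketched roughly as below. This is an illustrative mock, not vLLM's actual code: the class names `PlatformDispatchOp` and `Llama3RotaryEmbedding`, the `PLATFORM` attribute, and the toy math are all assumptions for demonstration; the real implementation lives in vLLM's rotary embedding layer and dispatches based on the detected device.

```python
import math


class PlatformDispatchOp:
    """Hypothetical base op: forward() routes to forward_<platform>().
    vLLM detects the device at runtime; here we use a class attribute."""

    PLATFORM = "native"  # assumed stand-in for runtime device detection

    def forward(self, *args, **kwargs):
        impl = getattr(self, f"forward_{self.PLATFORM}", self.forward_native)
        return impl(*args, **kwargs)


class RotaryEmbedding(PlatformDispatchOp):
    """Toy rotary embedding: rotates each (even, odd) pair by `angle`."""

    def forward_native(self, pairs, angle):
        c, s = math.cos(angle), math.sin(angle)
        # Standard 2D rotation applied to each pair of features.
        return [(x * c - y * s, x * s + y * c) for x, y in pairs]

    def forward_hpu(self, pairs, angle):
        # In the PR this would invoke the HPU-optimized kernel; here we
        # fall back to the native math so the sketch stays self-contained.
        return self.forward_native(pairs, angle)


class Llama3RotaryEmbedding(RotaryEmbedding):
    """A derived implementation (hypothetical name). Because dispatch lives
    in the shared base class, it gets forward_hpu for free — no separate
    HPU-extension module is needed per subclass."""


op = Llama3RotaryEmbedding()
op.PLATFORM = "hpu"  # pretend we are running on Gaudi
rotated = op.forward([(1.0, 0.0)], math.pi / 2)
```

The benefit the PR claims follows directly from this shape: adding a new derived rotary variant only requires subclassing, since the HPU path is inherited rather than duplicated in the extension.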