Changes

- Update the `transformers` library so it works with the new caching module required by Llama models.
- Add a `granite_modeling_llama` module that declares, defines, and registers a new transformers model type (`gpt_megatron`) in the transformers causal LM registry, allowing Sphinx / Granite models to be loaded with `AutoModelForCausalLM` and prompt tuned (see the registration sketch after this list).
- Integrate the `granite_modeling_llama` script into the causal-lm resource.
- Fix a bug so the model config attribute is deleted safely (see the helper sketch below).
- Fix a bug so `torch_dtype` is used at load time for prompt tuning local inferencing (see the loading sketch below).
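For reference, registering a custom model type with the transformers Auto classes looks roughly like the sketch below. It assumes the new type builds on the stock Llama classes; the class names `GPTMegatronConfig` and `GPTMegatronForCausalLM` are illustrative only and may not match the names used in `granite_modeling_llama`.

```python
# Minimal sketch of registering a custom model type with the transformers
# Auto* registries. Class names are illustrative, not the module's actual names.
from transformers import AutoConfig, AutoModelForCausalLM, LlamaConfig, LlamaForCausalLM


class GPTMegatronConfig(LlamaConfig):
    # model_type must match the "model_type" value in the checkpoint's config.json
    model_type = "gpt_megatron"


class GPTMegatronForCausalLM(LlamaForCausalLM):
    config_class = GPTMegatronConfig


# Register the new type so the Auto classes can resolve it by model_type
AutoConfig.register("gpt_megatron", GPTMegatronConfig)
AutoModelForCausalLM.register(GPTMegatronConfig, GPTMegatronForCausalLM)

# After registration, checkpoints whose config.json declares
# "model_type": "gpt_megatron" load through the standard Auto API, e.g.:
# model = AutoModelForCausalLM.from_pretrained("path/to/granite-checkpoint")
```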
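"Safe deletion" of a config attribute just means checking for its presence first, so configs that never had the attribute do not raise. A minimal sketch; the attribute name and call site in this PR may differ:

```python
def safely_delete_config_attr(config, attr_name: str) -> None:
    """Delete an attribute from a model config only if it is present.

    Illustrative helper; the actual fix may delete a different attribute.
    """
    if hasattr(config, attr_name):
        delattr(config, attr_name)
```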
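The `torch_dtype` fix amounts to passing the intended dtype to `from_pretrained` when loading the base model for local inference, rather than falling back to the default float32. A minimal sketch; the path and dtype value are placeholders, not values from this PR:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the base model in the dtype it was tuned/saved with, e.g. float16,
# instead of the default float32.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/base-model",
    torch_dtype=torch.float16,
)
```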