Currently the default HF model is `StabilityAI/stablelm-tuned-alpha-3b`, but we could switch to `microsoft/phi-1_5`, which is half the size and reportedly has performance comparable to larger 7B models. A smaller default model is probably preferable.
Need to wait for upstream to add `attention_mask` support to their generation function. Currently, passing it raises an error because `attention_mask` is not an argument to `model.forward`.
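Until that lands upstream, one possible stopgap (a hypothetical helper, not anything the library provides) is to inspect the model's `forward` signature and drop any kwargs it does not accept before calling it:

```python
import inspect

def filter_kwargs_for(fn, kwargs):
    """Drop kwargs that fn does not accept, e.g. attention_mask for a
    forward() that lacks that parameter. Hypothetical helper for illustration."""
    params = inspect.signature(fn).parameters
    # If fn takes **kwargs, everything passes through unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}

# Stand-in for a custom forward() that does not accept attention_mask.
def forward(input_ids, past_key_values=None):
    return input_ids

kwargs = {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}
safe = filter_kwargs_for(forward, kwargs)
print(forward(**safe))  # → [1, 2, 3]; no TypeError for attention_mask
```

This silently discards the mask rather than applying it, so it only avoids the crash; generation quality on padded batches may still suffer until real support is added.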