Closed by smith-co 2 years ago
The CodeGen models are trained for significantly longer and are therefore more performant. Compare the HumanEval results in their paper to those in ours for a (somewhat narrow, Python-only) apples-to-apples comparison. In addition, they come in a wider range of sizes, up to ~7x the size of our largest model (16B), although the larger ones are rather slow to query. Their main downside is that they support a smaller range of languages -- either 6 programming languages, for their multi-lingual models, or just Python for the final fine-tuned model. They have also not yet released their training code, so fine-tuning the model may be challenging.
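For reference, the released CodeGen checkpoints can be queried through the Hugging Face `transformers` library. A minimal sketch, assuming the `Salesforce/codegen-350M-mono` checkpoint name (the small Python-only variant) is available on the Hub; the larger checkpoints are loaded the same way but are much slower:

```python
# Minimal sketch: query a CodeGen checkpoint for code completion.
# Assumes the Hugging Face checkpoint name below; adjust for other sizes.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Salesforce/codegen-350M-mono"  # small Python-only model (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding; the output tokens include the prompt itself.
outputs = model.generate(**inputs, max_new_tokens=32)
completion = tokenizer.decode(outputs[0])
print(completion)
```

Swapping in a larger checkpoint (e.g. a 16B variant) follows the same pattern but needs considerably more memory and time per query.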
How does PolyCoder compare with CodeGen?