bigcode-project / bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
Multilingual evaluation benchmarks #1
Closed. loubnabnl closed this 2 years ago.

loubnabnl commented 2 years ago:
Added:
Code generation (few-shot generation with BLEU evaluation; see the scoring sketch after this list): Concode (Java), Spider (SQL), CoNaLa (Python)
Code summarization (few-shot generation with BLEU evaluation): code-to-text benchmark from CodeXGLUE (Java, JavaScript, PHP, Python, Go and Ruby)
Classification tasks (fine-tuning): complexity prediction (Java), clone detection (Java), defect detection (C)
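For the generation and summarization tasks above, the reported metric is corpus-level BLEU over the model's few-shot generations. The snippet below is a minimal sketch of that scoring step only, using `sacrebleu` as an assumed stand-in for the harness's own metric code; the example strings are hypothetical, and dataset loading and few-shot prompting are omitted.

```python
# Minimal sketch (not the harness's actual implementation): scoring model
# generations against reference solutions with corpus-level BLEU.
# Assumes the `sacrebleu` package is installed.
import sacrebleu

# Hypothetical (generation, reference) pairs standing in for few-shot outputs
# on a task such as CoNaLa (code generation) or code-to-text (summarization).
generations = [
    "def add(a, b):\n    return a + b",
    "Returns the sum of two integers.",
]
references = [
    "def add(x, y):\n    return x + y",
    "Return the sum of the two input integers.",
]

# sacrebleu expects a list of hypothesis strings and a list of reference
# streams (a single stream here, parallel to the hypotheses).
bleu = sacrebleu.corpus_bleu(generations, [references])
print(f"BLEU: {bleu.score:.2f}")
```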