Ahmedfir / mBERTa

CodeBERT based mutation testing tool.
Apache License 2.0

Efficient Mutation Testing via Pre-Trained Language Models

μBERT generates mutants based on CodeBERT predictions. This repo contains the implementation and replication package.

μBERT workflow

Paper available at: https://doi.org/10.48550/arXiv.2208.06042

@article{khanfir2022mbert,
  title={Efficient Mutation Testing via Pre-Trained Language Models},
  author={Khanfir, Ahmed and Degiovanni, Renzo and Papadakis, Mike and Le Traon, Yves},
  journal={arXiv preprint arXiv:2301.03543},
  year={2023}
}

We have implemented our approach as modules spread across several repositories. If you just want to generate mutants using μBERT, you can skip these details and go straight to "run μBERT". Otherwise, here is a quick summary of how we implemented our approach:

AST parsing and location selection:

Repo (java): https://github.com/Ahmedfir/java-business-locations.git

In this step, we parse the input Java classes and extract the main business-logic nodes to mutate. You can either clone and build the code yourself or use our released standalone jar directly. In https://github.com/Ahmedfir/CodeBERT-nt, we bundle the jar and call it directly from the Python side to extract the tokens.
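As a rough illustration of calling a standalone jar from the Python side, here is a minimal sketch using only the standard library. The helper names `jar_command` and `run_jar` are hypothetical, and the arguments are passed through untouched because the jar's actual command-line options are not described here; check the linked repos for the real invocation.

```python
import subprocess
from typing import List, Optional, Sequence


def jar_command(jar_path: str, args: Sequence[str] = ()) -> List[str]:
    """Build the argument list for `java -jar <jar_path> <args...>`.

    The args are opaque here: this sketch does not assume any particular
    option of the released jar.
    """
    return ["java", "-jar", jar_path, *args]


def run_jar(jar_path: str, args: Sequence[str] = (),
            timeout: Optional[float] = None) -> str:
    """Run the jar and return its standard output as text.

    Raises subprocess.CalledProcessError if the process exits with a
    non-zero status, and subprocess.TimeoutExpired on timeout.
    """
    result = subprocess.run(
        jar_command(jar_path, args),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode bytes to str
        check=True,           # fail loudly on a non-zero exit code
        timeout=timeout,
    )
    return result.stdout
```

Passing the command as a list (rather than a single shell string) avoids quoting problems with file paths that contain spaces.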

Masking and CodeBERT invocation:

Repo (python): https://github.com/Ahmedfir/cbnt

This repo contains the core implementation of our approach. It provides APIs to mask tokens, invoke CodeBERT to predict alternative replacements for them, and process the predictions, i.e. put them in place in the original program for compilation and testing, or compute the cosine similarity between their embeddings and those of the original version.
It was first developed to provide APIs for the code-naturalness study, and we have continued extending it for this project's needs.
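To make the mask / predict / replace cycle concrete, here is a minimal sketch in plain Python. It is not the cbnt API: the function names are hypothetical, and the predictions are passed in as a plain list instead of being obtained from CodeBERT (in practice they would come from a masked-language-model call). The `<mask>` string is assumed to be the model's mask token.

```python
import math
from typing import List, Sequence

MASK = "<mask>"  # assumed mask token of the RoBERTa-style CodeBERT model


def mask_span(source: str, start: int, end: int) -> str:
    """Replace source[start:end] with the mask token."""
    return source[:start] + MASK + source[end:]


def build_mutants(source: str, start: int, end: int,
                  predictions: Sequence[str]) -> List[str]:
    """Put each predicted token in place of the masked span.

    Predictions equal to the original token are skipped (they reproduce
    the original program), and duplicates are emitted only once.
    """
    original = source[start:end].strip()
    seen = set()
    mutants = []
    for token in predictions:
        token = token.strip()
        if token == original or token in seen:
            continue
        seen.add(token)
        mutants.append(source[:start] + token + source[end:])
    return mutants


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

For example, masking the `>` in `if (a > b) return a;` yields `if (a <mask> b) return a;`, and the predictions `[">", "<", ">="]` yield two mutants, since `>` matches the original token. The embeddings passed to `cosine_similarity` are assumed to be already computed for the original and the mutated code.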

Repo (python): https://github.com/Ahmedfir/CodeBERT-nt

This repo contains the code base and evaluation material used to study code naturalness via CodeBERT. It invokes the two previous components. We incorporated it in our approach and continued adapting it to provide the APIs required for mutation.

Condition seeding:

Repo (java): https://github.com/Ahmedfir/mbert-additive-patterns.git

This repo contains the source code responsible for proposing new masked conditions as alternatives to the ones originally present in the input program. Our approach then invokes CodeBERT to predict the masked tokens of these proposed conditions. You can either clone and build the code yourself or use our released standalone jar directly; it is available under mbertntcall/mBERT-addconditions in this same repo.
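The idea of seeding a condition with a masked part can be sketched as follows. This is an illustration only: the pattern used here (appending `&& <mask>` or `|| <mask>` to the existing condition) and the function name `seed_conditions` are assumptions, not necessarily the patterns implemented in mbert-additive-patterns.

```python
from typing import List, Sequence

MASK = "<mask>"  # assumed mask token, later filled in by CodeBERT


def seed_conditions(condition: str,
                    operators: Sequence[str] = ("&&", "||")) -> List[str]:
    """Propose alternative conditions that extend the original one
    with a masked operand, one per logical operator.

    The original condition is parenthesised so that operator precedence
    cannot change its meaning.
    """
    cond = condition.strip()
    return [f"({cond}) {op} {MASK}" for op in operators]
```

With the default operators, `seed_conditions("x != null")` proposes `(x != null) && <mask>` and `(x != null) || <mask>`; each masked condition would then be handed to the model, and the predicted tokens substituted for the mask to form candidate mutants.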

Evaluation on Defects4J:

run μBERT:

prerequisites:

Maven projects

mutants generation

Customisation

Evaluation: