μBERT generates mutants based on CodeBERT predictions. This repo contains the implementation and replication package.
available at: https://doi.org/10.48550/arXiv.2208.06042
@article{khanfir2022mbert,
title={Efficient Mutation Testing via Pre-Trained Language Models},
author={Khanfir, Ahmed and Degiovanni, Renzo and Papadakis, Mike and Le Traon, Yves},
journal={arXiv preprint arXiv:2301.03543},
year={2023}
}
We have implemented our approach as modules in different repositories. If you just want to generate mutants using μBERT, you can skip these details and go straight to the next subsection. Otherwise, here is a quick summary of how we implemented our approach:
Repo (java): https://github.com/Ahmedfir/java-business-locations.git
In this step we parse the input Java classes and extract the main business-logic nodes to mutate. You can either clone and build the code yourself or use our released standalone jar directly. In https://github.com/Ahmedfir/CodeBERT-nt we incorporate the jar and call it directly from the Python side to extract the tokens.
Repo (python): https://github.com/Ahmedfir/cbnt
This repo contains the core implementation of our approach.
It provides APIs to mask tokens, invoke CodeBERT to predict alternative replacements for them, and process the predictions,
i.e. putting them in place in the original program for compilation and testing,
or computing their cosine embedding similarity with the original version.
It was first developed to provide APIs for the code-naturalness study,
and we have continued extending it for this project's needs.
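As a rough illustration of two of the operations described above (masking a token and comparing embeddings by cosine similarity), here is a minimal, dependency-free sketch. It is not the cbnt API: the function names `mask_span` and `cosine_similarity` are hypothetical, the vectors are toy values rather than CodeBERT embeddings, and the only CodeBERT-specific detail is the `<mask>` token used by its RoBERTa-style tokenizer.

```python
import math
from typing import Sequence

# Mask token of CodeBERT's RoBERTa-style tokenizer.
MASK_TOKEN = "<mask>"


def mask_span(code: str, start: int, end: int, mask: str = MASK_TOKEN) -> str:
    """Return `code` with the characters in [start, end) replaced by the mask token.

    Hypothetical helper for illustration; not part of cbnt.
    """
    if not 0 <= start < end <= len(code):
        raise ValueError("span must lie inside the code string")
    return code[:start] + mask + code[end:]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    line = "return a + b;"
    op = line.index("+")
    # Mask the operator so that a masked language model could propose replacements.
    print(mask_span(line, op, op + 1))
    # Toy vectors standing in for the embeddings of the original and mutated code.
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

In the actual pipeline the masked sequence would be passed to CodeBERT and the vectors would come from the model; the sketch only shows the shape of the two steps.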
Repo (python): https://github.com/Ahmedfir/CodeBERT-nt
This repo contains the code base and evaluation material used to study code naturalness via CodeBERT. It invokes the two previous components. We incorporated it in our approach and continued adapting it to provide the APIs required for mutation.
Repo (java): https://github.com/Ahmedfir/mbert-additive-patterns.git
This repo contains the source code responsible for proposing new masked conditions,
as alternatives to the ones originally provided in the input program.
Our approach then invokes CodeBERT to predict the masked tokens of these proposed new conditions.
You can either clone and build the code yourself or use our released standalone jar directly;
it is available under mbertntcall/mBERT-addconditions
in this same repo.
The eval repo contains our code to run μBERT and PiTest on defects4j bugs.
The Python requirements are listed in requirements.txt. If you decide to use pip, just call env_setup.sh.
Dependencies: You will have to clone some repos, or call setup.sh and it will be done for you.
It depends on the https://github.com/Ahmedfir/cbnt, https://github.com/Ahmedfir/CodeBERT-nt and https://github.com/Ahmedfir/commons implementations,
so you will have to include them in your $PYTHONPATH, as follows:
If you want to use PyCharm:
go to Preferences > Project:mBERT-mt > Project structure > + > path/to/cloned/cbnt.
Then similarly for commons: > + > path/to/cloned/commons
and for CodeBERT-nt: > + > path/to/cloned/CodeBERT-nt.
If you just want to run the tool via shell (see the gen_mutants.sh script):
you need to add the dependencies to your $PYTHONPATH:
export PYTHONPATH=path/to/cloned/commons:path/to/cloned/cbnt:path/to/cloned/CodeBERT-nt:$PYTHONPATH
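When setting the environment variable is inconvenient (for instance in a notebook or a one-off script), the same effect can be approximated from inside Python by extending `sys.path` before importing the dependencies. The sketch below is only an illustration: the helper `prepend_to_sys_path` is hypothetical and not shipped with this repo, and the paths are the same placeholders as above.

```python
import os
import sys
from typing import Iterable, List


def prepend_to_sys_path(dependency_dirs: Iterable[str]) -> List[str]:
    """Put the given directories at the front of sys.path, keeping their order.

    Directories already present in sys.path are skipped.
    Returns the absolute paths that were actually added.
    Hypothetical helper for illustration; not part of this repo.
    """
    new_dirs: List[str] = []
    for directory in dependency_dirs:
        resolved = os.path.abspath(os.path.expanduser(directory))
        if resolved not in sys.path and resolved not in new_dirs:
            new_dirs.append(resolved)
    # Slice assignment inserts all new entries at the front in one step.
    sys.path[:0] = new_dirs
    return new_dirs


if __name__ == "__main__":
    added = prepend_to_sys_path([
        "path/to/cloned/commons",
        "path/to/cloned/cbnt",
        "path/to/cloned/CodeBERT-nt",
    ])
    print(added)
```

Unlike the `export` line, this only affects the current Python process, so child processes started by shell scripts will not see the change.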
For a Maven project, use the mavenrunner/mvn_mbert_runner.py script.
The minimum required arguments are the -repo_path or the -git_url of your target project, and the path to the config file via -config.
A config file is provided in mavenrunner/mbert_config.yml; you will need to adapt it (or create your own) depending on your environment and requirements.
For the other parameters, see the get_args() method in mavenrunner/mvn_mbert_runner.py.
To generate mutants for given classes, use the mbertntcall/mbert_generate_mutants_runner.py script.
The minimum required arguments are the project path and the target classes to mutate,
e.g. python3 mbert_generate_mutants_runner.py -repo_path path/to/your/project -target_classes path/to/class1,path/to/class2
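If you prefer to drive the runner from Python rather than from a shell, the command above can be assembled as an argument list and passed to `subprocess`. This is a minimal sketch under stated assumptions: the helper `build_mutant_generation_command` is hypothetical, the default `runner` path assumes you launch it from the repo root, and only the two flags shown above are used.

```python
import subprocess
import sys
from typing import List, Sequence


def build_mutant_generation_command(
    repo_path: str,
    target_classes: Sequence[str],
    runner: str = "mbertntcall/mbert_generate_mutants_runner.py",
) -> List[str]:
    """Build the argument list for the mutant generation runner.

    Hypothetical helper for illustration; only -repo_path and -target_classes,
    the two flags documented above, are emitted.
    """
    if not target_classes:
        raise ValueError("at least one target class is required")
    return [
        sys.executable,  # the interpreter running this script
        runner,
        "-repo_path", repo_path,
        "-target_classes", ",".join(target_classes),  # comma-separated, as in the example
    ]


if __name__ == "__main__":
    command = build_mutant_generation_command(
        "path/to/your/project",
        ["path/to/class1", "path/to/class2"],
    )
    print(" ".join(command))
    # To actually launch the runner (requires the dependencies on PYTHONPATH):
    # subprocess.run(command, check=True)
```

Passing a list instead of a single string avoids shell quoting issues when paths contain spaces.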
Please check the get_args() method for more information on other optional parameters,
e.g. to get simple replacement mutants only (similar to the μBERT ones: https://github.com/rdegiovanni/mBERT), you can pass -simple_only True as a parameter.
See gen_mutants.sh for an example.
We set it up to generate mutants for a class, DummyClass.java, available under the test folder.
You can adapt the script to your needs.
See mbertnteval/d4jeval/mbert for an example.
Adapt the MbertProject class under mbert_project.py accordingly. Particularly:
compile_comand(self)
on_has_compiled(self, compilation_output)
test_comand(self)
on_tests_run(self, test_exec_output)
An example implementation (D4jProject) is available under eval.
Then create a request by calling mbertntcall.mbert_ext_request_impl.MbertRequestImpl.__init__ with this project as param, and call this request, as done in the method mbertntcall.mbert_generate_mutants_runner.create_mbert_request in mbertntcall/mbert_generate_mutants_runner.py.
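To give an idea of what adapting the four hooks listed above could look like, here is a self-contained sketch for a Gradle project. Everything beyond the four method names and their parameters is an assumption: `MbertProjectStandIn` is a hypothetical stand-in for the real MbertProject class (whose constructor and return types may differ), `GradleProject` is an invented example, and the return values (a shell command string, a boolean) are guesses at a reasonable contract. Check mbert_project.py and D4jProject for the actual one.

```python
import shlex


class MbertProjectStandIn:
    """Hypothetical stand-in for MbertProject; the real class may differ."""

    def compile_comand(self):
        raise NotImplementedError

    def on_has_compiled(self, compilation_output):
        raise NotImplementedError

    def test_comand(self):
        raise NotImplementedError

    def on_tests_run(self, test_exec_output):
        raise NotImplementedError


class GradleProject(MbertProjectStandIn):
    """Invented example: a project built and tested with the Gradle wrapper."""

    def __init__(self, repo_path: str):
        self.repo_path = repo_path

    def _in_repo(self, gradle_task: str) -> str:
        # Quote the path so that directories with spaces are handled correctly.
        return f"cd {shlex.quote(self.repo_path)} && ./gradlew {gradle_task}"

    def compile_comand(self) -> str:
        # Assumption: the hook returns a shell command that compiles the project.
        return self._in_repo("compileJava")

    def on_has_compiled(self, compilation_output: str) -> bool:
        # Assumption: the hook reports whether compilation succeeded.
        return "BUILD SUCCESSFUL" in compilation_output

    def test_comand(self) -> str:
        # Assumption: the hook returns a shell command that runs the tests.
        return self._in_repo("test")

    def on_tests_run(self, test_exec_output: str) -> bool:
        # Assumption: the hook reports whether all tests passed.
        return "BUILD FAILED" not in test_exec_output


if __name__ == "__main__":
    project = GradleProject("path/to/your/project")
    print(project.compile_comand())
    print(project.test_comand())
```

The method names keep the spelling used in the list above (compile_comand, test_comand) so that they would match the hooks being overridden.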
The evaluation code is under mbertnteval.
Use mbertnteval/d4jeval/exec_pid_bid.sh to run either our tool or the baselines used.
You may need to adapt or provide your own config files instead of the *_config.yml ones provided under mbertnteval/d4jeval/mbert and mbertnteval/d4jeval/pit.
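Here are example commands calling the mbertnteval/d4jeval/exec_pid_bid.sh script to generate mutants for Cli 13, using the μBERT config (mbert_config.yml) and the two PiTest configs (pit_config.yml and pit_rv_config.yml), respectively: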
./exec_pid_bid.sh ~/PycharmProjects/mBERTa/mbertnteval/d4jeval/mbert/d4j_process_pid_bid.py Cli_13.src.patch.csv ~/PycharmProjects/mBERTa/mbertnteval/d4jeval/mbert/mbert_config.yml
./exec_pid_bid.sh ~/PycharmProjects/mBERTa/mbertnteval/d4jeval/pit/d4j_process_pid_bid.py Cli_13.src.patch.csv ~/PycharmProjects/mBERTa/mbertnteval/d4jeval/pit/pit_config.yml
./exec_pid_bid.sh ~/PycharmProjects/mBERTa/mbertnteval/d4jeval/pit/d4j_process_pid_bid.py Cli_13.src.patch.csv ~/PycharmProjects/mBERTa/mbertnteval/d4jeval/pit/pit_rv_config.yml