ASSERT-KTH / drr

Tool & data on the correctness of Defects4J patches generated by program repair tools http://arxiv.org/pdf/1909.13694
Creative Commons Attribution Share Alike 4.0 International
Automated Patch Assessment for Program Repair

A tool for automatically assessing the correctness of patches generated by program repair systems. We treat the human-written patch as the ground-truth oracle and assess each generated patch with Random tests based on the Ground Truth (RGT). See "Automated Patch Assessment for Program Repair at Scale".
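The idea behind RGT can be illustrated with a minimal Python sketch. It is not the repository's implementation: the actual tool runs Evosuite and Randoop test suites on Java programs, and every name below (`human_patched_abs`, `machine_patched_abs`, `generate_rgt_tests`, `assess`) is hypothetical. The sketch samples random inputs, records the outputs of the human-patched version as expected values, and flags a machine patch as overfitting when it disagrees on any input.

```python
import random


def human_patched_abs(x):
    """Ground truth: the human-written fix (toy example)."""
    return -x if x < 0 else x


def machine_patched_abs(x):
    """Hypothetical overfitting patch: handles only the input from the original failing test."""
    if x == -1:
        return 1
    return x


def generate_rgt_tests(ground_truth, n, seed):
    """Sample random inputs and use the ground-truth outputs as the oracle."""
    rng = random.Random(seed)
    inputs = [rng.randint(-100, 100) for _ in range(n)]
    return [(x, ground_truth(x)) for x in inputs]


def assess(patch, tests):
    """Return a verdict and the list of tests on which the patch differs from the oracle."""
    failing = [(x, expected) for x, expected in tests if patch(x) != expected]
    verdict = "overfitting" if failing else "no behavioral difference found"
    return verdict, failing


if __name__ == "__main__":
    tests = generate_rgt_tests(human_patched_abs, n=200, seed=0)
    verdict, failing = assess(machine_patched_abs, tests)
    print(verdict, "-", len(failing), "failing tests")
```

A patch that passes every generated test is not proven correct; the tests only failed to expose a behavioral difference from the human patch.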

If you use this repo, please cite:

@Article{Ye2021EMSE,
    author = {Ye, He and Martinez, Matias and Monperrus, Martin},
    title = "Automated Patch Assessment for Program Repair at Scale",
    journal="Empirical Software Engineering",
    volume = "26",
    issn = "1573-7616",
    doi = "https://doi.org/10.1007/s10664-020-09920-w",
    year = "2021"
}

Folder Structure

├── Patches: 257 patches from Dcorrect and 381 patches from Doverfitting
│ 
├── RGT: incl. tests from Evosuite2019, Randoop2019, EvosuitASE15, RandoopASE15 and EvosuiteEMSE18
│   
├── DiffTGen
│   ├── Results: the results of running DiffTGen, i.e. the overfitting patches it found
│   ├── runDrr.py: a command to reproduce the DiffTGen experiment (see details below)
│ 
├── statistics: our experimental statistics for all RQs
│ 
└──  run.py: a command to reproduce all experiments

Prerequisites

git submodule add https://github.com/rjust/defects4j
cd defects4j
git reset --hard 486e2b49d806cdd3288a64ee3c10b3a25632e991

Run

To assess an individual patch for Defects4J:

./run.py patch_assessment <patch_id> <dataset:Dcorrect|Doverfitting> <RGT:ASE15_Evosuite|ASE15_Randoop|EMSE18_Evosuite|2019_Evosuite|2019_Randoop>  
example:  ./run.py patch_assessment patch1-Lang-35-ACS.patch Dcorrect 2019_Evosuite
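To assess one patch against every RGT test set, the command line above can be assembled programmatically. The following is a small sketch, assuming only the argument order and option values shown in the usage line; the helper names `build_command` and `commands_for_all_rgt` are hypothetical and not part of `run.py`.

```python
# Option values copied from the usage line of ./run.py.
DATASETS = ("Dcorrect", "Doverfitting")
RGT_SETS = (
    "ASE15_Evosuite",
    "ASE15_Randoop",
    "EMSE18_Evosuite",
    "2019_Evosuite",
    "2019_Randoop",
)


def build_command(subcommand, patch_id, dataset, rgt):
    """Build the argument list for one ./run.py invocation, rejecting unknown options."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if rgt not in RGT_SETS:
        raise ValueError(f"unknown RGT test set: {rgt!r}")
    return ["./run.py", subcommand, patch_id, dataset, rgt]


def commands_for_all_rgt(patch_id, dataset):
    """One patch_assessment command per RGT test set (hypothetical helper)."""
    return [
        build_command("patch_assessment", patch_id, dataset, rgt)
        for rgt in RGT_SETS
    ]


if __name__ == "__main__":
    for cmd in commands_for_all_rgt("patch1-Lang-35-ACS.patch", "Dcorrect"):
        print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` from the repository root to actually execute the assessment.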

To perform different sanity checks:

./run.py applicable_check
./run.py plausible_check

To identify flaky tests:

./run.py flaky_check <patch_id> <dataset:Dcorrect|Doverfitting> <RGT:ASE15_Evosuite|ASE15_Randoop|EMSE18_Evosuite|2019_Evosuite|2019_Randoop>  
example:  ./run.py flaky_check patch1-Lang-35-ACS.patch Dcorrect 2019_Evosuite
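A flaky test is one whose outcome changes across runs of the same program. How `flaky_check` detects such tests is not described here, so the following is only a generic sketch of one common approach: rerun each test several times and flag any test that yields more than one distinct outcome. The names `find_flaky`, `test_stable`, and `test_alternating` are hypothetical.

```python
import itertools


def find_flaky(tests, runs=5):
    """Return the names of tests whose pass/fail outcome differs across repeated runs."""
    flaky = []
    for name, test in tests.items():
        outcomes = {bool(test()) for _ in range(runs)}
        if len(outcomes) > 1:
            flaky.append(name)
    return sorted(flaky)


def test_stable():
    """Deterministic test: same outcome on every run."""
    return abs(-3) == 3


_calls = itertools.count()


def test_alternating():
    """Simulated flaky test: outcome depends on hidden state and alternates pass/fail."""
    return next(_calls) % 2 == 0


if __name__ == "__main__":
    suite = {"test_stable": test_stable, "test_alternating": test_alternating}
    print(find_flaky(suite))
```

Flaky tests are worth filtering out before assessment, since a failure caused by nondeterminism would otherwise be mistaken for evidence that a patch is overfitting.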

To reproduce our experiments with RGT patch assessment:

RQ1: ./run.py RQ1
RQ3: ./run.py RQ3
RQ4: ./run.py RQ4
RQ5: cd ./statistics && ./RQ5-randomness-script.py <Evosuite2019|Randoop2019>

Results

Credits