We released the model on the Hugging Face model hub as `StackOBERTflow-comments-small-v1`. It can thus be used directly through the `transformers` library:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("giganticode/StackOBERTflow-comments-small-v1")
model = AutoModelForMaskedLM.from_pretrained("giganticode/StackOBERTflow-comments-small-v1")
```
Using the `fill-mask` pipeline:
```python
from transformers import pipeline
from pprint import pprint

COMMENT = "You really should not do it this way, I would use <mask> instead."

fill_mask = pipeline(
    "fill-mask",
    model="giganticode/StackOBERTflow-comments-small-v1",
    tokenizer="giganticode/StackOBERTflow-comments-small-v1"
)

pprint(fill_mask(COMMENT))
# [{'score': 0.019997311756014824,
#   'sequence': '<s> You really should not do it this way, I would use jQuery instead.</s>',
#   'token': 1738},
#  {'score': 0.01693696901202202,
#   'sequence': '<s> You really should not do it this way, I would use arrays instead.</s>',
#   'token': 2844},
#  {'score': 0.013411642983555794,
#   'sequence': '<s> You really should not do it this way, I would use CSS instead.</s>',
#   'token': 2254},
#  {'score': 0.013224546797573566,
#   'sequence': '<s> You really should not do it this way, I would use it instead.</s>',
#   'token': 300},
#  {'score': 0.011984303593635559,
#   'sequence': '<s> You really should not do it this way, I would use classes instead.</s>',
#   'token': 1779}]
```
Alternatively, download the zipped model from the release page and extract it to `model/path`. The following shows how to load the model from a local directory:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("model/path")
model = AutoModelForMaskedLM.from_pretrained("model/path")
```
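The pipeline is only a convenience wrapper; for reference, the same top predictions can be computed directly with the tokenizer and model loaded above. A minimal sketch, assuming PyTorch is installed:

```python
import torch

# encode a comment containing the tokenizer's mask token
inputs = tokenizer("I would use <mask> instead.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# locate the masked position and print the five highest-scoring tokens
mask_idx = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top = logits[0, mask_idx].topk(5)
print(tokenizer.convert_ids_to_tokens(top.indices[0].tolist()))
```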
The datasets cover 9 tasks, briefly described in the following.
Classify the sentiment of Software Engineering artifacts (e.g., Stack Overflow posts, app reviews, bug report comments, etc.). Each example is to be classified as having either positive, negative, or neutral sentiment.
We use the Senti4SD and SentiData datasets; a sketch of how the released model could be applied to such a task follows the examples below.
Example | Label |
---|---|
I want them to resize based on the length of the data they’re showing. | neutral |
When I run my client, it throws the following exception. | negative |
This is always a really bad way to design software. | negative |
amazing! a must have app | positive |
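For classification tasks like this, the released masked-LM checkpoint would be fine-tuned with a task-specific head. A minimal, illustrative sketch (not the benchmark's training code; the label count and example inputs are taken from the table above):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "giganticode/StackOBERTflow-comments-small-v1"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
# num_labels=3 for positive/negative/neutral; the classification head is
# freshly initialized and has to be fine-tuned on the task data
clf = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

batch = tokenizer(
    ["amazing! a must have app",
     "This is always a really bad way to design software."],
    padding=True, truncation=True, return_tensors="pt")
logits = clf(**batch).logits  # shape (2, 3): one row of label scores per example
```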
Classify app reviews into various categories (e.g., rating, feature request, bug report) or detect whether reviews are informative or not. We use the datasets from ARMINER, MAST, and CLAP.
Example | Label |
---|---|
not able to download any pictures please fix these bugs immediately | informative |
Best game I’ve played on Android | rating |
good but... it has ads...please remove ads from this... | usability |
Detect whether a comment contains self-admitted technical debt (often indicated by `FIXME` or `TODO`).
We use the dataset by Maldonado et al.; a trivial keyword baseline is sketched after the examples below.
Example | Label |
---|---|
// FIXME: Is "No Namespace is Empty Namespace" really OK? | positive |
// Can return null to represent the bootstrap class loader. | negative |
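As a point of reference, the marker keywords alone already yield a trivial baseline. A minimal sketch (not part of the benchmark; `HACK` and `XXX` are further common markers added for illustration):

```python
import re

# naive keyword baseline: flag comments containing common SATD markers;
# much self-admitted debt lacks such markers, which is where a learned
# model has to improve over this heuristic
SATD_MARKERS = re.compile(r"\b(FIXME|TODO|HACK|XXX)\b")

def is_satd_keyword(comment: str) -> bool:
    return SATD_MARKERS.search(comment) is not None

print(is_satd_keyword('// FIXME: Is "No Namespace is Empty Namespace" really OK?'))  # True
print(is_satd_keyword("// Can return null to represent the bootstrap class loader."))  # False
```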
Classify comments according to a pre-defined taxonomy (e.g., usage, license, deprecation, ownership). We rely on the dataset by Pascarella et al.
Example | Label |
---|---|
// @return a string for throwing | usage |
// New button,purpose | summary |
// Caller of this method must hold that lock. | rationale |
Determine whether there is "coherence" between a given method and its corresponding lead comment, that is, whether the comment is descriptive of the method. We use the dataset by Corazza et al.; a sketch of how such comment-method pairs can be encoded follows the examples below.
```java
/**
 * Returns the current number of milk units in
 * the inventory.
 * @return int
 */
public int getMilk() {
    return milk;
}
```
Label: positive (coherent)
```java
/**
 * Check inventory user interface that processes input.
 */
public static void checkInventory() {
    System.out.println(coffeeMaker.checkInventory());
    mainMenu();
}
```
Label: negative (incoherent)
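Since each example is a (comment, method) pair, one natural input representation is a sequence-pair encoding. A minimal sketch using the released tokenizer (the actual preprocessing in the `coherence` module may differ):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("giganticode/StackOBERTflow-comments-small-v1")

# comment and method are encoded as a single sequence pair, joined by the
# tokenizer's separator tokens; a binary classification head over this
# encoding would then predict coherent vs. incoherent
enc = tokenizer(
    "Returns the current number of milk units in the inventory.",
    "public int getMilk() { return milk; }",
    return_tensors="pt")
print(tokenizer.decode(enc.input_ids[0]))
```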
Detect linguistic smells in code, that is, misleading identifier names or violations of common naming conventions. Our work is based on the dataset by Fakhoury et al.
```java
public void ToSource(StringBuilder sb) {
    sb.append(";");
    this.NewLine(sb);
}
```
Label: smelly (transform method does not return)
Classify the algorithmic complexity of various algorithm implementations (e.g., O(1), O(n log n), etc.). We use the dataset by Sikka et al.
```java
class GFG {
    static int minJumps(int arr[], int n) {
        int[] jumps = new int[n];
        int min;
        jumps[n - 1] = 0;
        for (int i = n - 2; i >= 0; i--) {
            if (arr[i] == 0)
                jumps[i] = Integer.MAX_VALUE;
            else if (arr[i] >= n - i - 1)
                jumps[i] = 1;
            else { ... }
        }
        return jumps[0];
    }
    public static void main(String[] args) {...}
}
```
Label: O(n log n)
Given a piece of code, classify it as either "readable" or "not readable". Our work relies on the dataset by Scalabrino et al.
```java
@Override
public void configure(Configuration cfg) {
    super.configure(cfg);
    cfg.setProperty(Environment.USE_SECOND_LEVEL_CACHE, ...);
    cfg.setProperty(Environment.GENERATE_STATISTICS, ...);
    cfg.setProperty(Environment.USE_QUERY_CACHE, "false");
    ... // more cfg.setProperty calls
}
```
Label: readable
There is a module for each task/dataset. Currently, these are:
Module | Description | Dataset URL |
---|---|---|
`ar_miner` | Informative app reviews | https://github.com/jinyyy666/AR_Miner/tree/master/datasets |
`coherence` | Code-comment coherence | http://www2.unibas.it/gscanniello/coherence/ |
`comment_classification` | Comment classification | https://zenodo.org/record/2628361 |
`corcod` | Runtime complexity classification | https://github.com/midas-research/corcod-dataset |
`readability` | Code readability classification | https://dibt.unimol.it/report/readability/ |
`review_classification` | Review classification | https://mast.informatik.uni-hamburg.de/wp-content/uploads/2014/03/REJ_data.zip |
`satd` | Self-admitted debt detection | https://github.com/maldonado/tse.satd.data |
`senti4sd` | Sentiment analysis on Stack Overflow comments | https://github.com/collab-uniba/Senti4SD |
`smell_detection` | Linguistic smell detection | https://github.com/Smfakhoury/SANER-2018-KeepItSimple- |
Some of the datasets used (e.g., CLAP) are not publicly available.
Datasets were preprocessed and brought into a standard format. If you would like to rerun one of the experiments, please contact one of the authors for the dataset in the correct format. Datasets must be placed in `/data/<module>/`.
Training parameters can be set in `/dl4se/config/<module>`, and dataset loading is handled in `/dl4se/datasets/<module>`.
Additional configuration parameters can be passed on the command line. See the `config.py` file of the corresponding module for a list of possible command-line options.
To run an experiment, execute the following:
```
python -m dl4se.experiments.<module>.default --seeds 100 200 300 400 500 --out_file=result_file.csv
```
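For example, to run the Stack Overflow sentiment experiment via the `senti4sd` module from the table above (the output file name is arbitrary):
```
python -m dl4se.experiments.senti4sd.default --seeds 100 200 300 400 500 --out_file=senti4sd_results.csv
```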