FUNGUS is a tool for detecting similarities between ARMv7 assembly projects, such as those submitted for introductory software assignments. This is the command-line tool, which performs the analysis and generates a plagiarism report in JSON format. It is meant to be used with a desktop GUI such as fungus-gui.
FUNGUS is inspired by Stanford's Measure of Software Similarity (Moss). At its core, it uses the same algorithm, winnowing, described in the paper "Winnowing: Local Algorithms for Document Fingerprinting" by Schleimer, Wilkerson, and Aiken.
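To make the core idea concrete, here is a minimal sketch of winnowing in Rust, since FUNGUS is built with cargo. It is not FUNGUS's implementation. The names kgram_hashes and winnow are hypothetical, and DefaultHasher stands in for whatever hash function FUNGUS actually uses. The sketch hashes every k-gram of a token sequence. Then, in every window of w consecutive hashes, it selects the minimum hash (the rightmost one on ties) and records it with its position as a fingerprint, skipping a position that the previous window already selected.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes every k-gram (run of `k` consecutive tokens) of `tokens`.
/// Hypothetical helper: FUNGUS's real hash function may differ.
fn kgram_hashes<T: Hash>(tokens: &[T], k: usize) -> Vec<u64> {
    if k == 0 || tokens.len() < k {
        return Vec::new();
    }
    tokens
        .windows(k)
        .map(|gram| {
            let mut hasher = DefaultHasher::new();
            gram.hash(&mut hasher);
            hasher.finish()
        })
        .collect()
}

/// Winnowing: in every window of `w` consecutive hashes, select the minimum
/// (the rightmost one on ties) and record it as a `(hash, position)`
/// fingerprint unless that position was already selected.
/// Inputs shorter than one window yield no fingerprints in this sketch.
fn winnow(hashes: &[u64], w: usize) -> Vec<(u64, usize)> {
    let mut fingerprints = Vec::new();
    if w == 0 || hashes.len() < w {
        return fingerprints;
    }
    let mut last_selected: Option<usize> = None;
    for start in 0..=hashes.len() - w {
        // Rightmost minimum within hashes[start..start + w].
        let mut min_pos = start;
        for pos in start..start + w {
            if hashes[pos] <= hashes[min_pos] {
                min_pos = pos;
            }
        }
        // Windows slide right, so a position can only be re-selected by
        // consecutive windows; comparing with the last pick is enough.
        if last_selected != Some(min_pos) {
            fingerprints.push((hashes[min_pos], min_pos));
            last_selected = Some(min_pos);
        }
    }
    fingerprints
}
```

For example, on the hash sequence 77 74 42 17 98 50 17 98 8 88 67 39 77 74 42 17 98 with w = 4, winnow selects 17, 17, 8, 39, and 17 at positions 3, 6, 8, 11, and 15. Taking the minimum of every window ensures that any sufficiently long shared run of hashes yields at least one shared fingerprint.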
PATH. Run fungus --version to check that the installation was successful.
To build from source, run cargo build --release. The binary will be placed in the target/release/ directory.
FUNGUS assumes that the projects to analyze are in separate directories, each a direct child of the same root directory. For example, consider the following directory structure:
submissions/
├── project1
│   ├── subdir1
│   │   └── file1.s
│   └── subdir2
│       └── file2.s
├── project2
│   ├── code1.s
│   └── code2.s
└── starter-code
    ├── file1.s
    └── file2.s
If the submissions/ directory is selected as the root, then FUNGUS will select project1, project2, and starter-code as the projects to compare.
Paths to ignore (e.g., assignment starter code provided to all students) can be given as input to FUNGUS. Any code in students' projects that matches this code will not be flagged as potential plagiarism. The paths to ignore can be inside the root directory (as in the example above) or outside of it.
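One plausible way to ignore starter code is to fingerprint the ignored files as well, then discard any project fingerprint whose hash also appears among them before comparing projects pairwise. The sketch below assumes this approach. The README does not say whether FUNGUS works exactly this way, and the Fingerprint type and both functions are hypothetical.

```rust
use std::collections::HashSet;

/// A fingerprint: the hash of a k-gram and its position in the token stream.
type Fingerprint = (u64, usize);

/// Drops every fingerprint whose hash also occurs in the ignored code
/// (e.g. starter code), so shared boilerplate cannot produce a match.
fn remove_ignored(fingerprints: &[Fingerprint], ignored: &[Fingerprint]) -> Vec<Fingerprint> {
    let ignored_hashes: HashSet<u64> = ignored.iter().map(|&(hash, _)| hash).collect();
    fingerprints
        .iter()
        .copied()
        .filter(|(hash, _)| !ignored_hashes.contains(hash))
        .collect()
}

/// Hashes shared by two projects once ignored code has been removed;
/// these are the candidate matches between the two projects.
fn shared_hashes(a: &[Fingerprint], b: &[Fingerprint]) -> HashSet<u64> {
    let a_hashes: HashSet<u64> = a.iter().map(|&(hash, _)| hash).collect();
    b.iter()
        .map(|&(hash, _)| hash)
        .filter(|hash| a_hashes.contains(hash))
        .collect()
}
```

Filtering before the pairwise comparison means that code every student received cannot make two projects look similar, while code that two students wrote identically on their own still can.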
Two tokenizers are available:
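… r10).
FUNGUS accepts noise and guarantee thresholds as inputs. As in winnowing, matches shorter than the noise threshold are ignored, while any match at least as long as the guarantee threshold is guaranteed to be detected.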
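In the winnowing paper, the noise threshold k is also the k-gram length, and the window size follows from the guarantee threshold t as w = t - k + 1. Here is a minimal sketch of that derivation, assuming FUNGUS maps its two thresholds the same way. WinnowParams and winnow_params are hypothetical names.

```rust
/// Winnowing parameters derived from the two thresholds.
#[derive(Debug, PartialEq)]
struct WinnowParams {
    /// k-gram length: matches shorter than this are treated as noise.
    k: usize,
    /// Window size: guarantees detection of matches at least `guarantee` long.
    w: usize,
}

/// Derives k and w from the noise and guarantee thresholds using
/// w = guarantee - noise + 1, rejecting combinations with no valid window.
fn winnow_params(noise: usize, guarantee: usize) -> Result<WinnowParams, String> {
    if noise == 0 {
        return Err("noise threshold must be at least 1".to_string());
    }
    if guarantee < noise {
        return Err(format!(
            "guarantee threshold ({guarantee}) must be at least the noise threshold ({noise})"
        ));
    }
    Ok(WinnowParams { k: noise, w: guarantee - noise + 1 })
}
```

For example, a noise threshold of 5 and a guarantee threshold of 8 give 5-grams and windows of 4 hashes. A guarantee threshold below the noise threshold is rejected because the formula would give a window size below 1.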
When using the "relative" tokenizer, a maximum token offset can also be specified. This is the largest distance a relative token can take. Intuitively, a very small max offset will probably produce many false positives: in the extreme case of a max offset of 0, this reduces to non-relative lexing with no distinction between registers, labels, etc. Conversely, a very large max offset will probably produce many false negatives: in the extreme case of no limit, the results depend on the overall structure of the document, and there is no guarantee that any matches will be reported (unless two files are identical).
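The README does not define relative tokens precisely. The sketch below assumes one scheme consistent with the description above: each identifier (register, label, etc.) is replaced by the distance, in tokens, back to its previous occurrence. When there is no previous occurrence, or the distance exceeds the max offset, the identifier becomes a generic "no recent occurrence" token. Token, RelToken, and relativize are hypothetical names, and this is not FUNGUS's actual tokenizer.

```rust
use std::collections::HashMap;

/// Input tokens: mnemonics are kept verbatim, while operands such as
/// registers and labels are identifiers.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Keyword(String),
    Ident(String),
}

/// Output tokens: an identifier becomes `Rel(Some(d))`, where `d` is the
/// distance to its previous occurrence, or `Rel(None)` when there is no
/// previous occurrence within `max_offset` tokens.
#[derive(Debug, PartialEq)]
enum RelToken {
    Keyword(String),
    Rel(Option<usize>),
}

fn relativize(tokens: &[Token], max_offset: usize) -> Vec<RelToken> {
    // Index of the most recent occurrence of each identifier.
    let mut last_seen: HashMap<&str, usize> = HashMap::new();
    tokens
        .iter()
        .enumerate()
        .map(|(i, tok)| match tok {
            Token::Keyword(k) => RelToken::Keyword(k.clone()),
            Token::Ident(name) => {
                // Record this occurrence and compute the distance to the last one,
                // dropping distances larger than the max offset.
                let offset = last_seen
                    .insert(name.as_str(), i)
                    .map(|prev| i - prev)
                    .filter(|&d| d <= max_offset);
                RelToken::Rel(offset)
            }
        })
        .collect()
}
```

Under this scheme, a max offset of 0 maps every identifier to Rel(None), which mirrors the extreme case above in which registers and labels become indistinguishable. With no cap, a distance can span the whole file, so it depends on the document's overall structure.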
FUNGUS outputs a report in the following JSON format:
{
  "warnings": [
    {
      "file": "project1/my_invalid_file.s",
      "message": "Message explaining what's wrong.",
      "warn_type": "Type"
    }
  ],
  "project_pairs": [
    {
      "project1": "Project 1",
      "project2": "Project 2",
      "matches": [
        {
          "project_1_location": {
            "file": "Project 1/code.s",
            "span": {
              "start": 0,
              "end": 42
            }
          },
          "project_2_location": {
            "file": "Project 2/my_code.s",
            "span": {
              "start": 100,
              "end": 150
            }
          }
        }
      ]
    }
  ]
}
Note that:
- warnings field: possible values of warn_type include "Args," "Input," and "Fingerprint." See the WarningType enum for the full list.
- project_pairs field: … root argument.
- span: …
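To illustrate how a consumer such as a GUI might use the report, the sketch below mirrors the JSON structure in plain Rust types and ranks project pairs by their number of matches. The type and function names are hypothetical. In practice, parsing the JSON would typically be done with a crate such as serde_json, which this std-only sketch leaves out. The sketch also makes no assumption about the units of span's start and end.

```rust
/// A region of a file, as reported in `span`.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

#[derive(Debug, Clone, PartialEq)]
struct Location {
    file: String,
    span: Span,
}

#[derive(Debug, Clone, PartialEq)]
struct Match {
    project_1_location: Location,
    project_2_location: Location,
}

#[derive(Debug, Clone, PartialEq)]
struct ProjectPair {
    project1: String,
    project2: String,
    matches: Vec<Match>,
}

/// `warn_type` is kept as a string such as "Args", "Input", or "Fingerprint".
#[derive(Debug, Clone, PartialEq)]
struct Warning {
    file: String,
    message: String,
    warn_type: String,
}

#[derive(Debug, Clone, PartialEq)]
struct Report {
    warnings: Vec<Warning>,
    project_pairs: Vec<ProjectPair>,
}

/// Returns (project1, project2, number of matches) for each pair, with the
/// most-matched pairs first and ties broken by project names.
fn rank_pairs(pairs: &[ProjectPair]) -> Vec<(String, String, usize)> {
    let mut ranked: Vec<(String, String, usize)> = pairs
        .iter()
        .map(|p| (p.project1.clone(), p.project2.clone(), p.matches.len()))
        .collect();
    ranked.sort_by(|a, b| b.2.cmp(&a.2).then_with(|| (&a.0, &a.1).cmp(&(&b.0, &b.1))));
    ranked
}
```

Sorting by match count is just one heuristic for deciding which pairs to review first. Summing span lengths would be another, once the units of start and end are known.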