Feuermagier / autograder

Automatic grading of student's Java code
MIT License
13 stars 7 forks source link

Improve performance of duplicate code detection #574

Open Luro02 opened 1 month ago

Luro02 commented 1 month ago

Description

The check is currently one of the slowest checks (it takes up 13.4% of execution time, second slowest check).

The problematic code sections are:

  1. countStatements (to decide if a code segment is long enough)
  2. Building the occurrences Map (that is related to the implementation of StructuralEqualsVisitor/StructuralHashCodeVisitor).

Here are some ideas to improve performance:

  1. Skip adding more kinds of statements in the occurrences map like CtComment
  2. Only add multiple statements to the occurrences map like CtStatementList, because a single statement would not be enough for a duplicate? What is with things like if (a) b;?
  3. Implement #573 which should speed up the isRefactorable method of StructuralEqualsVisitor. It might make sense to build up some cache like with UsesFinder for whether variables are constant/final (I think the whole project would benefit from this).
  4. For StructuralHashCodeVisitor the slowest code is the one that calls StructuralEqualsVisitor#shouldSkip, which should become faster with 3