Open young170 opened 2 weeks ago
Proposed update:
For the case of calculateTotalIncorrectVectors()
, removing any indices can be considered as an inaccurate representation of the differences between vectors. Including these indices will ensure the score reflects the entire sequence.
private int calculateTotalIncorrectVectors(int[] sigma) {
int incorrectVectorCount = 0;
for (int i = 0; i < sigma.length; i++) {
incorrectVectorCount += sigma[i];
}
return incorrectVectorCount;
}
For the heuristic weight in scoreSimilarity()
, using the ratio of incorrect vectors to the total number of vectors makes more sense. Instead of the original, which was the ratio of incorrect vectors to the difference in length of the vectors.
Also, instead of using the variable score
twice, created a separate weight variable.
public float scoreSimilarity(int[] poolVector, int[] givenBugVector) {
float score = 1;
float weight = 1;
int[] subSequence = backtrack(longestCommonSubsequenceOfIntegerArray(poolVector, givenBugVector));
// Calculate the count of incorrect vectors in the pool for each element in the DP table
int[] sigma = calculateSigma(poolVector, subSequence);
// Calculate the total count of incorrect vectors in the pool
int incorrectVectorCount = calculateTotalIncorrectVectors(sigma);
// Heuristic to give weight according to the incorrect vectors
weight = 1 - ((float) incorrectVectorCount / poolVector.length);
// Adjust the score based on the length of the DP table
score = weight * ((float) subSequence.length / givenBugVector.length);
return score;
}
This update fulfills the goal of lowering the score of cases where there exists mulitple incorrect vectors, and having a lesser effect on vectors with a few incorrect vectors. Also, when the vectors are identical, weight
is 1
which will ultimately give the vector a score of 1.0
.
Original code of
calculateTotalIncorrectVectors()
:i
starts at1
and ends at.length - 2
, removing the end indices from the process.Later on, when the LCS similarity score is calculated in
scoreSimilarity()
, a difference in one index(and no difference in length) gives the vector a score of0.0
.A single difference makes
incorrectVectorCount = 1
, which makes the equation:1 - 1 = 0
.