Consider using hashed values instead of strings for the response structure (in ResponseFitnessClustering) to improve performance.
Do not recompute clusters for each generation/individual but base fitness on the probability that a responseObject belongs to one of the existing clusters. Keep the AgglomerativeClustering object small. Recompute clusters after X generations.
Deal with values that change very often, i.e. cases where, for X generations, the agglomerative clustering object contains only clusters/individuals with very high dissimilarity. Idea: create exclude lists so that this method is no longer used for these individuals.
Issue with random values in response objects: if there is a high degree of dissimilarity among individuals with this structure, this method should no longer be used for those individuals.
Do not put the value belonging to a key in the feature vector when the same key/value pair is also present in the request.
Upon further consideration, hashing the response structure is not necessary since the structures are already stored in a HashMap.
Implemented a second version of clustering that calculates the probabilities of individuals belonging to a certain cluster rather than reclustering every generation. Reclustering is instead done every X generations in this version.
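A rough sketch of how the second version could be organised, not the actual implementation. The class name ClusterMembershipSketch, the use of numeric feature vectors, the cluster centroids and the exp(-distance) score are all assumptions made for illustration; the notes do not say how the membership probability is computed.

```java
import java.util.List;

// Hypothetical sketch: names and the scoring formula are assumptions, not project code.
final class ClusterMembershipSketch {

    // The "X" from the notes: number of generations between full reclusterings.
    private final int reclusterInterval;

    ClusterMembershipSketch(int reclusterInterval) {
        if (reclusterInterval <= 0) {
            throw new IllegalArgumentException("reclusterInterval must be positive");
        }
        this.reclusterInterval = reclusterInterval;
    }

    // True only on every X-th generation (including generation 0, for the initial clustering).
    boolean shouldRecluster(int generation) {
        return generation % reclusterInterval == 0;
    }

    // Plain Euclidean distance between two feature vectors of equal length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Assumed score in [0, 1]: exp(-distance) to the nearest cluster centroid.
    // Close to 1 means the response very likely belongs to an existing cluster.
    static double membershipScore(double[] features, List<double[]> centroids) {
        double best = 0.0;
        for (double[] centroid : centroids) {
            best = Math.max(best, Math.exp(-euclidean(features, centroid)));
        }
        return best;
    }
}
```

Between reclusterings, only membershipScore would be evaluated per individual, which is what keeps the per-generation cost low under these assumptions.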
Key/value pairs in the response that also exist in the request are not put in the feature vector, as they make the calculation of the distance between responses less precise.
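A minimal sketch of this filtering step, assuming for illustration that request and response have already been flattened to string key/value maps. FeatureVectorFilter and withoutEchoedPairs are hypothetical names, not taken from the project.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper: class and method names are assumptions, not project code.
final class FeatureVectorFilter {

    // Returns the response entries whose exact key/value pair does not also occur in the request.
    // A key that appears in both but with a different value is kept.
    static Map<String, String> withoutEchoedPairs(Map<String, String> request,
                                                  Map<String, String> response) {
        Map<String, String> features = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : response.entrySet()) {
            boolean echoed = request.containsKey(entry.getKey())
                    && Objects.equals(request.get(entry.getKey()), entry.getValue());
            if (!echoed) {
                features.put(entry.getKey(), entry.getValue());
            }
        }
        return features;
    }
}
```

For example, with a request of {id=7, name=alice} and a response of {id=7, name=bob, balance=100}, only name=bob and balance=100 would remain as features.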
Following the meeting on the 15th of March: