Open ijabz opened 8 years ago
Sure. You can put a breakpoint in CosineSimilarity.java at line 62.
Or, if you want to log what goes in, the builder relies on interfaces rather than concrete implementations, so you can wrap the metric in your own metric.
But I think you should write unit tests to validate that your SpecialReplacementsSimplifier works as it should, rather than relying on visual inspection.
MultisetMetric<String> loggingMetric = new MultisetMetric<String>() {
    final CosineSimilarity<String> cos = new CosineSimilarity<>();

    @Override
    public float compare(Multiset<String> a, Multiset<String> b) {
        // Print the tokenized inputs before delegating to the real metric.
        System.out.println("CosineSimilarity [");
        System.out.println("a: " + a);
        System.out.println("b: " + b);
        System.out.println("]");
        return cos.compare(a, b);
    }
};
StringMetric metric = with(loggingMetric)
        .simplify(Simplifiers.toLowerCase())
        .simplify(Simplifiers.removeDiacritics())
        .simplify(new SpecialReplacementsSimplifier())
        .tokenize(Tokenizers.whitespace())
        .build();
Thanks, that works, but ideally I would like it to output the two original strings as well. Of course I can output these myself before making the compare call, but in a multithreaded system other calls may get interleaved. I wanted this to check my whole simmetrics stack. Access to the tokenized sets (as you've shown me above) is needed to write unit tests anyway.
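One way to limit the interleaving mentioned here is to build the whole log record first and emit it with a single call, instead of several separate println calls. The sketch below is plain Java and does not use the simmetrics API: `Metric` is a hypothetical stand-in for a two-argument metric interface, and `logging` is an illustrative helper name. With the standard `System.out`, a single `println(String)` is normally written as one unit, but that is a property of the sink and worth checking for whatever logger is actually used.

```java
import java.util.function.Consumer;

public class AtomicLogSketch {

    /** Hypothetical stand-in for a two-argument similarity metric. */
    interface Metric<T> {
        float compare(T a, T b);
    }

    /** Wraps a metric so that each comparison is reported as one complete message. */
    static <T> Metric<T> logging(Metric<T> delegate, Consumer<String> sink) {
        return (a, b) -> {
            float result = delegate.compare(a, b);
            // Build the entire record first, then hand it to the sink in one call,
            // so output from other threads cannot land between its parts.
            sink.accept("compare [a: " + a + ", b: " + b + ", result: " + result + "]");
            return result;
        };
    }

    public static void main(String[] args) {
        // Toy metric for illustration only: 1.0 for equal inputs, 0.0 otherwise.
        Metric<String> exact = (a, b) -> a.equals(b) ? 1.0f : 0.0f;
        Metric<String> logged = logging(exact, System.out::println);
        logged.compare("abc", "abc");
        logged.compare("abc", "abd");
    }
}
```

The same idea applies to the wrapper shown earlier in the thread: concatenate the two inputs into one string and print that string once.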
Then you shouldn't use the builder. Its design relies on being indifferent towards the individual components as long as they adhere to their interface.
If you say so, though it would seem quite useful to have a way of seeing the effects of a builder on some inputs without having to break down the individual steps.
What would you do with this information?
So may typically have@
What I would like for debugging is an easy way to see the final step before the cosine similarity, i.e. the contents of the sets created by applying the simplifiers and then the tokenizer(s). Is this possible?
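For illustration, the preprocessing described here (simplifiers applied in order, then a tokenizer) can be composed by hand so that the resulting tokens are available for inspection or for unit-test assertions. This is a minimal sketch in plain Java under stated assumptions: it does not use the simmetrics classes, and `toLowerCase`, `removeDiacritics`, `whitespaceTokens` and `preprocess` are hypothetical helpers that only approximate what the corresponding simmetrics components might do.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class PipelineInspectionSketch {

    /** Hypothetical simplifier: lower-cases the input. */
    static String toLowerCase(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /** Hypothetical simplifier: decomposes characters, then drops combining marks. */
    static String removeDiacritics(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    /** Hypothetical tokenizer: splits on runs of whitespace. */
    static List<String> whitespaceTokens(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Applies the simplifiers in order, then tokenizes the simplified string. */
    static List<String> preprocess(String input,
                                   List<Function<String, String>> simplifiers,
                                   Function<String, List<String>> tokenizer) {
        String current = input;
        for (Function<String, String> simplifier : simplifiers) {
            current = simplifier.apply(current);
        }
        return tokenizer.apply(current);
    }

    public static void main(String[] args) {
        List<Function<String, String>> simplifiers = List.of(
                PipelineInspectionSketch::toLowerCase,
                PipelineInspectionSketch::removeDiacritics);
        String input = "Caf\u00e9  Del Mar";
        List<String> tokens = preprocess(input, simplifiers,
                PipelineInspectionSketch::whitespaceTokens);
        // The original string and its tokens are printed in a single message.
        System.out.println("input: " + input + " -> tokens: " + tokens);
    }
}
```

Because `preprocess` returns the token list, a unit test can assert on it directly instead of reading console output.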