Open porduna opened 10 years ago
Does anybody think the algorithm should be something more complex (e.g., counting also authors, or assigning different values to the different tags)?
We can put the authors and tags in a set and compute the Jaccard index (http://en.wikipedia.org/wiki/Jaccard_index) between the papers. It's also easy to implement.
If I understand it, the difference is that instead of doing:
same_tags = set_of_tags.intersection(current_tag_ids)
# This could be something more advanced:
# One tag that appears twice might (or might not) have a higher value
# than one that appears 10 times
value = len(same_tags)
We do:
intersection = set_of_tags.intersection(current_tag_ids)
union = set_of_tags.union(current_tag_ids)
value = float(len(intersection)) / len(union)
Is that right?
Yes, but to also take the authors into account, we can add their IDs to the set (used_set = author_ids + tag_ids)
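For reference, here is a minimal sketch of the Jaccard idea discussed above, assuming Python 3 and plain integer IDs. The names `jaccard` and `feature_set` and the sample IDs are made up for illustration and are not taken from the project's code. One assumption worth flagging: if author IDs and tag IDs come from separate tables, the same number can appear in both, so the sketch labels each ID with its kind before merging them into one set.

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets; 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def feature_set(author_ids, tag_ids):
    # Hypothetical helper: label each ID with its kind so that
    # author 3 and tag 3 remain distinct elements of the set.
    return {("author", i) for i in author_ids} | {("tag", i) for i in tag_ids}


# Made-up sample data: two publications sharing one author and two tags.
current = feature_set(author_ids=[1, 2], tag_ids=[10, 11, 12])
other = feature_set(author_ids=[2, 3], tag_ids=[11, 12, 13])

value = jaccard(current, other)  # 3 shared elements out of 7 distinct ones
```

Unlike the plain `len(same_tags)` count, this value is normalised to the range 0 to 1, so a publication with many tags does not automatically rank higher than one with few.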
I'm thinking that maybe I'll implement a number of options and provide them as options with queries (e.g., publications/
We can do something similar to what we did for the related persons in #78
It would be interesting to list 3-5 related publications, based on the tags.
A simple implementation might not be too difficult or inefficient. Basically, in 2 queries (assuming that we already have the list of tag_ids of the current publication):