Closed. federicoparroni closed this issue 5 years ago.
Hi, the parameter k is the number of nearest neighbors kept per row in the computation (in this case, for a simple dot product, the top k values are the k highest values in each row of the matrix product).
If you need the standard dot product with all elements in the result, you could just set k to the row length of the result (the number of columns of the second matrix). Keep in mind that in this case the result has dense rows with a lot of zeros, so, depending on the size and density of the dataset, it could require a considerable amount of memory, and you lose the advantage of using sparse matrices.
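To make the meaning of k concrete, here is a minimal sketch that uses only NumPy and SciPy. It is not Similaripy's implementation: `top_k_per_row` is a hypothetical helper, and the matrices `a` and `b` are random data made up for the illustration. It computes the full sparse product first and then keeps the k largest values in each row, so setting k to the row length keeps everything.

```python
import numpy as np
import scipy.sparse as sps


def top_k_per_row(matrix, k):
    """Keep only the k largest stored values in each row (hypothetical helper)."""
    csr = sps.csr_matrix(matrix)
    rows, cols, vals = [], [], []
    for i in range(csr.shape[0]):
        start, end = csr.indptr[i], csr.indptr[i + 1]
        row_vals = csr.data[start:end]
        row_cols = csr.indices[start:end]
        if row_vals.size > k:
            # Positions of the k largest stored values in this row.
            keep = np.argpartition(row_vals, -k)[-k:]
            row_vals, row_cols = row_vals[keep], row_cols[keep]
        rows.extend([i] * row_vals.size)
        cols.extend(row_cols)
        vals.extend(row_vals)
    return sps.csr_matrix((vals, (rows, cols)), shape=csr.shape)


# Random sparse matrices, assumed only for this example.
a = sps.random(20, 30, density=0.2, format="csr", random_state=0)
b = sps.random(30, 40, density=0.2, format="csr", random_state=1)

full = a.dot(b)                                     # standard sparse product
top5 = top_k_per_row(full, k=5)                     # at most 5 values per row
everything = top_k_per_row(full, k=full.shape[1])   # k = row length keeps all
```

With k equal to the row length, `everything` holds the same values as `full`, which is the "standard dot product" case discussed above.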
Yep, I was doing exactly what you suggested! So, in this case, do you think I should simply use the dot function of CSR matrices? Will it be faster?
Sure, you could use the dot product of scipy. About the performance question: I think the scipy function is a little bit faster, because it doesn't need to check which of the top k values per row to keep during the computation (it simply keeps them all).
In general, Similaripy functions are useful in those cases in which you can't compute the full product/similarity matrix because it requires too much space in memory (or because you need only the top k values).
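As a rough illustration of why keeping only the top k values saves memory, the sketch below computes the product in row blocks and discards everything except the k largest values of each row before moving on. This is an assumption-laden sketch with NumPy and SciPy, not how Similaripy works internally: `blockwise_top_k_product` and its `block_size` parameter are hypothetical names, and `urm` is a random matrix generated just for the example.

```python
import numpy as np
import scipy.sparse as sps


def blockwise_top_k_product(m1, m2, k, block_size=256):
    """Compute m1 @ m2 in row blocks, keeping the k largest values per row.

    Hypothetical helper: assumes k >= 1 and that m1 has at least one row.
    """
    m1 = sps.csr_matrix(m1)
    m2 = sps.csr_matrix(m2)
    k = min(k, m2.shape[1])
    blocks = []
    for start in range(0, m1.shape[0], block_size):
        # Only this block of the product is ever held densely in memory.
        dense = m1[start:start + block_size].dot(m2).toarray()
        # Column indices of the k largest values in each row of the block.
        top_cols = np.argpartition(dense, -k, axis=1)[:, -k:]
        row_ids = np.arange(dense.shape[0])[:, None]
        kept = np.zeros_like(dense)
        kept[row_ids, top_cols] = dense[row_ids, top_cols]
        # Converting back to CSR drops the zeros again.
        blocks.append(sps.csr_matrix(kept))
    return sps.vstack(blocks, format="csr")


# Random user-rating matrix, assumed only for this example.
urm = sps.random(1000, 200, density=0.05, format="csr", random_state=0)
item_sim = blockwise_top_k_product(urm.T, urm, k=10)
print(item_sim.shape, item_sim.getnnz(axis=1).max())
```

The full 200 x 200 item-item product is never stored at once here. Each block is reduced to at most k values per row before the next block is computed, which is the trade-off described above: less memory, at the cost of extra work to select the top k.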
Thank you so much!!
You are welcome :)
If you found my work useful, you could leave me a star, thanks :)
I'm a student at Polimi and your work helps me so much for the Rec Sys course :)
Reading the example in the readme:
I have a doubt about the usage of the k param on the last line. What does it mean? Can I use the dot product as the standard dot product between 2 matrices? Can you clarify this please?