This PR adds a convenience function to find the top k items in a similarity list. Anyone computing recommendations via musly_jukebox_similarity() needs this, so we shouldn't require every user to implement it themselves. Keeping this common functionality in one place also lets us optimize it in the future: the "iterate over the data, updating a heap" approach is not optimal for all configurations, and if a large fraction of the items is requested, partially or fully sorting the input list (via std::partial_sort, std::nth_element or std::sort) is faster.
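For illustration, here is a rough sketch of the two strategies mentioned above. It is not the code in this PR: the names `Scored`, `top_k_heap` and `top_k_partial_sort` are hypothetical, the inputs are plain vectors rather than Musly types, and it assumes that a smaller value means a closer match (flip the comparisons if the opposite holds).

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper type: (similarity value, track id). Not part of the Musly API.
using Scored = std::pair<float, int>;

// Heap approach: a single pass that keeps the k smallest values seen so far.
// Assumes sims and ids have the same length.
std::vector<Scored> top_k_heap(const std::vector<float>& sims,
                               const std::vector<int>& ids, std::size_t k) {
    if (k == 0) return {};
    std::priority_queue<Scored> heap;  // max-heap: worst kept item sits on top
    for (std::size_t i = 0; i < sims.size(); ++i) {
        Scored s(sims[i], ids[i]);
        if (heap.size() < k) {
            heap.push(s);
        } else if (s < heap.top()) {
            heap.pop();                // drop the current worst item
            heap.push(s);
        }
    }
    // Drain the heap from worst to best, filling the result back to front.
    std::vector<Scored> out(heap.size());
    for (std::size_t i = out.size(); i-- > 0;) {
        out[i] = heap.top();
        heap.pop();
    }
    return out;
}

// Partial-sort approach: copy everything, then order only the first k entries.
std::vector<Scored> top_k_partial_sort(const std::vector<float>& sims,
                                       const std::vector<int>& ids, std::size_t k) {
    std::vector<Scored> all;
    all.reserve(sims.size());
    for (std::size_t i = 0; i < sims.size(); ++i) {
        all.emplace_back(sims[i], ids[i]);
    }
    k = std::min(k, all.size());
    std::partial_sort(all.begin(), all.begin() + k, all.end());
    all.resize(k);
    return all;
}
```

Both variants return the k best items in ascending order of value, so callers would not need to care which one is used internally. That is the point of keeping the logic in one place.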