optimization_dict not improving performance

MKLab-ITI / pygrank

Recommendation algorithms for large graphs

Apache License 2.0


    >>> import pygrank as pg
    >>> optimization_dict = dict()
    >>> pg.benchmark_print(pg.benchmark({"HK": pg.HeatKernel(optimization_dict=optimization_dict)}, pg.load_datasets_all_communities(["bigraph"]), metric="time"))
                HK
    bigraph0  3.06
    bigraph1  3.36
    >>> pg.benchmark_print(pg.benchmark({"HK": pg.HeatKernel()}, pg.load_datasets_all_communities(["bigraph"]), metric="time"))
                HK
    bigraph0  2.98
    bigraph1  2.96

The issue arose because the GraphFilter.rank(...) method created a new personalization object whose hash value differed from the original's, so the cached entry could not be queried. This was fixed by calling a GraphFilter_prepare(personalization) method at the very beginning of ranking, which gives ClosedFormGraphFilter the opportunity to search for the dictionary entry under the true hash value.
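The hash mismatch described above can be reproduced in isolation. The following is a minimal sketch, not pygrank code: `Signal`, `expensive_transform`, `rank_buggy`, and `rank_fixed` are hypothetical names, and a plain dict stands in for optimization_dict. It assumes the cache is keyed by the personalization object itself, which by default hashes by identity.

```python
class Signal:
    """Hypothetical stand-in for a personalization object.

    It does not override __hash__ or __eq__, so two instances holding
    the same values still hash and compare as different dictionary keys.
    """

    def __init__(self, values):
        self.values = list(values)


calls = {"n": 0}  # counts how often the expensive step actually runs


def expensive_transform(signal):
    """Placeholder for the costly computation that should be cached."""
    calls["n"] += 1
    return [2.0 * v for v in signal.values]


def rank_buggy(personalization, cache):
    # A new object is created BEFORE the cache lookup, so the lookup key
    # never matches the entry stored by a previous call.
    personalization = Signal(personalization.values)
    if personalization not in cache:
        cache[personalization] = expensive_transform(personalization)
    return cache[personalization]


def rank_fixed(personalization, cache):
    # The lookup uses the ORIGINAL object, before any copy is made.
    key = personalization
    if key not in cache:
        working_copy = Signal(personalization.values)
        cache[key] = expensive_transform(working_copy)
    return cache[key]


p = Signal([1.0, 0.0, 0.0])

buggy_cache = {}
calls["n"] = 0
rank_buggy(p, buggy_cache)
rank_buggy(p, buggy_cache)
buggy_calls = calls["n"]  # the expensive step runs on every call

fixed_cache = {}
calls["n"] = 0
rank_fixed(p, fixed_cache)
rank_fixed(p, fixed_cache)
fixed_calls = calls["n"]  # the expensive step runs only once
```

In this sketch the buggy variant also grows the cache on every call, which is consistent with the slightly worse timings reported when optimization_dict was supplied.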

Optimization improvements are tested only on the numpy backend because, in other backends, running times are dominated by switching back and forth with numpy during supervised measure evaluation.

While fixing this issue, it was noted that GraphSignal.filter often dominates running time because it uses list comprehensions instead of native backend operations. The respective operations were added to all backends.
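To illustrate the difference between a list comprehension and a native backend operation, here is a minimal sketch for the numpy backend. It is not the actual GraphSignal.filter implementation: the masking task and the function names `mask_with_comprehension` and `mask_native` are assumptions chosen only to show the pattern.

```python
import numpy as np


def mask_with_comprehension(values, keep):
    """Zero out entries where keep is False, one Python-level step per element."""
    return np.array([v if k else 0.0 for v, k in zip(values, keep)])


def mask_native(values, keep):
    """Same result using a single vectorized numpy call."""
    return np.where(keep, values, 0.0)


values = np.arange(10, dtype=float)
keep = values % 2 == 0  # boolean mask selecting even-valued entries

slow_result = mask_with_comprehension(values, keep)
fast_result = mask_native(values, keep)
```

Both functions return the same array; the native version avoids iterating over elements in Python, which typically matters most on large graphs.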

Minimum fixed version: 0.2.5
Related tests: tests.test_filters.test_optimization_dict
