MKLab-ITI / pygrank

Recommendation algorithms for large graphs
Apache License 2.0

optimization_dict not improving performance #7

Closed maniospas closed 2 years ago

maniospas commented 2 years ago

The optimization_dict argument to the ClosedFormGraphFilter class does not seem to produce an improvement in running time. This could indicate either a bug or bottlenecks in other parts of the pipeline, e.g. in graph signal instantiation.

Version: run with version 2.3, adjusted to repeat each experiment 50 times when measuring time

Demonstration:

>>> import pygrank as pg
>>> optimization_dict = dict()
>>> pg.benchmark_print(pg.benchmark({"HK": pg.HeatKernel(optimization_dict=optimization_dict)}, pg.load_datasets_all_communities(["bigraph"]), metric="time"))
                 HK 
bigraph0         3.06
bigraph1         3.36
>>> pg.benchmark_print(pg.benchmark({"HK": pg.HeatKernel()}, pg.load_datasets_all_communities(["bigraph"]), metric="time"))
                 HK 
bigraph0         2.98
bigraph1         2.96

Related tests: None

maniospas commented 2 years ago

The issue arose from creating a new personalization object in the GraphFilter.rank(...) method that had a different hash value than the original and hence could not be queried in the dictionary. This was fixed with a GraphFilter_prepare(personalization) method called at the very beginning of ranking, which gives the ClosedFormGraphFilter the opportunity to search for the dictionary entry with the true hash value.
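The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not pygrank's actual implementation: the Signal and CachedFilter classes, the fixed flag, and the doubling "expensive" step are all hypothetical stand-ins. It assumes the cache is keyed by the default identity-based hash of the personalization object, so a copy made inside rank never matches the entry stored for the original.

```python
import numpy as np


class Signal:
    """Hypothetical stand-in for a graph signal; uses the default identity-based hash."""

    def __init__(self, values):
        self.values = np.asarray(values, dtype=float)


class CachedFilter:
    """Toy filter (not pygrank code) that caches an expensive result per personalization."""

    def __init__(self, cache, fixed):
        self.cache = cache          # plays the role of optimization_dict
        self.fixed = fixed          # True: remember the original object's hash first
        self.expensive_calls = 0
        self._key = None

    def _prepare(self, personalization):
        # Record the hash of the ORIGINAL object before any copy is made.
        self._key = hash(personalization)

    def rank(self, personalization):
        if self.fixed:
            self._prepare(personalization)
        # A new object is created here, so its identity-based hash differs.
        normalized = Signal(personalization.values / personalization.values.sum())
        key = self._key if self.fixed else hash(normalized)
        if key not in self.cache:
            self.expensive_calls += 1
            normalized.values = normalized.values * 2  # placeholder for costly work
            self.cache[key] = normalized
        return self.cache[key]


signal = Signal([1, 0, 1])

buggy = CachedFilter(cache={}, fixed=False)
buggy.rank(signal)
buggy.rank(signal)

patched = CachedFilter(cache={}, fixed=True)
patched.rank(signal)
patched.rank(signal)

print(buggy.expensive_calls, patched.expensive_calls)
```

In this sketch the unpatched filter repeats the costly step on every call, because each call looks up the hash of a fresh copy, while the patched one reuses the entry stored under the original object's hash.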

Optimization improvements are tested only on the numpy backend, because in other backends running times are dominated by switching back-and-forth with numpy during supervised measure evaluation.

When fixing this issue, it was noted that GraphSignal.filter often dominates running time because it employs list comprehensions instead of native backend operations. The respective operations were added to all backends.
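To illustrate the kind of change involved, here is a rough sketch for the numpy backend only. It assumes, purely for illustration, that filtering means zeroing out the entries of excluded nodes; the two function names are hypothetical and the actual behavior of GraphSignal.filter may differ.

```python
import numpy as np


def filter_with_comprehension(values, keep):
    # Element-by-element Python loop: every entry passes through the interpreter.
    return np.array([v if k else 0.0 for v, k in zip(values, keep)])


def filter_native(values, keep):
    # Single vectorized call: the selection runs inside numpy.
    return np.where(keep, values, 0.0)


values = np.array([0.5, 0.2, 0.3, 0.9])
keep = np.array([True, False, True, False])

slow = filter_with_comprehension(values, keep)
fast = filter_native(values, keep)
```

Both functions return the same array; the second avoids per-element interpreter overhead, which is usually what matters for signals with many nodes.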

Minimum fixed version: 0.2.5

Related tests: tests.test_filters.test_optimization_dict