This PR introduces the PageRank table function, similar to WCC and LCC.
It is invoked as `FROM pagerank(pg, node_table, edge_table);` and returns a table containing each vertex ID together with its PageRank. Like the other graph algorithms, it uses the CSR (compressed sparse row) data structure and computes PageRank iteratively. The current implementation is mostly single-threaded. The first vector to execute the function obtains a lock and iterates until convergence. Subsequent vectors perform no computation and only look up the rank for each ID. If this becomes a bottleneck in the future, we may need to look into a parallel solution, but for now this suffices.
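The compute-once, look-up-afterwards pattern described above can be sketched in Python as follows. This is an illustration only, not the DuckPGQ code: `SharedRankCache`, `ranks_for`, and `compute_all_ranks` are hypothetical names, and each thread stands in for one input vector (a batch of vertex IDs).

```python
import threading
from typing import Callable, List, Optional


class SharedRankCache:
    """Hypothetical stand-in for the operator's shared state: the first caller
    computes every rank once; later callers only look ranks up."""

    def __init__(self, compute_all_ranks: Callable[[], List[float]]) -> None:
        self._compute_all_ranks = compute_all_ranks  # iterates until converged
        self._lock = threading.Lock()
        self._ranks: Optional[List[float]] = None
        self.compute_calls = 0  # counts full computations, for illustration

    def ranks_for(self, vertex_ids: List[int]) -> List[float]:
        # The lock is held for the whole computation, so concurrent callers
        # wait here and then find the cached result instead of recomputing.
        with self._lock:
            if self._ranks is None:
                self.compute_calls += 1
                self._ranks = self._compute_all_ranks()
            ranks = self._ranks
        return [ranks[v] for v in vertex_ids]


if __name__ == "__main__":
    cache = SharedRankCache(lambda: [0.1, 0.2, 0.3, 0.4])
    # Each thread plays the role of one input vector (batch of vertex IDs).
    batches = [[0, 1], [2], [3], [1, 3]]
    threads = [threading.Thread(target=cache.ranks_for, args=(b,)) for b in batches]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(cache.compute_calls)  # 1: only the first caller computed
```

Holding the lock for the full computation keeps the design simple. The trade-off is that every other vector idles until convergence, which is the bottleneck a parallel version would remove.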
The default parameters are:
- Damping factor: 0.85
- Tolerance: 1e-6
- Initial rank: 1 / (number of vertices)
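The iteration with these defaults can be sketched over a CSR graph as below. This is a sketch of the textbook power iteration, not the DuckPGQ implementation. Several details are assumptions: `offsets`/`targets` as the CSR layout (out-edges of `v` are `targets[offsets[v]:offsets[v+1]]`), uniform redistribution of rank from vertices with no outgoing edges (as NetworkX does), convergence when the L1 change between iterations drops below `tolerance`, and the hypothetical `max_iterations` cap. Different choices for dangling vertices or the convergence test can produce small numeric differences between implementations.

```python
from typing import List


def pagerank_csr(offsets: List[int], targets: List[int],
                 damping_factor: float = 0.85, tolerance: float = 1e-6,
                 max_iterations: int = 100) -> List[float]:
    """Power-iteration PageRank over a CSR adjacency (sketch, see assumptions)."""
    n = len(offsets) - 1
    if n <= 0:
        return []
    ranks = [1.0 / n] * n  # initial rank: 1 / number of vertices
    for _ in range(max_iterations):
        incoming = [0.0] * n
        dangling_mass = 0.0
        for v in range(n):
            start, end = offsets[v], offsets[v + 1]
            out_degree = end - start
            if out_degree == 0:
                # Assumption: rank of dangling vertices is spread uniformly.
                dangling_mass += ranks[v]
                continue
            share = ranks[v] / out_degree
            for e in range(start, end):
                incoming[targets[e]] += share
        base = (1.0 - damping_factor) / n + damping_factor * dangling_mass / n
        new_ranks = [base + damping_factor * r for r in incoming]
        # Assumption: converge on the L1 norm of the change between iterations.
        delta = sum(abs(a - b) for a, b in zip(new_ranks, ranks))
        ranks = new_ranks
        if delta < tolerance:
            break
    return ranks


if __name__ == "__main__":
    # Edges 0->1, 0->2, 1->2; vertex 2 has no outgoing edges (dangling).
    offsets = [0, 2, 3, 3]
    targets = [1, 2, 2]
    print([round(r, 4) for r in pagerank_csr(offsets, targets)])
```

With dangling rank redistributed, the ranks keep summing to 1 on every iteration, which is a cheap sanity check when comparing against other systems.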
A future PR might allow users to change these parameters, either through a `SET` statement or as arguments to the table function, e.g. `pagerank(pg, node, edge, damping_factor, tolerance)`.
The output differs slightly from NetworkX and Neo4j, but the results are generally close. For example, comparing NetworkX and DuckPGQ on SF1:
- Mean Absolute Error (MAE): 7.867113585890136e-05
- Mean Squared Error (MSE): 5.77749701647327e-08
- Root Mean Squared Error (RMSE): 0.00024036424477183102
- Maximum Absolute Error: 0.008648239988167099
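The PR does not state how these metrics were computed. A minimal sketch of the standard definitions, assuming both results are matched per vertex ID (e.g. the dict returned by `networkx.pagerank` against the DuckPGQ output loaded into a dict), is below; `compare_ranks` is a hypothetical helper. As a consistency check on the numbers above, RMSE is the square root of MSE, and √(5.777e-8) ≈ 2.404e-4 matches the reported RMSE.

```python
import math
from typing import Dict, Hashable


def compare_ranks(reference: Dict[Hashable, float],
                  candidate: Dict[Hashable, float]) -> Dict[str, float]:
    """Per-vertex error metrics between two PageRank results keyed by ID."""
    if reference.keys() != candidate.keys():
        raise ValueError("both results must cover the same vertex IDs")
    if not reference:
        raise ValueError("no vertices to compare")
    errors = [abs(reference[v] - candidate[v]) for v in reference]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "mae": sum(errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "max_abs_error": max(errors),
    }


if __name__ == "__main__":
    # Made-up values for illustration only, not actual SF1 results.
    networkx_ranks = {1: 0.40, 2: 0.35, 3: 0.25}
    duckpgq_ranks = {1: 0.41, 2: 0.34, 3: 0.25}
    print(compare_ranks(networkx_ranks, duckpgq_ranks))
```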
Fixes #143