Open chansooligans opened 1 year ago
Need to work on the query for record linkage:
"left" dataframe entities should only link to "right" dataframe entities. Also debug to make sure the test script run_rl.py works as expected.
-- Rebuild the clusters table from the pairwise scores.
TRUNCATE TABLE {settings.db.db_schema}.clusters;
INSERT INTO {settings.db.db_schema}.clusters (cluster, _index, _type)
SELECT
    component AS cluster,
    node * -1 AS _index,
    -- Right-side indices are negated in the edge query below,
    -- so the sign of node tells the two sides apart.
    CASE
        WHEN node >= 0 THEN True
        ELSE False
    END AS _type
FROM pgr_connectedComponents(
    'SELECT
        ROW_NUMBER() OVER (ORDER BY _index_l, _index_r) AS id,
        _index_l AS source,
        -1 * _index_r AS target,
        score AS cost
    FROM {settings.db.db_schema}.scores'
);
"""
The purpose of this PR is to abstract out the clustering logic and move the "get_connected_components" function into the repository. For the Postgres repository, "get_connected_components" can be implemented with pgRouting, a library that extends Postgres with network analysis tools; the query above uses its pgr_connectedComponents function.
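
For a repository without pgRouting, the same clustering could be done in plain Python. The following is a minimal sketch, not the implementation in this PR: the signature of `get_connected_components`, the `(index_l, index_r)` pair input, and the returned row shape are assumptions chosen to mirror the columns of the clusters table (`cluster`, `_index`, `_type`). It uses a union-find structure and tags each node with its side instead of negating the right-hand index, so edges can only join a left entity to a right entity.

```python
from typing import Dict, Iterable, List, Tuple

# A node is (side, original index); side is "l" or "r" (hypothetical encoding).
Node = Tuple[str, int]


def get_connected_components(pairs: Iterable[Tuple[int, int]]) -> List[dict]:
    """Cluster left/right entities linked by (index_l, index_r) pairs."""
    parent: Dict[Node, Node] = {}

    def find(x: Node) -> Node:
        # Register unseen nodes as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: Node, b: Node) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root to the smaller one so results are deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    # Every edge joins one left node to one right node, never two nodes on the same side.
    for index_l, index_r in pairs:
        union(("l", index_l), ("r", index_r))

    nodes = sorted(parent)
    roots = sorted({find(n) for n in nodes})
    cluster_id = {root: i for i, root in enumerate(roots)}

    # One row per node, shaped like the clusters table: _type is True for left nodes.
    return [
        {"cluster": cluster_id[find(n)], "_index": n[1], "_type": n[0] == "l"}
        for n in nodes
    ]
```

Because the side is stored explicitly, a left index of 0 and a right index of 0 remain distinct nodes, which sign-based encoding cannot guarantee since `-1 * 0` is still `0`.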