catalyst-cooperative / ccai-entity-matching

An exploration of generalizable approaches to unsupervised entity matching for use in linking tabular public energy data sources.
MIT License

TF-IDF + Splink + Equal Weights #35

Closed: zaneselvans closed this issue 1 year ago

zaneselvans commented 1 year ago

Run the FERC1-EIA record linkage process using TF-IDF to vectorize string features, naive equal weighting across features, and Splink to perform the record linkage.
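To make "TF-IDF features with naive equal weights" concrete, here is a minimal, stdlib-only sketch of how a candidate FERC1/EIA record pair might be scored. It assumes whitespace tokenization, a smoothed IDF fitted per column, and a pair score equal to the unweighted mean of the per-column cosine similarities. Splink is not called here, and the column names (`plant_name`, `utility_name`) and toy records are hypothetical illustrations, not the project's actual schema or data.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased whitespace tokens; character n-grams are a common alternative.
    return text.lower().split()


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed IDF per token: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}


def tfidf(text: str, idf: dict[str, float]) -> dict[str, float]:
    """L2-normalized TF-IDF vector; tokens unseen when fitting get weight 0."""
    counts = Counter(tokenize(text))
    vec = {tok: c * idf.get(tok, 0.0) for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def equal_weight_score(left: dict, right: dict, idfs: dict) -> float:
    """Mean of per-column TF-IDF cosine similarities (each of k columns weighted 1/k)."""
    sims = [
        cosine(tfidf(left[col], idf), tfidf(right[col], idf))
        for col, idf in idfs.items()
    ]
    return sum(sims) / len(sims)


# Hypothetical toy records; column names are illustrative, not the real schema.
ferc_rec = {"plant_name": "Riverbend Steam Plant", "utility_name": "Example Power Co"}
eia_match = {"plant_name": "Riverbend", "utility_name": "Example Power Co"}
eia_other = {"plant_name": "Lakeside Gas", "utility_name": "Other Electric Co"}

records = [ferc_rec, eia_match, eia_other]
# Fit one IDF table per column so token rarity is judged within that column only.
idfs = {col: fit_idf([r[col] for r in records]) for col in ("plant_name", "utility_name")}

print(round(equal_weight_score(ferc_rec, eia_match, idfs), 3))
print(round(equal_weight_score(ferc_rec, eia_other, idfs), 3))
```

Equal weighting is the naive baseline here: every column contributes the same amount regardless of how informative it is. Splink's Fellegi-Sunter model instead estimates match (m) and non-match (u) probabilities per comparison level, so contrasting the two is a natural way to see what learned weights add.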

Parameters to vary

Evaluation criteria / outputs