cjerzak / LinkOrgs-software

LinkOrgs: An R package for linking records on organizations using half a billion open-collaborated records from LinkedIn
https://doi.org/10.1017/psrm.2024.55
MIT License

June 21, 2024 Improvements #3

Open cjerzak opened 5 months ago

cjerzak commented 5 months ago
  1. Build Hugging Face model, add supplementary functions to the package as needed.
  2. Consider revising the AverageMatches process.

cjerzak commented 4 months ago

Revision idea for AverageMatches: sample 100 rows of x and compare them with all of y; then sample 100 rows of y and compare them with all of x.
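
The sampling idea above could be sketched roughly as follows. This is only an illustration, not the package's implementation: the function name `sampled_dist_threshold`, the use of Euclidean distance, the rule for turning sampled distances into a cutoff, and the choice to average the two directions are all assumptions.

```r
# Hypothetical sketch; not the LinkOrgs implementation.
# Assumes x and y are numeric matrices with the same number of columns
# and that "distance" means Euclidean distance between rows.
sampled_dist_threshold <- function(x, y, AveMatchNumberPerAlias = 1, n_sample = 100) {
    # Distances from a random subset of rows of `a` to every row of `b`
    sample_dists <- function(a, b) {
        idx <- sample(nrow(a), min(n_sample, nrow(a)))
        a_s <- a[idx, , drop = FALSE]
        # Squared distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 for numerical safety
        d2 <- outer(rowSums(a_s^2), rowSums(b^2), "+") - 2 * tcrossprod(a_s, b)
        sqrt(pmax(d2, 0))
    }

    # Cutoff at which the sampled rows have, on average, k matches each:
    # the (k * number of sampled rows)-th smallest sampled distance
    avg_match_cutoff <- function(d, k) {
        m <- min(length(d), max(1, round(k * nrow(d))))
        sort(as.vector(d))[m]
    }

    thr_xy <- avg_match_cutoff(sample_dists(x, y), AveMatchNumberPerAlias)  # sampled x vs. all of y
    thr_yx <- avg_match_cutoff(sample_dists(y, x), AveMatchNumberPerAlias)  # sampled y vs. all of x

    # Averaging the two directions is an arbitrary choice made for this sketch
    mean(c(thr_xy, thr_yx))
}
```

Sampling keeps the cost at roughly 100 x nrow(y) plus 100 x nrow(x) distance evaluations rather than nrow(x) x nrow(y), while still looking at both directions of the comparison.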

beniaminogreen commented 4 months ago

Hi Connor, here's a quick test function based on the example we talked over today. It takes a threshold-picking function as an argument (in this case, GetCalibratedDistThres) and checks whether it can calculate an appropriate threshold for matching a dataset to a copy of itself perturbed by very small Gaussian noise. Happy to revise this if we think we need a less stringent test, or if we want the function to report more debug information.

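test_calibrate_threshold <- function(threshold_picking_function, n = 1000, p = 250) {
    # x: n random points in p dimensions; y: the same points plus tiny Gaussian noise (sd = 1e-4)
    x <- matrix(rnorm(n * p), n, p)
    y <- x + matrix(rnorm(n * p, 0, .0001), n, p)

    # Each row of x has exactly one true match in y, so request one match per alias
    threshold <- threshold_picking_function(x = x, y = y, AveMatchNumberPerAlias = 1)

    # In Euclidean terms, true pairs are ~0.002 apart and unrelated rows ~22 apart,
    # so a sensible threshold should be well below .5
    stopifnot(threshold < .5)
    invisible(threshold)
}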

# Example
test_calibrate_threshold(GetCalibratedDistThres)

Best, Ben
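
One way to add the debug information mentioned above is to have the test report how many true pairs fall within the chosen threshold. The sketch below is hypothetical: `test_calibrate_threshold_verbose` and the stand-in `nn_threshold` are illustrative names rather than functions in LinkOrgs, Euclidean distance is assumed, and the smaller default `n` and `p` are chosen only to keep the example fast.

```r
# Hypothetical variant of the test that returns diagnostics instead of only asserting.
test_calibrate_threshold_verbose <- function(threshold_picking_function, n = 200, p = 50) {
    x <- matrix(rnorm(n * p), n, p)
    y <- x + matrix(rnorm(n * p, 0, .0001), n, p)
    threshold <- threshold_picking_function(x = x, y = y, AveMatchNumberPerAlias = 1)

    # Euclidean distance between each row of x and its perturbed copy in y
    true_pair_dist <- sqrt(rowSums((x - y)^2))
    list(threshold = threshold,
         max_true_pair_dist = max(true_pair_dist),
         share_true_pairs_matched = mean(true_pair_dist <= threshold),
         passed = threshold < .5)
}

# Stand-in threshold picker used only to exercise the test
# (an assumption, not GetCalibratedDistThres): it ignores AveMatchNumberPerAlias
# and returns the largest nearest-neighbour distance from rows of x to rows of y.
nn_threshold <- function(x, y, AveMatchNumberPerAlias = 1) {
    d2 <- outer(rowSums(x^2), rowSums(y^2), "+") - 2 * tcrossprod(x, y)
    max(sqrt(pmax(apply(d2, 1, min), 0)))
}

res <- test_calibrate_threshold_verbose(nn_threshold)
str(res)
```

Returning a list rather than stopping on failure makes it easier to see how far off a threshold is when the check does not pass.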