datamade / bga-payroll

💰 How much do your public officials make?

Year-to-year link #467

Open deraj1013 opened 3 years ago

deraj1013 commented 3 years ago

For some employers, records that look like they should link easily from year to year don't. I don't quite understand the exact mechanism for linking records. The most common issue I see is a small difference in names (like John A Smith in 2018 and John A. Smith in 2017), but there are other cases that feel like they should link and don't.

Here are a few examples:

https://bga-payroll.datamade.us/unit/round-lake-beach-fc7d0cd0/?data_year=2018

https://bga-payroll.datamade.us/unit/fox-lake-fpd-1e193b34/?data_year=2018

https://bga-payroll.datamade.us/unit/calumet-township-f938daf7/?data_year=2018

https://bga-payroll.datamade.us/unit/lemont-e00060b0/?data_year=2018

https://bga-payroll.datamade.us/unit/macomb-1f40215c/?data_year=2018

https://bga-payroll.datamade.us/unit/manhattan-elwood-library-district-dbed1df7/?data_year=2018

https://bga-payroll.datamade.us/unit/markham-park-district-dd81cb48/?data_year=2018

https://bga-payroll.datamade.us/unit/oak-park-township-f8d7d815/?data_year=2018

For some of these, it's clear why the records might not match (the aforementioned period after the middle initial, department changes, or other assorted differences). Others are nearly identical but still don't link.

I don't think this is a huge issue, but I'd like to improve it where we can, and the more I know about what's under the hood, the better.

hancush commented 3 years ago

@deraj1013 The short answer is, if person names aren't identical, they won't match. That includes spelling, of course, but also punctuation and (new this year) capitalization.

The higher-level logic is that people are linked across years if their name matches exactly and they are the only person in their unit and department with that name. This is because, if there are two Joe Smiths, we don't have enough information to say which is which year over year.
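To make the rule concrete, here is a minimal sketch in Python. It is not the actual bga-payroll code: `Person`, `link_years`, and the sample records are made up for illustration, and it assumes the unit and department must also match across years.

```python
from collections import Counter
from typing import NamedTuple


class Person(NamedTuple):
    # Hypothetical record shape; the real schema may differ.
    unit: str
    department: str
    name: str


def link_years(prev_year, this_year):
    """Return the people who can be linked between two years.

    A person links only if the exact same (unit, department, name)
    appears exactly once in each year. The comparison is plain string
    equality, so punctuation and capitalization differences block a link.
    """
    def unique(records):
        counts = Counter(records)
        return {person for person, n in counts.items() if n == 1}

    return sorted(unique(prev_year) & unique(this_year))


records_2017 = [
    Person("Example Village", "Police", "John A. Smith"),
    Person("Example Village", "Police", "Jane Doe"),
    Person("Example Village", "Police", "Joe Smith"),
    Person("Example Village", "Police", "Joe Smith"),
]
records_2018 = [
    Person("Example Village", "Police", "John A Smith"),  # period dropped
    Person("Example Village", "Police", "Jane Doe"),
    Person("Example Village", "Police", "Joe Smith"),
]

linked = link_years(records_2017, records_2018)
print(linked)
```

In this sample only Jane Doe links. John A. Smith fails on the missing period, and Joe Smith fails because there were two of them in 2017, so there is no way to tell which one the 2018 record belongs to.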

My two cents: Since you all review records pretty extensively, I think that a mechanism to link person records manually via the admin interface would be a more economical and reliable investment than trying to automate linkage, given the information we have to work with.

deraj1013 commented 3 years ago

That sounds reasonable. It might take a while to sift through, but it could be a way to get these records linked. For now, people can still find the year-to-year salaries of John A(.) Smith; they just have to visit each year's page individually. Thanks for the explainer!