catalyst-cooperative / mozilla-sec-eia

Exploratory development for SEC to EIA linkage
MIT License

Record Linkage Improvements and Next Steps #104

Open katie-lamb opened 3 weeks ago

Overview

Now that we've finished a first pass at record linkage, this issue lists improvements to try and next steps for model development.

Success Criteria

How will we know that we're done?

### Next steps
- [x] Assign a sec_company_id to all companies in SEC basic info and Ex. 21
- [x] Make validation data
- [x] Run the models with validation data to benchmark performance
- [x] Add in the Ex. 21 subsidiaries to the SEC side and perform the SEC to EIA match
- [x] Create a method for clustering duplicate company records
- [x] Don't block on report year?
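For the benchmarking step above, here is a minimal sketch of how predicted SEC to EIA matches could be scored against validation data. It is not the code used in this repo: the function name `precision_recall`, the pair representation, and the example IDs are all hypothetical. It assumes each match is a `(sec_company_id, eia_id)` pair and that predictions for SEC companies absent from the validation set should be ignored rather than counted as errors.

```python
def precision_recall(predicted_pairs, validated_pairs):
    """Score predicted (sec_company_id, eia_id) pairs against validated pairs.

    Only predictions whose SEC ID appears in the validation data are scored,
    since we cannot judge matches for companies nobody has labeled.
    """
    validated = set(validated_pairs)
    labeled_sec_ids = {sec_id for sec_id, _ in validated}
    # Drop predictions for SEC companies with no validation label.
    scored = {pair for pair in set(predicted_pairs) if pair[0] in labeled_sec_ids}

    true_positives = len(scored & validated)
    precision = true_positives / len(scored) if scored else 0.0
    recall = true_positives / len(validated) if validated else 0.0
    return precision, recall


# Hypothetical example data, not real SEC or EIA identifiers.
validated_pairs = [("sec_1", "eia_10"), ("sec_2", "eia_20"), ("sec_3", "eia_30")]
predicted_pairs = [
    ("sec_1", "eia_10"),  # correct
    ("sec_2", "eia_99"),  # wrong EIA match
    ("sec_3", "eia_30"),  # correct
    ("sec_4", "eia_40"),  # unlabeled SEC company, ignored
]

precision, recall = precision_recall(predicted_pairs, validated_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Tracking both numbers across model changes (for example, with and without blocking on report year) would show whether a change trades missed matches for false ones.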

SEC to EIA model improvement ideas