Closed katie-lamb closed 7 months ago
I don't know if we should necessarily do this; it's just something that worried me a bit while looking into getting Pandas 2.0 working (see #2394 / #2320). And all else being equal, the fewer different systems we have doing one job across the project, the easier it'll be to maintain.
If there's a simple drop-in replacement that's great! If it's gonna be more work, or would be a relatively brittle setup, then maybe we should try and work around the dependency issues for the moment somehow.
The FERC1 to EIA matching module (`pudl.analysis.ferc1_eia`) uses the `recordlinkage` package to create feature vectors for comparison. The `recordlinkage` package hasn't had a release recently and it seems like it might be less maintained moving forward, so it would be good to replace this dependency with something else.

I think this feature creation could be replaced with functionality from `splink` or `sklearn` or a combo. The `splink` Comparisons library works best with a `splink` linker that will then do a prediction, but it might work to then use the `sklearn` Logistic Regression model that's currently implemented in the `ferc1_eia` module. It might just be easier to use `sklearn` the whole way through.
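
For a sense of scale, here is a rough sketch of what "feature vectors for comparison" boils down to, using only the Python standard library. Everything in it is an assumption for illustration: the record fields (`plant_name`, `capacity_mw`, `state`, `report_year`), the helper names, and the use of `difflib` for string similarity are hypothetical and are not what `pudl.analysis.ferc1_eia` or `recordlinkage` actually do.

```python
from difflib import SequenceMatcher

# Hypothetical records; field names are illustrative, not the real PUDL schema.
ferc1_records = [
    {"id": "f1", "plant_name": "Big Bend", "capacity_mw": 1700.0, "state": "FL", "report_year": 2020},
    {"id": "f2", "plant_name": "Comanche Unit 3", "capacity_mw": 750.0, "state": "CO", "report_year": 2020},
]
eia_records = [
    {"id": "e1", "plant_name": "big bend", "capacity_mw": 1705.0, "state": "FL", "report_year": 2020},
    {"id": "e2", "plant_name": "Comanche", "capacity_mw": 1410.0, "state": "CO", "report_year": 2020},
    {"id": "e3", "plant_name": "Big Bend", "capacity_mw": 1700.0, "state": "FL", "report_year": 2019},
]


def string_similarity(a, b):
    """Case-insensitive similarity in [0, 1] based on difflib's ratio."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def numeric_similarity(a, b):
    """1.0 when equal, falling linearly to 0.0 as the relative difference reaches 100%."""
    largest = max(abs(a), abs(b))
    if largest == 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / largest)


def compare_pair(left, right):
    """Build one feature vector for a candidate (FERC1, EIA) record pair."""
    return [
        string_similarity(left["plant_name"], right["plant_name"]),
        numeric_similarity(left["capacity_mw"], right["capacity_mw"]),
        1.0 if left["state"] == right["state"] else 0.0,
    ]


def build_feature_matrix(left_records, right_records):
    """Compare every pair that shares a report year (a simple blocking rule)."""
    pairs, features = [], []
    for left in left_records:
        for right in right_records:
            if left["report_year"] != right["report_year"]:
                continue  # blocked: only compare records from the same year
            pairs.append((left["id"], right["id"]))
            features.append(compare_pair(left, right))
    return pairs, features


pairs, features = build_feature_matrix(ferc1_records, eia_records)
for pair, vector in zip(pairs, features):
    print(pair, [round(value, 3) for value in vector])
```

Each row of `features` lines up with one entry in `pairs`, so the list could be passed as `X` to an `sklearn` `LogisticRegression.fit(X, y)` call along with known match labels. The nested loop is only fine for toy data; a real replacement would presumably need vectorized comparisons and a tighter blocking rule.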