Fix EIA boiler IDs that have inconsistenly reported leading zeros

This issue is analogous to the problem we found and addressed in #964 and #968, but applied to boiler IDs rather than generator IDs.

With the integration of the extensive EIA-860 boiler attributes in #2305 and #2364 we now have many more examples of boiler ID values, and some of them have inconsistently reported leading zeroes. Boiler IDs are strings, but many of them are purely numerical strings, and all this data goes through Excel, so this kind of thing doesn't get treated uniformly across all the years of data.

I found 392 records in the boilers_eia860 table with inconsistent leading zeros in their IDs across all years of data.
I checked that these inconsistencies ONLY affect boiler_id values which are entirely numeric (as is the case with generator_id)
The following snippet will show which "fixed" boiler_id values correspond to which original values, after fixing them with the same technique that we're using on the generator_id values:

bids = pd.read_sql("SELECT plant_id_eia, boiler_id, report_date FROM boilers_eia860", pudl_engine)
bids["boiler_id_orig"] = bids.boiler_id.copy()
fixed_bids = pudl.helpers.remove_leading_zeros_from_numeric_strings(bids, col_name="boiler_id")
fixed_bids["n_ids"] = fixed_bids.groupby(["plant_id_eia", "boiler_id"]).boiler_id_orig.transform("nunique")
fixed_bids = fixed_bids[fixed_bids.n_ids>1].sort_values(["plant_id_eia", "boiler_id"])
fixed_bids = fixed_bids.groupby(["plant_id_eia", "boiler_id"]).boiler_id_orig.apply(set).to_frame()
fixed_bids

All Inconsistent Boiler IDs

| (plant_id_eia, boiler_id) | boiler_id_orig | |:---------------|:-----------------| | (202, '1') | {'1', '01'} | | (203, '1') | {'1', '01'} | | (861, '1') | {'1', '01'} | | (861, '2') | {'2', '02'} | | (863, '5') | {'05', '5'} | | (863, '6') | {'06', '6'} | | (864, '1') | {'1', '01'} | | (864, '2') | {'2', '02'} | | (864, '3') | {'03', '3'} | | (864, '4') | {'4', '04'} | | (864, '5') | {'05', '5'} | | (864, '6') | {'06', '6'} | | (2491, '1') | {'001', '1'} | | (6013, '1') | {'1', '01'} | | (6013, '2') | {'2', '02'} | | (10502, '2') | {'2', '02'} | | (10603, '31') | {'031', '31'} | | (50561, '1') | {'1', '0001'} | | (50561, '2') | {'2', '0002'} | | (50611, '31') | {'031', '31'} | | (50739, '100') | {'0100', '100'} | | (54526, '1') | {'1', '00001'} | | (54529, '1') | {'1', '0001'} | | (55210, '1') | {'1', '0001'} | | (55596, '1') | {'1', '0001'} | | (55596, '2') | {'2', '0002'} | | (55596, '3') | {'0003', '3'} | | (55596, '5') | {'5', '0005'} | | (56188, '1') | {'001', '1'} | | (56221, '31') | {'031', '31'} | | (56453, '31') | {'031', '31'} | | (57705, '1') | {'1', '000001'} | | (58574, '1') | {'1', '01'} | | (60903, '1') | {'1', '0001'} | | (60903, '2') | {'2', '0002'} |

Hopefully addressing this issue is as simple as treating Boiler IDs the same way we treat Generator IDs in the EIA spreadsheet extractions, adding a call to pudl.helpers.remove_leading_zeros_from_numeric_strings() to each of the extractor classes. Based on where we're doing this for the generator_id values, it looks like this fix may impact the following datasets:

eia860
eia860m
eia861
eia923
epacems
epacamd_eia

I think most of those will go pretty smoothly, but I'm concerned about how this might interact with the EPA CAMD to EIA crosswalk.

catalyst-cooperative / pudl

Fix EIA boiler IDs that have inconsistenly reported leading zeros #2366