US-EPA-CAMD / easey-ui

Project Management repo for EPA Clean Air Markets Division (CAMD) Business Suite of applications
MIT License
0 stars 0 forks source link

Emission Views SP Partition Speed Solution #6410

Open djw4erg opened 1 month ago

djw4erg commented 1 month ago

Background

ECMPS uses Emission View tables to store display ready versions of emissions data. The tables are needed because formatting the reported data for display takes too much time for a user expecting to see the data. Because the emissions data exists in both the Workspace and Official schemas two sets of the Emission View tables are needed. A set of stored procedures were created to populate the Workspace schema Emission View tables and those procedures were performant. A version of those SP were created to populate the Official schema Emission View tables, but those procedures where not performant.

I believe the initial understanding of the Official schema performance problem was primarily because of the much larger amount of data in the Official schema. Additionally and based on testing, the multiple joins on the single emission tables for different parameters appeared to exacerbate slowness. Because of the perceived join issue, the "pivot" method was developed, which used table functions to flatten the hourly rows for multiple parameters into single rows for each hour to require one join per table. Although the "pivot" method reduced the slowness issue, speed problems with some of the Official stored procedures still exist.

Additional research exposed some truths about querying the RPT_PERIOD_ID partitioned Official schema emission tables. Where and especially join conditions on the tables will result in searching each partition for a table if the RPT_PERIOD_ID is not included in the condition for the table. This was the original slowness problem and explains why the "pivot" method worked since it included the RPT_PERIOD_ID in the where clause it used.

Previous Tickets

This ticket is intended to replace previous tickets and will result in a QA of all Emission Views and will cover the testing for those tickets. As a result, both of the tickets below will be closed.

5793

6324

Requirement

Update the stored procedures that load the Emission View tables to use direct joins for each parameter to an emissions table. However each join to the tables must include the RPT_PERIOD_ID.

Additionally, after ensuring they are not used outside of Emission View population, drop the following table functions in both the CAMDECMPS and CAMDECMPSWKS schemas.

Stored Procedure List

Handled SP already include the adjustment to use the RPT_PERIOD_ID in any where or on clause that references a partitioned emissions table.

Stored Procedure Screen Name Official Workspace
REFRESH_EMISSION_VIEW_ALL Hourly Combined Parameters View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_CO2APPD Hourly CO2 Appendix D View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_CO2CALC CO2 Calculation :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_CO2CEMS Hourly CO2 CEMS View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_CO2DAILYFUEL CO2 Daily Fuel Sampling View :red_square: :red_square:
REFRESH_EMISSION_VIEW_DAILYCAL Daily Calibration View :ballot_box_with_check: :white_check_mark:
REFRESH_EMISSION_VIEW_HIAPPD Hourly Heat Input Appendix D View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_HICEMS Hourly Heat Input CEMS View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_HIUNITSTACK Heat Input for Unit/Stack View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_LME LME View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_MASSOILCALC Mass Oil Calculation View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_MATSHCL MATS HCL View :ballot_box_with_check: :white_check_mark:
REFRESH_EMISSION_VIEW_MATSHF MATS HF View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_MATSHG MATS HG View :ballot_box_with_check: :white_check_mark:
REFRESH_EMISSION_VIEW_MATSSO2 MATS SO2 View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_MATSSORBENT MATS Sorbent View :red_square: :white_check_mark:
REFRESH_EMISSION_VIEW_MATSWEEKLY MATS weekly View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_MOISTURE Moisture View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_NOXAPPEMIXEDFUEL Unit Level Fuel Curve View :red_square: :white_check_mark:
REFRESH_EMISSION_VIEW_NOXAPPESINGLEFUEL NOX Appendix E Individual Fuel Curve View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_NOXMASSCEMS Hourly NOx Mass CEMS View :ballot_box_with_check: :white_check_mark:
REFRESH_EMISSION_VIEW_NOXRATECEMS Hourly NOx Rate CEMS View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_OTHERDAILY Other Daily Tests View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_SO2APPD Hourly SO2 Appendix D View :white_check_mark: :white_check_mark:
REFRESH_EMISSION_VIEW_SO2CEMS Hourly SO2 CEMS View :white_check_mark: :white_check_mark:

Additional Issues

  1. CO2DAILYFUEL and NSPS4T has data in CAMDECMPS after test run of "refresh_emissions_views" for the following Monitoring Plan and Reporting Period Ids, but data did not exist in CAMDECMPSWKS after the test run.
    • ( 'TWCORNEL5-C0E3879920A14159BAA98E03F1980A7A' , 123 ),
    • ( 'MDC-19939DF8A9FD4C12B3869C394A2C0B1D' , 123 ),
    • ( 'EGWITLT01-C88ABE75BA5D4C6CB5E2D1EFEC8C4F38' , 123 ),
    • ( 'MDC-0B5C7E00DEFD410A9B883B5E381A8C4B' , 113 ),
    • ( 'MDC-80A407D4AC8D478E93F95FB702EC0B3E' , 123 ),
    • ( 'BPU094-545C837365784BE781EDECEAA08626A3' , 123 ),
    • ( 'MDC-07AC944F78A043E08E21977F9DA727EA' , 123 ),
    • ( 'MDC-4F0130B46ADB4B4F9C2EE0BA4A72D463' , 123 ),
    • ( 'MDC-88B24CFAC6E94EFAA0517A9EC9FB4D9E' , 123 ),
    • ( 'MDC-98642474BB9D49CAB5CB11F2352A3598' , 123 ),
    • ( 'RMUENSTERL-FBE23B81FE4A4C6DADBDDAA1496DB309' , 123 ),
    • ( 'MDC-EC0E1E1664394A6DB9DF0DBCFD5C93C1' , 123 ),
    • ( 'MDC-3B23A43070D74F53AA2DAD83ABFC190C' , 123 )
djw4erg commented 1 month ago

Explain Plant Cost Results

The explain plan for the actual pivot version of the select used to populate the Emission View tables uses a default for the pivot function call that does not reflect the work the pivot functions actually do. So a version of the select statement had to be created to get a more accurate explain plan cost for the pivot version, but without the error codes. In some cases the pivot version had a better cost than the new version, and an alternate version based on that fact was used. If the "Alternate" column has "Pivot", the alternate is the Pivot version limited to needed columns and with Error Codes has been implemented and is noted in the Pivot Alt column. "Direct" indicates that the alternate version uses direct joins even for multiple joins to the same table.

Model's Stored Procedure Pivot Low Pivot High New Low New High Alternate
REFRESH_EMISSION_VIEW_ALL 10,822.33 138,045.30 8,327.44 32,209.00
REFRESH_EMISSION_VIEW_CO2APPD 5.91 4,839.80 3,578.77 3,596.50
REFRESH_EMISSION_VIEW_CO2CALC 15.36 102.62 498.99 566.02
REFRESH_EMISSION_VIEW_CO2CEMS 47,856.64 85,952.26 9,493.77 33,422.93
REFRESH_EMISSION_VIEW_CO2DAILYFUEL N/A N/A 155.81 301.61
REFRESH_EMISSION_VIEW_DAILYCAL 879.64 1,035.76 2,12 2,903.21
REFRESH_EMISSION_VIEW_HIAPPD 4.20 83.97 2,786.32 2,804.10
REFRESH_EMISSION_VIEW_HICEMS 9,344.09 20,633.60 5,527.98 5,546.06
REFRESH_EMISSION_VIEW_HIUNITSTACK 3.22 78.13 2.66 7,039.75
REFRESH_EMISSION_VIEW_LME 60.18 189.72 4,684.63 4,702.44
REFRESH_EMISSION_VIEW_MASSOILCALC 3.08 47.39 55.37 2,445.41
REFRESH_EMISSION_VIEW_MATSHCL 238.54 238.61 114.02 174.96
REFRESH_EMISSION_VIEW_MATSHF 132.65 132.66 44.23 62.21
REFRESH_EMISSION_VIEW_MATSHG 96,260.58 96,260.67 10,223.69 10,231.74
REFRESH_EMISSION_VIEW_MATSSO2 41,106.42 41,106.48 2,350.00 4,705.14
REFRESH_EMISSION_VIEW_MATSSORBENT 1.80 11,985.22 N/A N/A
REFRESH_EMISSION_VIEW_MATSWEEKLY 71.95 205.51 1.95 68.71
REFRESH_EMISSION_VIEW_MOISTURE 21,761.66 57,608.24 376.68 423.39
REFRESH_EMISSION_VIEW_NOXAPPEMIXEDFUEL 928.65 3,064.11 N/A N/A
REFRESH_EMISSION_VIEW_NOXAPPESINGLEFUEL 50.12 148.00 1,794.73 3,603.50
REFRESH_EMISSION_VIEW_NOXMASSCEMS1 80,771.58 80,771.70 5,501.82 5,519.61
REFRESH_EMISSION_VIEW_NOXRATECEMS 7.25 150.29 3,607.70 248.16
REFRESH_EMISSION_VIEW_OTHERDAILY 875.28 879.60 1.83 739.61
REFRESH_EMISSION_VIEW_SO2APPD 3,586.59 3,604.37
REFRESH_EMISSION_VIEW_SO2CEMS 29,123.43 103,967.68 5,505.47 5,523.27

Notes:

  1. Pivot values are for previous method that was not actually the pivot method. However, implemented new knowledge resulted in additional improvements.
  2. REFRESH_EMISSION_VIEW_SO2APPD: The explain plan for the "pivot" version never completed.
  3. REFRESH_EMISSION_VIEW_CO2DAILYFUEL: Does not use the "pivot" method or have a speed issue.
  4. REFRESH_EMISSION_VIEW_DAILYCAL: Did not use the "pivot" method, but did use a temp-table.
  5. REFRESH_EMISSION_VIEW_LME: Although score was better for Pivot Method, The actually implementation took an excessive amount of time for a number of emission reports.
  6. REFRESH_EMISSION_VIEW_MATSHG: Pivot columns are for pre-version, which did not use the pivot method.
  7. REFRESH_EMISSION_VIEW_MATSHF: Based on HCL version and done without data.
  8. REFRESH_EMISSION_VIEW_MATSSORBENT: Does not use the "pivot" method or have a speed issue.
  9. REFRESH_EMISSION_VIEW_NOXRATECEMS: Although score was better for Pivot Method, The actually implementation took an excessive amount of time for a number of emission reports.
djw4erg commented 1 month ago

Test Emission Reports

For the REFRESH_EMISSION_VIEW SP Versions.

SP Version ORIS Facility Name Locations Quarter
ALL 2712 Roxboro CS004A, 4A, 4B 2023 Q3
CO2APPD 3 Barry 6A 2023 Q3
CO2CALC 976 Marion 123 2023 Q3
CO2CEMS 2712 Roxboro CS004A, 4A, 4B 2023 Q3
CO2DAILYFUEL 260 Moss Landing Power Plant 1A 2023 Q3
DAYCAL 2712 Roxboro CS004A, 4A, 4B 2023 Q3
HIAPPD 3 Barry 6A 2023 Q3
HICEMS 2712 Roxboro CS004A, 4A, 4B 2023 Q3
HIUNITSTACK 7760 Tilton Power Station CP01, 1, 2, 3, 4 2023 Q3
LME 1305 McPherson 2 CP001, GT1, GT3 2023 Q3
MASSOILCALC 1599 Canal Station 1 2023 Q3
MATSHCL 60 Gerald Whelan Energy Center 1 2023 Q3
MATSHF 60 Gerald Whelan Energy Center 1 2023 Q3
MATSHG 2712 Roxboro CS004A, 4A, 4B 2023 Q3
MATSSO2 60 Gerald Whelan Energy Center 2 2023 Q3
MATSSORBENT 130 Cross 1 2023 Q3
MATSWEEKLY 60 Gerald Whelan Energy Center 2 2023 Q3
MOISTURE 3236 Manchester Street Station 11 2023 Q3
NOXAPPEMIXEDFUEL 7760 Tilton Power Station CP01, 1, 2, 3, 4 2023 Q3
NOXMASSCEMS 880041 Ascend (Decatur Plant) X015 2023 Q3
NOXMASSRATE 2712 Roxboro CS004A, 4A, 4B 2023 Q3
SO2APPD 3 Barry 6A 2023 Q3
SO2CEMS 2712 Roxboro CS004A, 4A, 4B 2023 Q3

Monitoring Plan Ids

ORIS Facility Name Locations MP Id
3 Barry 6A MDC-68FF9CD5F0C2464E85FD2A3C15D5A670
60 Gerald Whelan Energy Center 1 MDC-19939DF8A9FD4C12B3869C394A2C0B1D
60 Gerald Whelan Energy Center 2 EGWITLT01-C88ABE75BA5D4C6CB5E2D1EFEC8C4F38
130 Cross 1 MDC-3B23A43070D74F53AA2DAD83ABFC190C
260 Moss Landing Power Plant 1A MDC-F8343B35AEAB4AE8986E74C085B741BB
976 Marion 123 MDC-80A407D4AC8D478E93F95FB702EC0B3E
1305 McPherson 2 CP001, GT1, GT3 BPU094-545C837365784BE781EDECEAA08626A3
1599 Canal Station 1 MDC-07AC944F78A043E08E21977F9DA727EA
2712 Roxboro CS004A, 4A, 4B MDC-4F0130B46ADB4B4F9C2EE0BA4A72D463
3236 Manchester Street Station 11 MDC-88B24CFAC6E94EFAA0517A9EC9FB4D9E
7697 Washington County Cogen (Olin) CC1 MDC-6F6D3CDF11584A639727CF10C34E3BB4
7760 Tilton Power Station CP01, 1, 2, 3, 4 RMUENSTERL-FBE23B81FE4A4C6DADBDDAA1496DB309
880041 Ascend (Decatur Plant) X015 MDC-EC0E1E1664394A6DB9DF0DBCFD5C93C1

Reporting Period Ids

Quarter Id
2023 Q3 123
djw4erg commented 1 month ago

Testing Considerations

  1. MODC 47 and MODC 48 can cause duplicate rows for O2D, O2W and possibly CO2C and NOXC.
    • Determine which Emission Views explicitly handle MODC 47 and MODC 48 and which others may be affected.

Discussed with Scott and MODC 47 and 48 only matter for the NOXR CEMS view.

esaber76 commented 2 weeks ago

New issues affecting both public and workspace views:

Hourly NOx Rate CEMS View:

  1. Rptd. % Diluent and Diluent MODC columns are null across an entire quarter (even for measured data hours) if MODC 46 is reported in the emissions data for the diluent component (doesn't affect NOXC MODC 46 and a diluent component reporting standard missing data MODC codes). Examples: ORIS 260, 1A (2023 Q3; only includes MODC 46 examples) and ORIS 7153, 5 (2023 Q3; includes MODC 46, 47, and 48 examples).

MATS HG View:

  1. The "U" and "D" codes appear in the MATS Startup Shutdown column instead of "Startup" and "Shutdown". This is not an issue in the MATS SO2 and MATS HCL views (didn't check MATS HF as data does not exist). Example: ORIS 628, 4 (2023 Q3).
  2. Sampling Train Component Id 1, Gas Flow Meter Reading 1, Ratio Stack Gas Flow Rate 1, Sampling Train Component Id 2, Gas Flow Meter Reading 2, Ratio Stack Gas Flow Rate 2 columns are null when these values are based on the prior quarter’s reported sorbent trap (quarter boundary trap). Example: ORIS 1356, CS023, 2, 3 (2023 Q3).
djw4erg commented 1 week ago

Responses to QA Results Comment

Hourly NOx Rate CEMS View:

  1. This was caused because the use of both Wet and Dry O2 data should include O2 without a Moisture Basis as either Wet or Dry when needed. Before this correction, the Moisture Basis had to be explicit.

MATS HG View:

  1. Updated to return "Startup" for "U" and "Shutdown" for "D".
  2. Included SORBENT_TRAP_SUPP_DATA and SAMPLING_TRAIN_SUPP_DATA in logic to determine Components 1 and 2 for date ranges.

Notes

  1. Both Official and Workspace updated.
esaber76 commented 3 days ago

Acceptance criteria in attached document.

Testing Steps.docx