US-EPA-CAMD / easey-ui

Project Management repo for EPA Clean Air Markets Division (CAMD) Business Suite of applications
MIT License
0 stars 0 forks source link

Refactor Refresh Emissions Views Database Procedures #5793

Open jwhitehead77 opened 8 months ago

jwhitehead77 commented 8 months ago

The emissions view refresh procedures ported from the MS SQL client and used in the workspace do not perform well when used in the official schema due to the vast amount of data. Need to refactor the refresh procedures to be more performant overall.

There are 3 functions that pivot the derived_hrly_value, monitor_hrly_value, & hrly_param_fuel_flow tables so that the values needed for each parameter are a single row for each hour reported.

NOTE: The above functions are dynamic functions that return a dynamic list of columns based on the parameter codes passed into the function from the caller. If you add a column to be returned then anything that uses it must be updated to expected that column.

Instead of joining multiple times on derived_hrly_value, monitor_hrly_value, and/or hrly_param_fuel_flow which has 100+ million records based on different parameters. Replace all these joins with one join to the appropriate pivot functions above passing in the parameters that data is needed for.

The following have been refactored to use the pivot approach and can be used as examples...

The following had previously been refactored to use an approach that does not always work well for every case and needs to be refactored to use the pivot approach...

The following are the original ported procedures that have not been refactored at all and needs to be refactored to use the pivot approach...

The following cannot use the pivot approach but need to be looked at from a performance perspective...

NOTE: Refactor the camdecmps schema (official) procedures and test in the TEST or CDC database which has all of the production data and then copy them to the camdecmpswks schema (workspace) replacing the referenced schema from camdecmps to camdecmpswks

The following procedures are not persisted views and instead uses an actual view for the data so these refresh procedures are just refreshing the counts and should not need refactoring...

JanellC commented 8 months ago

CVP will not work on this ticket. DPC will complete reamaining work

djw4erg commented 8 months ago

Jason, is it possible to load the Emission View data from Workspace into Official when emission reports are submitted instead of reproducing the values? This approach would need to account for the following:

The second issue would eventually require improving the efficiency of the Official process, but the need might not be as urgent.

alangmaid commented 8 months ago

ECMPS Priorities Follow-up Emission View Generation FYI.txt

ergjustin commented 7 months ago
ergjustin commented 7 months ago

PR: https://github.com/US-EPA-CAMD/easey-db-scripts/pull/10

shahbazkhan1999 commented 5 months ago

need to discuss with @ergjustin

djw4erg commented 3 months ago

Related Tickets:

renaemyers commented 3 months ago

Address monitor_hrly_value slowness: https://github.com/US-EPA-CAMD/easey-ui/issues/6174

djw4erg commented 2 months ago

Example Potential Alternate Solution (View Emissions SO2 CEMS)

As far as I can tell the pivot function method would not use indexes to join the DHV and MHV data. This might not cause an execution hit, but worries me. The following method achieves a similar result as the pivot function method, but should join using indexes.

View Emissions SO2 CEMS Query

select  cor.Oris_Code,
        cor.Facility_Name,
        cor.Locations,
        /* Real Columns */
        cor.Mon_Plan_Id,
        cor.Mon_Loc_Id,
        cor.Rpt_Period_Id,
        cor.Begin_Date as Date,
        cor.Begin_Hour as Hour,
        cor.Op_Time,
        cor.Hr_Load as Unit_Load,
        cor.Load_Uom_Cd as Load_Uom,     
        cor.So2c_Mhv_Unadjusted_Hrly_Value as Unadj_So2,
        cor.So2c_Mhv_Applicable_Bias_Adj_Factor as So2_Baf, 
        cor.So2c_Mhv_Adjusted_Hrly_Value as Rpt_Adj_So2,
        cor.So2c_Mhv_Calc_Adjusted_Hrly_Value as Calc_Adj_So2, 
        cor.So2c_Mhv_Modc_Cd as So2_Modc,
        cor.So2c_Mhv_Pct_Available as So2_Pma,   
        cor.Flow_Mhv_Unadjusted_Hrly_Value as Unadj_Flow,
        cor.Flow_Mhv_Applicable_Bias_Adj_Factor as Flow_Baf,
        cor.Flow_Mhv_Adjusted_Hrly_Value as Rpt_Adj_Flow,
        cor.Flow_Mhv_Calc_Adjusted_Hrly_Value as Adj_Flow_Used,
        cor.Flow_Mhv_Modc_Cd as Flow_Modc,
        cor.Flow_Mhv_Pct_Available as Flow_Pma,
        cor.So2_Dhv_Calc_Pct_Moisture as Pct_Wrong_Used,
        case 
            when cor.So2_Dhv_Calc_Pct_Moisture is null then null
            when cor.h2o_dhv_Modc_Cd is not null then cor.h2o_dhv_Modc_Cd
            when cor.h2o_mhv_Modc_Cd is not null then cor.h2o_mhv_Modc_Cd
            else 'DF'
        end as Source_H2o_Value,
        frm.Equation_Cd as So2_Formula_Code,
        cor.So2_Dhv_Adjusted_Hrly_Value as Rpt_So2_Mass_Rate,
        cor.So2_Dhv_Calc_Adjusted_Hrly_Value as Calc_So2_Mass_Rate,
        cor.Hi_Dhv_Calc_Adjusted_Hrly_Value,
        cor.H2o_Dhv_Modc_Cd,
        cor.So2r_Dhv_Adjusted_Hrly_Value,
        cor.H2o_Mhv_Modc_Cd,
        case (frm.Equation_Cd)
            when 'F-23' then cor.Hi_Dhv_Calc_Adjusted_Hrly_Value
        end as Calc_Hi_Rate,
        case (frm.Equation_Cd)
            when 'F-23' then 
                coalesce
                (
                    cor.So2r_Dhv_Adjusted_Hrly_Value,
                    (
                        select  max( Default_Value )
                          from  camdecmps.MONITOR_DEFAULT def
                         where  def.Mon_Loc_Id = cor.Mon_Loc_Id
                           and  def.Parameter_Cd = 'SO2R'
                           and  def.Default_Purpose_Cd = 'F23'
                           and  ( def.Begin_Date  + interval '1' HOUR * def.Begin_Hour ) <= ( cor.Begin_Date  + interval '1' HOUR * cor.Begin_Hour )
                           and  ( ( def.End_Date is null) or ( def.End_Date  + interval '1' HOUR * def.End_Hour ) <= ( cor.Begin_Date  + interval '1' HOUR * cor.Begin_Hour ) )
                    )
                )
        end as Default_So2_Emission_Rate
  from  (
            select  sel.Oris_Code,
                    sel.Facility_Name,
                    sel.Locations,
                    sel.Mon_Plan_Id,
                    sel.Rpt_Period_Id,
                    sel.Mon_Loc_Id,
                    sel.Begin_Date,
                    sel.Begin_Hour,
                    sel.Op_Time,
                    sel.Hr_Load,
                    sel.Load_Uom_Cd,
                    max( case when dhv.Parameter_Cd = 'SO2'  then dhv.Hour_Id end ) as So2_Dhv_Hour_Id,
                    max( case when dhv.Parameter_Cd = 'SO2'  then dhv.Calc_Pct_Moisture end ) as So2_Dhv_Calc_Pct_Moisture,
                    max( case when dhv.Parameter_Cd = 'SO2'  then dhv.Adjusted_Hrly_Value end ) as So2_Dhv_Adjusted_Hrly_Value,
                    max( case when dhv.Parameter_Cd = 'SO2'  then dhv.Calc_Adjusted_Hrly_Value end ) as So2_Dhv_Calc_Adjusted_Hrly_Value,
                    max( case when dhv.Parameter_Cd = 'SO2'  then dhv.Mon_Form_Id end ) as So2_Dhv_Mon_Form_Id,
                    max( case when dhv.Parameter_Cd = 'HI'   then dhv.Calc_Adjusted_Hrly_Value end ) as Hi_Dhv_Calc_Adjusted_Hrly_Value,
                    max( case when dhv.Parameter_Cd = 'H2O'  then dhv.Modc_Cd end ) as H2o_Dhv_Modc_Cd,
                    max( case when dhv.Parameter_Cd = 'H2O'  then dhv.Pct_Available end ) as H2o_Dhv_Pct_Available,
                    max( case when dhv.Parameter_Cd = 'SO2R' then dhv.Adjusted_Hrly_Value end ) as So2r_Dhv_Adjusted_Hrly_Value,   
                    max( case when mhv.Parameter_Cd = 'FLOW' then mhv.Hour_Id end ) as Flow_Mhv_Hour_Id,     
                    max( case when mhv.Parameter_Cd = 'FLOW' then mhv.Unadjusted_Hrly_Value end ) as Flow_Mhv_Unadjusted_Hrly_Value,
                    max( case when mhv.Parameter_Cd = 'FLOW' then mhv.Applicable_Bias_Adj_Factor end ) as Flow_Mhv_Applicable_Bias_Adj_Factor,
                    max( case when mhv.Parameter_Cd = 'FLOW' then mhv.Adjusted_Hrly_Value end ) as Flow_Mhv_Adjusted_Hrly_Value,
                    max( case when mhv.Parameter_Cd = 'FLOW' then mhv.Calc_Adjusted_Hrly_Value end ) as Flow_Mhv_Calc_Adjusted_Hrly_Value,
                    max( case when mhv.Parameter_Cd = 'FLOW' then mhv.Modc_Cd end ) as Flow_Mhv_Modc_Cd,
                    max( case when mhv.Parameter_Cd = 'FLOW' then mhv.Pct_Available end ) as Flow_Mhv_Pct_Available,
                    max( case when mhv.Parameter_Cd = 'SO2C' then mhv.Unadjusted_Hrly_Value end ) as So2c_Mhv_Unadjusted_Hrly_Value,
                    max( case when mhv.Parameter_Cd = 'SO2C' then mhv.Applicable_Bias_Adj_Factor end ) as So2c_Mhv_Applicable_Bias_Adj_Factor,
                    max( case when mhv.Parameter_Cd = 'SO2C' then mhv.Adjusted_Hrly_Value end ) as So2c_Mhv_Adjusted_Hrly_Value,
                    max( case when mhv.Parameter_Cd = 'SO2C' then mhv.Calc_Adjusted_Hrly_Value end ) as So2c_Mhv_Calc_Adjusted_Hrly_Value,
                    max( case when mhv.Parameter_Cd = 'SO2C' then mhv.Modc_Cd end ) as So2c_Mhv_Modc_Cd,
                    max( case when mhv.Parameter_Cd = 'SO2C' then mhv.Pct_Available end ) as So2c_Mhv_Pct_Available, 
                    max( case when mhv.Parameter_Cd = 'H2O'  then mhv.Modc_Cd end ) as H2o_Mhv_Modc_Cd
              from  (
                        select  fac.Oris_Code,
                                fac.Facility_Name,
                                (
                                    select  string_agg( coalesce( unt.Unitid, stp.Stack_Name ), ', ' order by case when unt.Unit_Id is null then 0 else 1 end, coalesce( unt.Unitid, stp.Stack_Name ) )
                                      from  camdecmps.MONITOR_PLAN_LOCATION mpl
                                            join camdecmps.MONITOR_LOCATION loc on loc.Mon_Loc_Id = mpl.Mon_Loc_Id
                                            left join camd.UNIT unt on unt.Unit_Id = loc.Unit_Id
                                            left join camdecmps.STACK_PIPE stp on stp.Stack_Pipe_Id = loc.Stack_Pipe_Id
                                     where  mpl.Mon_Plan_Id = pln.Mon_Plan_Id
                                ) as Locations,
                                hod.Begin_Date,
                                hod.Begin_Hour,
                                hod.Op_Time,
                                hod.Hr_Load,
                                hod.Load_Uom_Cd,
                                ems.Mon_Plan_Id,
                                ems.Rpt_Period_Id,
                                mpl.Mon_Loc_Id,
                                hod.Hour_Id
                          from  camdecmps.EMISSION_EVALUATION ems
                                join camdecmps.MONITOR_PLAN pln on pln.Mon_Plan_Id = ems.Mon_Plan_Id
                                join camd.PLANT fac on fac.Fac_Id = pln.Fac_Id
                                join camdecmps.MONITOR_PLAN_LOCATION mpl on mpl.Mon_Plan_Id = ems.Mon_Plan_Id
                                join camdecmps.HRLY_OP_DATA hod on hod.Mon_Loc_Id = mpl.Mon_Loc_Id and hod.Rpt_Period_Id = ems.Rpt_Period_Id
                         where  ems.Mon_Plan_Id = 'TWCORNEL5-488E42008B434177BC7D7BFF138D18EF'
                           and  ems.Rpt_Period_Id = 109
                           and  pln.End_Rpt_Period_Id is null
                    ) sel
                    join camdecmps.DERIVED_HRLY_VALUE dhv on dhv.Hour_Id = sel.Hour_Id and dhv.Parameter_Cd in ( 'SO2', 'HI', 'H2O', 'SO2R' )
                    join camdecmps.MONITOR_HRLY_VALUE mhv on mhv.Hour_Id = sel.Hour_Id and mhv.Parameter_Cd in ( 'FLOW', 'SO2C', 'H20' )
             group 
                by  sel.Oris_Code,
                    sel.Facility_Name,
                    sel.Locations,
                    sel.Mon_Plan_Id,
                    sel.Rpt_Period_Id,
                    sel.Mon_Loc_Id,
                    sel.Begin_Date,
                    sel.Begin_Hour,
                    sel.Op_Time,
                    sel.Hr_Load,
                    sel.Load_Uom_Cd
        ) cor
        left join camdecmps.MONITOR_FORMULA frm on frm.Mon_Form_Id = cor.So2_Dhv_Mon_Form_Id
 where  cor.So2_Dhv_Hour_Id is not null 
   and  cor.Flow_Mhv_Hour_Id is not null
renaemyers commented 1 month ago

All involved emissions tables have been analyzed on dev and the refresh procedures are running much faster. Code has been checked for all updated procedures.

esaber76 commented 1 month ago

@djw4erg to provide scripts which can be used to first clear data from affected views prior to testing.

esaber76 commented 3 weeks ago

Testing Results

  1. For stack/pipe locations, the Hourly Heat Input CEMS and Hourly CO2 CEMS Views have counts which appear to be the sum for all locations in the monitoring plan even if HI and CO2 is not measured at all locations. Ex. ORIS 1356, CS023, 2, 3. The counts equal 6624 and should be 2208. HI and CO2 is only measured at CS023. These views are also showing data for units 2 and 3 and should not. This also affects other locations that do not have HI or CO2 CEMS (ex. LME, Appendix D with only NOXR CEMS have these views available). Affects both the public and workspace views.
  2. MATS HG View
    • HG Formula Code column is always null (both public and workspace views; ex. ORIS 1356, CS023).
    • Rptd. % Diluent, Diluent Parameter, Diluent MODC, and F-Factor columns null (both public and workspace views; ex. ORIS 564, 1).
  3. Hourly NOx Mass CEMS View does not display any data (both public and workspace views; ex. ORIS 880041, X015).
  4. %Diluent Used in Calc. and Calc. HCl Rate columns missing data in MATS HCL View in the workspace (Calc. HCl Rate column in particular should be populated before and after an evaluation; ex. ORIS 4125, 9).
  5. I've been seeing random instances where a lot of the other views are missing column data as well which may explain some of the comments above. I looked at this a bit closer and I see "CALL camdecmpswks.refresh_emissions_views..." getting stuck. In this example, it's been running for 15+ minutes. I saw others in there stuck for days but killed the processes. Does this procedure get called after an evaluation? That's when I've noticed it getting stuck and a lot of calculated columns missing data.

image

djw4erg commented 2 days ago

Handling Testing Results

  1. Hourly Heat Input CEMS and Hourly CO2 CEMS Views
  2. MATS HG View
    1. REFRESH_EMISSION_VIEW_MATSHG (CAMDECMPS) linked the MONITOR_FORMULA record using the H2O DHV MON_FORM_ID instead of the PMRE/PMRH MATS DHV MON_FORM_ID.
      • Columns directly affected by incorrect join
      • HG_FORMULA_CD
      • Columns conditionally affected by incorrect join because of MONITOR_FORMULA.EQUATION_CD value.
      • DILUENT_MODC
      • DILUENT_PARAMETER
      • F_FACTOR
      • RPT_PCT_DILUENT
    2. Corrected the MONITOR_FORMULA join in REFRESH_EMISSION_VIEW_MATSHG (CAMDECMPS).
  3. Hourly NOx Mass CEMS View
    1. Read Handling Testing Results Suggested Hourly NOxM CEMS Improved Query
    2. Note that the original pivot method to improve performance may not have been needed. It appears that the joins on HOUR_ID that used the NOx DHV row, should have used the HOUR_ID from the TEMP_HOURLY_TEST_ERRORS table.
      1. Note that replacing HOD subquery with the LOAD_TEMP_HOURLY_TEST_ERRORS temp table lead to an run time increase to over a minute. Assume that we need to perform an Analyze or Vacuum and Analyze after populating the temp table.
    3. Additionally, the RPT_NOX_MASS and CALC_NOX_MASS are incorrectly pulled from the NOxR DHV row instead of the NOx DHV row.
  4. MATS HCL View
  5. REFRESH_EMISSIONS_VIEWS Execution Stuck and Missing Column Data
djw4erg commented 16 hours ago

Handling Testing Results Suggested Hourly NOxM CEMS Improved Query

Correcting the Original Problem

Suggested Replacement Query for the Hourly NOxM CEMS View Refresh

select  distinct
        hod.mon_plan_id,
        hod.mon_loc_id,
        hod.rpt_period_id,
        camdecmps.format_date_hour( hod.begin_date, hod.begin_hour, null ) AS date_hour,
        hod.op_time,
        hod.hr_load as unit_load,
        hod.load_uom_cd as load_uom,
        noxc.Unadjusted_Hrly_Value as unadjj_nox,
        noxc.Applicable_bias_adj_factor as calc_nox_baf,
        noxc.adjusted_hrly_value as rpt_adj_nox,
        noxc.calc_adjusted_hrly_value as rpt_adj_nox,
        noxc.modc_cd as nox_modc,
        noxc.pct_available as nox_p0ma,
        flow.unadjusted_hrly_value as unadj_flow,
        flow.applicable_bias_adj_factor as calc_flow_baf,
        flow.adjusted_hrly_value as rpt_adj_flow,
        flow.calc_adjusted_hrly_value as adj_flow_used,
        flow.modc_cd as flow_modc,
        flow.pct_available as flow_pma,
        nox.calc_pct_moisture as pct_h2o_used,
        case 
            when ( noxr.calc_pct_moisture is not null ) then 
                case 
                    when ( h2o_m.modc_cd is not null ) then h2o_m.modc_cd 
                    when ( h2o_d.modc_cd is not null ) then h2o_d.modc_cd 
                    else 'DF'
                end 
            else null
        end as source_h2o_value,
        frm.equation_cd as noxmass_formula_code,
        nox.adjusted_hrly_value as rpt_nox_mass,
        nox.calc_adjusted_hrly_value as calc_nox_Mass
  from  temp_hourly_test_errors as hod 
         join camdecmps.DERIVED_HRLY_VALUE nox
           on nox.hour_id = hod.hour_id
          and nox.parameter_cd = 'NOX'
         join camdecmps.MONITOR_HRLY_VALUE flow
           on flow.hour_id = hod.hour_id
          and flow.parameter_cd = 'FLOW'
         join camdecmps.MONITOR_FORMULA frm
           on frm.mon_form_id = nox.mon_form_id
          and frm.equation_cd like 'F-26%'
         left join camdecmps.MONITOR_HRLY_VALUE noxc
           on noxc.hour_id = hod.hour_id
          and noxc.parameter_cd = 'NOXC'
         left join camdecmps.MONITOR_HRLY_VALUE h2o_m
           on h2o_m.hour_id = hod.hour_id
          and h2o_m.parameter_cd = 'H2O'
         left join camdecmps.DERIVED_HRLY_VALUE h2o_d
           on h2o_d.hour_id = hod.hour_id
          and h2o_d.parameter_cd = 'H2O'
         left join camdecmps.DERIVED_HRLY_VALUE noxr
           on noxr.hour_id = hod.hour_id
          and noxr.parameter_cd = 'NOXR'