CEMPD / SMOKE

Create emissions inputs for multiple air quality modeling systems with unmatched speed and flexibility
https://www.cmascenter.org/smoke/
45 stars 21 forks source link

problems with Smkinven daily area source processing #91

Closed callen52 closed 1 month ago

callen52 commented 5 months ago

I've found a couple of issues with daily area source processing in Smkinven.

1) When SMK_PROCESS_HAPS = ALL, we get this error in Smkinven (caused by all VOC being converted to NONHAPVOC): INTERNAL ERROR: Could not find variable "VOC" in map file We are working around this by setting SMK_PROCESS_HAPS = PARTIAL and only including out-of-domain sources in the NHAPEXCLUDE so as to not affect the modeling.

2) For March, everything runs fine until it tries to process April 1 (timestep 2021091 in this case), at which point Smkinven returns a bunch of bogus "Both VOC|TOG and toxics are not found" errors for sources which in fact have both VOC and toxics. This only affects timesteps from 2021091:040000 onwards, so this doesn't actually affect March modeling. Smkinven for April works just fine.

I'm attaching some sample input data and Smkinven logs for the two issues. for_daily_processing_smoke_bug_report_30jan2024.zip

hnqtran commented 5 months ago

Hi Christine. I'm wanting to replicate the issue that you are having. Could you please share with me your run script along with its input files?

callen52 commented 5 months ago

Hi Huy,

Here is the run script (Monthlyarea...), COSTCY, INVTABLE, and an updated version of the smk_ar_monthly_emf.csh helper script to support this type of processing. I think this plus the the contents of what I originally posted should give you all you need to run Smkinven, but let me know if you're still missing anything.

costcy_for_2017platform_20dec2023_v8.txt invtable_2017_NATA_CMAQ_26apr2023_v7.txt Monthly_area_Q1_livestock_12US1_2021hb_cb6_21k_20240130092302.csh.txt smk_ar_monthly_emf.csh.txt

hnqtran commented 4 months ago

Hi Christine,

I was able to replicate the first issue that you reported, and I found that the solution to issue #82 would solve this. Would you please make the modifications to SETNONHAP subrountine, recompile smkinven and let me know if it work?

I'm trying to get to the second issue and let will you know if I find anything. Thanks.

Huy

hnqtran commented 4 months ago

Progress update:

livestock.HAPS_patial.orig_vs_fixed.m3diff.txt livestock.fixed.HAPS_patial_vs_all.m3diff.txt

hnqtran commented 4 months ago

Progress update: There are two daily emission input files from Christine's test packages: livestock_2021hb_FEM_daily_test.csv and livestock_2021hb_nonFEM_daily_test.csv. In these two files, emissions entries for some FIPS and SCCs are unsorted, e.g. their entires for January through May are in one part of the files, and entries for June - December are in another part of the files.

As the livestock_2021hb_FEM_daily_test.csv was sorted (attached), i.e. emission entries for each combination of FIPS, SCC and pollutant are put in 12 consecutive lines for January - December, smkinven ran successfully.

However, smkinven still failed with error "Found VOC|TOG but no toxics found" when running with the livestock_2021hb_nonFEM_daily_test.csv even after it was sorted.

Some sort functions called in smkinven are the suspect for this bug.

livestock_2021hb_FEM_daily_test_sorted.csv

hnqtran commented 3 months ago

Progress update: (edited for more detail)

The second issue in this ticket related to bogus "Both VOC|TOG and toxics are not found" errors was found related to treatment of daylight saving time in the subroutine $SMOKE_HOME/src/smkinven/rdff10pd.f which reads data from daily specific emission files.

In brief, the sequential processes in rdff10pd.f are as follow:

Issue happens within this block of code between line 899 - 923

                H = 0
                DO T = PTR, MIN( PTR + 23, NSTEPS )

                    H = H + 1
                    NPDPT( T ) = NPDPT( T ) + 1

                    HS = NPDPT( T )

                    IF( HS .LE. MXPDSRC ) THEN

                        IDXSRC( HS,T ) = HS
                        SPDIDA( HS,T ) = S
                        CIDXA ( HS,T ) = CIDX
                        CODEA ( HS,T ) = COD
                        IF( CEMPOL ) THEN
                            EMISVA( HS,T ) = CONVFAC * EMIS(S,COD) * TDAT( D,H )  ! Store data in emissions
                            DYTOTA( HS,T ) = CONVFAC * EMIS(S,COD) * TOTAL
                        ELSE
                            EMISVA( HS,T ) = CONVFAC * TDAT( D,H )  ! Store data in emissions
                            DYTOTA( HS,T ) = CONVFAC * TOTAL
                        END IF

                    END IF

                END DO

Assuming that we have a simplified daily emission input file for a single FIPS 01005 with single SCC 2805035000 with four pollutant NH3, BENZENE, TOLUENE, VOC. In this case, S = 1, max number of sources MXPDSRC = 4 (4 unique combinations of FIPS/SCC/COD).

If we write out values of all variables in the above loop for debug, it would look like the following when it read to the last day of March (Time step = T =768):

FIPS= 01005; SCC= 2805035000; NH3; T= 768; NPDPT(T)= 1; HS= 1; S = 1; CIDX= 695; COD= 1; EMISVA= 0.851D-01

When rdff10pd.f read next data for April, and because March was marked for daylight saving time, rdff10pd.f rewrite data of last hour in March with first hour in April (in local time, which is 05:00 UTC for this FIPS), and the debug line for this hour will look as the following:

FIPS= 01005; SCC= 2805035000; NH3; T= 768; NPDPT(T)= 2; HS = 2; S = 1; CIDX= 695; COD= 1; EMISVA= -0.100D+38

Note that while rewriting data for T = 768, rdff10pd.f also mistakenly increment value of NPDPT(T) and HS. This means for every pollutant, HS value at this time step increased by 2 instead of 1. As rdff10pd.f finished reading data for BENZENE at this time step, HS = 4. When it got to reading data for VOC at this time step, HS = 7 then 8. However, rdff10pd.f only recorded data when HS <= MXPDSRC and so any data associated with HS > 4 will be ignored. Therefore VOC and TOLUENE were not recorded for this time step T = 768. Consequently, smkinven aborted in the subsequent module with error "Found toxics but no VOC|TOG found for the source" (it found BENZENE but could not found VOC; also could not found TOLUENE but this does not cause the crash).

Depending on the order of pollutant appearance in the daily input file, smkinven could crash or not with different message:

Note that the above issue is only applicable to last time step in March and to counties that observe daylight saving time (DST). There is no issue with data in other time step and in counties that do not observe DST.

hnqtran commented 3 months ago

Update (Continue)

Issue with daylight saving time also extends to November where in county observing DST, data for the first hour of December was entirely skipped. The skipped hour, however, does not cause smkinven to crash.

This issue may have been going on for long time and unless it cause smkinven to crash as reported by Christine in this ticket, it was hard to be detected. It only affects the last hour in March or first hour in December, and when we do QA/QC with SMOKE report file which often at annual scale, the difference between input and output may only be subtle.

Fix for this issue:

IF (.NOT. GETSIZES ) THEN
            IF( ALLOCATED( HSVAL ) ) DEALLOCATE( HSVAL )
            ALLOCATE( HSVAL(NSRC,NIPPA,NSTEPS), STAT=IOS )
            CALL CHECKMEM( IOS, 'HSVAL', PROGNAME )
            HSVAL  = 0
END IF
                H = 0
                DO T = PTR, MIN( PTR + 23, NSTEPS )

                    H = H + 1

c                 Processing for Day light saving start                    
                   IF ( HSVAL(S,COD,T) .NE. 0 ) THEN ! If this combination of source/ pol/timestep was already processed
                       HS = HSVAL(S,COD,T)           ! Recover recorded HS value

c                     Just update emission values and move on
                       IF( HS .LE. MXPDSRC ) THEN

                           IF( CEMPOL ) THEN               
                                EMISVA( HS,T ) = CONVFAC * EMIS(S,COD) * TDAT( D,H )  ! Store data in emissions
                                DYTOTA( HS,T ) = CONVFAC * EMIS(S,COD) * TOTAL
                          ELSE
                                EMISVA( HS,T ) = CONVFAC * TDAT( D,H )  ! Store data in emissions
                                DYTOTA( HS,T ) = CONVFAC * TOTAL
                          END IF 
                      END IF

                      CYCLE
                   END IF

c                 Processing for when Day light saving end;
                   IF ( T .GT. 2 ) THEN
                      IF ( HSVAL(S,COD,T-1) .EQ. 0 ) THEN ! No data had been read for the previous hour of this combo
                         NPDPT(T-1) = NPDPT(T-1) + 1
                         HS = NPDPT(T-1)

                         IF( HS .LE. MXPDSRC ) THEN

                                IDXSRC( HS,T-1 ) = HS
                                SPDIDA( HS,T-1 ) = S
                                CIDXA ( HS,T-1 ) = CIDX
                                CODEA ( HS,T-1 ) = COD
                                IF( CEMPOL ) THEN
                                    EMISVA( HS,T-1 ) = CONVFAC * EMIS(S,COD) * TDAT( D,H )  ! Store data in emissions
                                    DYTOTA( HS,T-1 ) = CONVFAC * EMIS(S,COD) * TOTAL
                                ELSE
                                    EMISVA( HS,T-1 ) = CONVFAC * TDAT( D,H )  ! Store data in emissions
                                    DYTOTA( HS,T-1 ) = CONVFAC * TOTAL
                                END IF
                        END IF   

                        HSVAL(S,COD,T-1) = HS  
                      END IF
                    END IF 

                    NPDPT( T ) = NPDPT( T ) + 1
                    HS = NPDPT( T )

                    IF( HS .LE. MXPDSRC ) THEN

                        IDXSRC( HS,T ) = HS
                        SPDIDA( HS,T ) = S
                        CIDXA ( HS,T ) = CIDX
                        CODEA ( HS,T ) = COD
                        IF( CEMPOL ) THEN
                            EMISVA( HS,T ) = CONVFAC * EMIS(S,COD) * TDAT( D,H )  ! Store data in emissions
                            DYTOTA( HS,T ) = CONVFAC * EMIS(S,COD) * TOTAL
                        ELSE
                            EMISVA( HS,T ) = CONVFAC * TDAT( D,H )  ! Store data in emissions
                            DYTOTA( HS,T ) = CONVFAC * TOTAL
                        END IF

                    END IF

                    HSVAL(S,COD,T) = HS    ! Record HS source index for this combination of source, pol, and timestep

                END DO  
callen52 commented 1 month ago

I've confirmed this has been fixed.