CEMPD / SMOKE

Create emissions inputs for multiple air quality modeling systems with unmatched speed and flexibility
https://www.cmascenter.org/smoke/

review potential updates to Tmpbeis4 code #84

Closed cseppan closed 5 months ago

cseppan commented 7 months ago

Based on October 29, 2023 email from Carlie Coats

Draft code (has not been compiled or tested). Switch to using the M3UTILIO module in Tmpbeis4. Change the grid-and-species loop nests. Check for failure after environment-variable calls (e.g. ENVINT).

- "tmpbeis4.0.f" is the unchanged reference version
- "tmpbeis4.1.f" is the minimal changes-for-M3UTILIO version
- "tmpbeis4.2.f" has the loop-nest orders changed for efficiency
- "tmpbeis4.f" is a further revision sent

tmpbeis.zip

cseppan commented 7 months ago

Discussion of the I/O API M3UTILIO module and how to convert existing code https://cmascenter.org/ioapi/documentation/all_versions/html/M3UTILIO.html

hnqtran commented 7 months ago

SMOKE v5.0 compiled successfully with Carlie's modified tmpbeis4.f. I have not yet checked how fast SMOKE runs with the update.

hnqtran commented 5 months ago

Summary of Carlie's updates to tmpbeis4.f:

  1. Consolidate ALLOCATE statements into a single call. For example, lines 793 - 805 in the original tmpbeis4.f:
        IF( PX_VERSION ) THEN
            ALLOCATE( SOILM( NCOLS, NROWS ), STAT=IOS )
            CALL CHECKMEM( IOS, 'SOILM', PROGNAME )

            ALLOCATE( SOILT( NCOLS, NROWS ), STAT=IOS )
            CALL CHECKMEM( IOS, 'SOILT', PROGNAME )

            ALLOCATE( SOILT2( NCOLS, NROWS ), STAT=IOS )
            CALL CHECKMEM( IOS, 'SOILT2', PROGNAME )

            ALLOCATE( ISLTYP( NCOLS, NROWS ), STAT=IOS )
            CALL CHECKMEM( IOS, 'ISLTYP', PROGNAME )
        END IF

was modified to (lines 524 - 528 in updated tmpbeis4.f):

 IF (PX_VERSION) THEN ! line 480
....
            ALLOCATE( SOILM( NCOLS, NROWS ),
     &                SOILT( NCOLS, NROWS ),
     &               SOILT2( NCOLS, NROWS ),
     &               ISLTYP( NCOLS, NROWS ), STAT=IOS )
            CALL CHECKMEM( IOS, 'SOILM...ISLTYP', PROGNAME )
  2. Re-arrange loop structure for better execution efficiency. For example, line 1030 in the original tmpbeis4.f:
    
                    DO I = 1, NCOLS
                        DO J = 1, NROWS

C............................. If switch equal to 0 use winter normalized emissions
                            IF( SWITCH_FILE ) THEN
                                IF( SWITCH( I,J ) == 0 ) THEN
                                    SEMIS( I, J, 1:NSEF ) =
     &                                  AVGEMIS( I, J, 1:NSEF , NWINTER )
                                    .........

was modified to (~ line 1048 in updated tmpbeis4.f). Note that the I and J loops were swapped: Fortran arrays are column-major, so the first index (I) should vary fastest, in the innermost loop:
                    DO J = 1, NROWS
                    DO I = 1, NCOLS

                        IF( SWITCH( I,J ) == 0 ) THEN
                            SEMIS( I, J, 1:NSEF   ) =
 &                              AVGEMIS( I, J, 1:NSEF  , NWINTER )
                                 .........

3.  Check for failure when reading environment variables (e.g., with ENVINT). For example, the following check was added after reading the environment variable 'OUTZONE' (line 247 in the original tmpbeis4.f):
    TZONE = ENVINT( 'OUTZONE', 'Output time zone', 0, IOS )
    IF ( IOS .GT. 0 ) THEN
        CALL M3EXIT( PROGNAME,0,0, 'Bad env vble "OUTZONE"', 2 )
    END IF

4. Introduction of a `USE M3UTILIO` statement in place of INCLUDE statements for the I/O API include files (e.g., PARMS3.EXT, FDESC3.EXT, IODECL3.EXT), which simplifies downstream variable declarations and cross-module dependencies.

5. Carlie also added a code block for unit conversion from mole/hr to mole/s (~ lines 1001 - 1010 in updated tmpbeis4.f). This may be a mistake, since this unit conversion is already handled in a later section of tmpbeis4. Furthermore, it would be more efficient to write the whole-array assignment `MLFAC = MLFAC * HR2SEC` than to update `MLFAC` element by element in a double loop.

C............ Convert to moles/second if necessary

    IF ( UNITTYPE .EQ. 2 ) THEN
        DO L = 1, MSPCS
        DO K = 1, NSEF
            MLFAC( L, K ) = HR2SEC * MLFAC( L, K )
        END DO
        END DO
    END IF
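
The loop re-arrangement described above can be tried in isolation with a minimal stand-alone sketch. Everything in it is hypothetical: the program name, the 2000 x 2000 grid size, and the arrays `a` and `b` are assumptions for illustration, not code from tmpbeis4.f, and it is written in free-form Fortran rather than the fixed form SMOKE uses. It times the same assignment with the I loop outermost and then with the J loop outermost.

```fortran
program loop_order_demo
    ! Stand-alone illustration only; not taken from tmpbeis4.f.
    implicit none
    integer, parameter :: ncols = 2000, nrows = 2000  ! assumed grid size
    real, allocatable  :: a(:,:), b(:,:)
    real               :: s1, s2
    integer            :: i, j, t0, t1, t2, t3, rate

    allocate( a(ncols, nrows), b(ncols, nrows) )
    call random_number( b )
    a = 0.0                      ! touch A before any timing starts

    ! I outer, J inner: consecutive iterations are NCOLS elements apart.
    call system_clock( t0, rate )
    do i = 1, ncols
        do j = 1, nrows
            a(i, j) = 2.0 * b(i, j)
        end do
    end do
    call system_clock( t1 )
    s1 = sum( a )                ! use the result so the loop is kept

    ! J outer, I inner: consecutive iterations touch adjacent elements.
    a = 0.0
    call system_clock( t2 )
    do j = 1, nrows
        do i = 1, ncols
            a(i, j) = 2.0 * b(i, j)
        end do
    end do
    call system_clock( t3 )
    s2 = sum( a )

    print '(a, f8.4, a)', 'I outer, J inner: ', real(t1 - t0) / real(rate), ' s'
    print '(a, f8.4, a)', 'J outer, I inner: ', real(t3 - t2) / real(rate), ' s'
    print '(a, 2es14.6)', 'checksums:        ', s1, s2
end program loop_order_demo
```

The two checksums should match, since both nests compute the same values. The timings depend on the compiler, optimization level, and grid size; an optimizing compiler may interchange the loops itself, in which case the two times can come out nearly equal.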

hnqtran commented 5 months ago

Surprisingly, testing tmpbeis4 with and without the update did not show any improvement in execution time. Note that the test was conducted with a SMOKE training package over the LISTOS domain (25 rows x 25 columns). A measurable improvement in execution time may still be expected for a larger domain.

Comparing the emis_mole* output files with the m3diff tool initially showed significantly lower emissions in the output from the updated tmpbeis4. This was later found to be caused by the double unit conversion in the updated tmpbeis4 (item 5 in the comment above). After the double unit conversion was removed, the differences between the outputs were < 0.1%, which is within the acceptable range.
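
The size of the error introduced by a double unit conversion can be illustrated with a short sketch. It assumes that `HR2SEC` equals 1/3600 and that the later section of tmpbeis4 applies the same factor once more; the emission rate and the initial `MLFAC` value are hypothetical numbers chosen only for the arithmetic.

```fortran
program double_conversion_demo
    ! Illustration only; values are hypothetical, not taken from SMOKE.
    implicit none
    real, parameter :: hr2sec  = 1.0 / 3600.0  ! assumed hour-to-second factor
    real, parameter :: emis_hr = 7200.0        ! hypothetical rate, moles/hr
    real            :: mlfac, once, twice

    ! Intended behaviour: the conversion is applied once, later in the code.
    mlfac = 1.0                                ! hypothetical species factor
    once  = emis_hr * mlfac * hr2sec

    ! With the extra block: MLFAC is pre-scaled, then converted again later.
    mlfac = hr2sec * mlfac
    twice = emis_hr * mlfac * hr2sec

    print '(a, es12.5, a)', 'converted once:  ', once,  ' moles/s'
    print '(a, es12.5, a)', 'converted twice: ', twice, ' moles/s'
    print '(a, f8.1)',      'ratio once/twice:', once / twice
end program double_conversion_demo
```

Under these assumptions the doubly converted value is smaller by a further factor of 3600, which is consistent with the significantly lower emissions seen in the first m3diff comparison.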

eyth commented 5 months ago

Huy, can you consider running this on the full 12US2 or 12US1 domain instead of the 25x25?

(Sent by email in reply to Huy Tran's comment of Thu, Jan 11, 2024, 9:20 AM.)

hnqtran commented 5 months ago

> Huy, can you consider running this on the full 12US2 or 12US1 domain instead of the 25x25?

I'm working on setting up a test case based on the 2020ha2 emissions platform for the 12US1 domain. I am currently having an issue with the variable SOILT2 missing from the input met file METCRO2D.

hnqtran commented 5 months ago

Performance Test with 2020ha2_cb6_20k emission model platform

Results:

| Scenarios | Total Run Time | Individual Day Run Time |
| --- | --- | --- |
| FULL | | |
| 1st try | 6:18.69 min | Jul-01: 7 s ; Jul-15: 5 s ; Jul-31: 5 s |
| 2nd try | 6:13.15 min | Jul-01: 5 s ; Jul-15: 5 s ; Jul-31: 5 s |
| SIMP | | |
| 1st try | 6:15.54 min | Jul-01: 5 s ; Jul-15: 6 s ; Jul-31: 5 s |
| 2nd try | 6:18.04 min | Jul-01: 5 s ; Jul-15: 6 s ; Jul-31: 5 s |
| ORIG | | |
| 1st try | 8:01.58 min | Jul-01: 10 s ; Jul-15: 9 s ; Jul-31: 7 s |
| 2nd try | 7:30.14 min | Jul-01: 7 s ; Jul-15: 8 s ; Jul-31: 8 s |

There is no significant difference in run time between FULL and SIMP, meaning that the run-time benefit came mainly from the loop re-arrangement. The loop re-arrangement yields a run time about 35% faster than ORIG.

Additional information: Modern compilers can transform code for more efficient memory access when an optimization flag such as -O3 is enabled (more info here). The -O3 flag was enabled for the SMOKE compilation.
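
For reference, the run times reported above can be summarized with a small calculation. This is only a sketch: the program and function names are hypothetical, it averages the two tries of each scenario, and it reports the reduction in run time relative to ORIG. By this measure the total run times of FULL and SIMP come out roughly 19% shorter than ORIG, and the individual-day run times roughly 35% shorter, which appears to be where the 35% figure comes from.

```fortran
program runtime_summary
    ! Illustration only: summarizes the timings reported in this thread.
    implicit none
    ! Total run times, converted from min:s to seconds (1st try, 2nd try).
    real, parameter :: full_t(2) = [ 6*60 + 18.69, 6*60 + 13.15 ]
    real, parameter :: simp_t(2) = [ 6*60 + 15.54, 6*60 + 18.04 ]
    real, parameter :: orig_t(2) = [ 8*60 +  1.58, 7*60 + 30.14 ]
    ! Individual-day run times in seconds (Jul-01, Jul-15, Jul-31; both tries).
    real, parameter :: full_d(6) = [  7., 5., 5., 5., 5., 5. ]
    real, parameter :: simp_d(6) = [  5., 6., 5., 5., 6., 5. ]
    real, parameter :: orig_d(6) = [ 10., 9., 7., 7., 8., 8. ]

    print '(a, f5.1, a)', 'FULL vs ORIG, total run time: ', &
          pct_reduction( full_t, orig_t ), ' % shorter'
    print '(a, f5.1, a)', 'SIMP vs ORIG, total run time: ', &
          pct_reduction( simp_t, orig_t ), ' % shorter'
    print '(a, f5.1, a)', 'FULL vs ORIG, individual day: ', &
          pct_reduction( full_d, orig_d ), ' % shorter'
    print '(a, f5.1, a)', 'SIMP vs ORIG, individual day: ', &
          pct_reduction( simp_d, orig_d ), ' % shorter'

contains

    ! Percent reduction of the mean of NEW relative to the mean of OLD.
    real function pct_reduction( new, old )
        real, intent(in) :: new(:), old(:)
        real :: mean_new, mean_old
        mean_new = sum( new ) / real( size( new ) )
        mean_old = sum( old ) / real( size( old ) )
        pct_reduction = 100.0 * ( 1.0 - mean_new / mean_old )
    end function pct_reduction

end program runtime_summary
```

The individual-day times are reported only to the nearest second, so the per-day percentages are coarse estimates.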