MPAS-Dev / MPAS

Repository for private MPAS development prior to the MPAS v6.0 release.

Improves LIGHT compass testing #1470

Closed pwolfram closed 6 years ago

pwolfram commented 6 years ago

Improves existing LIGHT testing suite, including speed up of existing test cases and incorporation of validation of LIGHT results against baseline results.

mark-petersen commented 6 years ago

Can you rebase on the head of ocean/develop? Also, please squash all the commits into one, unless you have some particular reason to keep them separate. It just makes it a little easier to handle. For example, we will probably cherry-pick these changes onto David's branch.

mark-petersen commented 6 years ago

Please test your new LIGHT regression on grizzly or wolf.

pwolfram commented 6 years ago

@mark-petersen, I recommend we get #1464 merged before merging this PR, so that we don't conflate issues if a bug turns up later and we have to go back and do a bisection test.

pwolfram commented 6 years ago

@mark-petersen, FYI, I force-pushed to ignore the broken commit (prior to merge of #1464) as we discussed this morning and will incorporate these other changes and let you know when I'm finished.

pwolfram commented 6 years ago

Testing for LIGHT.xml regression:

e.g., ./manage_regression_suite.py -s -c -t ocean/regression_suites/LIGHT.xml --work_dir=tmp_light -f general.config.ocean.wf

macOS Sierra:

└─▪ ./nightly_ocean_test_suite.py 
 ** Running case Periodic Planar 20km - LIGHT particle general test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle region reset test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle time reset test
      PASS
 ** Running case SOMA 32km - LIGHT particle time general test
      PASS
 ** Running case ZISO 20km - LIGHT particle time general test
      PASS
TEST RUNTIMES:
14.72 s Periodic_Planar_20km_-_LIGHT_particle_general_test
13.07 s Periodic_Planar_20km_-_LIGHT_particle_region_reset_test
21.70 s Periodic_Planar_20km_-_LIGHT_particle_time_reset_test
30.11 s SOMA_32km_-_LIGHT_particle_time_general_test
18.53 s ZISO_20km_-_LIGHT_particle_time_general_test

wolf

└─▪ ./nightly_ocean_test_suite.py 
 ** Running case Periodic Planar 20km - LIGHT particle general test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle region reset test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle time reset test
      PASS
 ** Running case SOMA 32km - LIGHT particle time general test
      PASS
 ** Running case ZISO 20km - LIGHT particle time general test
      PASS
TEST RUNTIMES:
0:41.87 s SOMA_32km_-_LIGHT_particle_time_general_test
0:19.08 s Periodic_Planar_20km_-_LIGHT_particle_region_reset_test
0:20.25 s Periodic_Planar_20km_-_LIGHT_particle_general_test
0:27.56 s Periodic_Planar_20km_-_LIGHT_particle_time_reset_test
0:36.56 s ZISO_20km_-_LIGHT_particle_time_general_test
pwolfram commented 6 years ago

I'm comfortable merging once #1464 is merged and you are satisfied @mark-petersen.

mark-petersen commented 6 years ago

Rebased on head. Tested on grizzly with gnu. The nightly regression suite works.

mark-petersen commented 6 years ago

A few minor comments.

  1. Here:

    testing_and_setup/compass/ocean/periodic_planar/20km
    gr-fe3:20km$ ls
    default  region_reset_test  time_reset_test

    The two reset directory names should include the word LIGHT to indicate what is reset.

  2. All tests with LIGHT turned on need a validation section, which specifies the variables that are compared with another simulation when the -b flag is passed to the COMPASS setup_testcase.py or manage_regression_suite.py commands. See, for example:

    cd testing_and_setup/compass/ocean/global_ocean/QU240/analysis_test
    vi config_driver.xml
    
    <validation>
        <compare_fields file1="forward/output.nc">
            <template file="prognostic_comparison.xml" path_base="script_core_dir" path="templates/validations"/>
        </compare_fields>
        <compare_fields file1="forward/analysis_members/globalStats.0001-01-01_00.00.00.nc">
            <field name="kineticEnergyCellMax" l1_norm="0.0" l2_norm="0.0" linf_norm="0.0"/>
            <field name="kineticEnergyCellMin" l1_norm="0.0" l2_norm="0.0" linf_norm="0.0"/>
            <field name="kineticEnergyCellAvg" l1_norm="0.0" l2_norm="0.0" linf_norm="0.0"/>
            <field name="temperatureAvg" l1_norm="0.0" l2_norm="0.0" linf_norm="0.0"/>
            <field name="salinityAvg" l1_norm="0.0" l2_norm="0.0" linf_norm="0.0"/>
        </compare_fields>

    You should make a template file="LIGHT_comparison.xml". You can also add a timer test, like this:

        <compare_timers rundir1="forward">
            <timer name="compute_globalStats"/>
            <timer name="write_globalStats"/>

    Maybe you were planning to do this in a future PR; that is fine too.

  3. I love this output:

    TEST RUNTIMES:
    1:25.82 s sub-ice-shelf_2D_-_restart_test

    Thank you for that. A detail, if convenient: we don't need fractions of a second, and the 's' is confusing. How about:

    TEST RUNTIMES in min:sec
    1:25 sub-ice-shelf_2D_-_restart_test

    That would require another split on the period (.). Also, is it possible on this line:

    for outputname in os.walk(base_path + case_output):

    to list alphabetically? Don't bother if it is any trouble. I'm learning Python from you here. Another option is to list the time after each test, then have the summary at the end:

    ** Running case Global Ocean 240km - Performance Test
      PASS
      duration: 1:25
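
The formatting and ordering suggested in item 3 could look roughly like the sketch below. It is not the code in the COMPASS scripts: `format_runtime` and `runtime_report` are hypothetical helper names, and the input is assumed to be a mapping from test name to elapsed seconds. The example values are illustrative; 85.82 s corresponds to the 1:25.82 entry quoted above.

```python
def format_runtime(seconds):
    """Format elapsed seconds as min:sec, dropping fractions of a second."""
    minutes, secs = divmod(int(seconds), 60)
    return "{:d}:{:02d}".format(minutes, secs)


def runtime_report(runtimes):
    """Build the report lines, sorted alphabetically by test name."""
    lines = ["TEST RUNTIMES in min:sec"]
    # sorted() is case-sensitive, so names starting with uppercase come first.
    for name in sorted(runtimes):
        lines.append("{} {}".format(format_runtime(runtimes[name]), name))
    return lines


if __name__ == "__main__":
    # Hypothetical input: test name -> elapsed seconds.
    example = {
        "sub-ice-shelf_2D_-_restart_test": 85.82,
        "Global_Ocean_240km_-_Performance_Test": 50.94,
    }
    print("\n".join(runtime_report(example)))
```

Truncating with `int()` rather than rounding keeps the displayed time from ever exceeding the measured one; rounding would be an equally reasonable choice.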
mark-petersen commented 6 years ago

I added the two reset cases back in the nightly test suite. On Grizzly, they are still too slow:

3:09.50 s Periodic_Planar_20km_-_LIGHT_particle_region_reset_test
6:47.81 s Periodic_Planar_20km_-_LIGHT_particle_time_reset_test

Strange that it is so much faster on your laptop. Here is the whole thing:

TEST RUNTIMES:
1:25.82 s sub-ice-shelf_2D_-_restart_test
1:10.98 s Global_Ocean_240km_-_RK4_Blocks_Test
0:16.46 s Baroclinic_Channel_10km_-_Restart_Test
0:17.08 s Baroclinic_Channel_10km_-_Thread_Test
0:28.68 s ZISO_20km_-_Smoke_Test_with_frazil
0:50.94 s Global_Ocean_240km_-_Performance_Test
1:33.00 s Global_Ocean_240km_-_Analysis_Test
1:04.64 s Global_Ocean_240km_-_SE_Blocks_Test
1:52.41 s Global_Ocean_240km_-_Smoke_Test_with_land_ice
0:34.79 s ZISO_20km_-_Smoke_Test
0:16.57 s Baroclinic_Channel_10km_-_Decomp_Test
3:09.50 s Periodic_Planar_20km_-_LIGHT_particle_region_reset_test
1:06.15 s Global_Ocean_240km_-_Restart_Test
6:47.81 s Periodic_Planar_20km_-_LIGHT_particle_time_reset_test
mark-petersen commented 6 years ago

A little note for next time: We typically name our branches with a prefix

ocean/improve_LIGHT_testing
framework/some_change

I guess that is redundant with the PR tags, but it has been our convention.

mark-petersen commented 6 years ago

To check that the comparison and -b (same as --baseline_dir) work, run COMPASS once, then again with -b as follows:

./setup_testcase.py -f general.config.ocean_turq --work_dir /lustre/scratch3/turquoise/mpeterse/runs/DIRECTORY_1 -n 10
./setup_testcase.py -f general.config.ocean_turq --work_dir /lustre/scratch3/turquoise/mpeterse/runs/DIRECTORY_2 -n 10 -b /lustre/scratch3/turquoise/mpeterse/runs/DIRECTORY_1

with -n set to one of your new test cases. Go into the second directory, and you should see blocks like this:

cd /lustre/scratch3/turquoise/mpeterse/runs/t54u/ocean/global_ocean/QU240/performance_test
vi run_test.py

os.chdir(base_path)
try:
    subprocess.check_call(['/turquoise/usr/projects/climate/mpeterse/repos/MPAS/merge_bill_exchange_reuse_ocean_core/testing_and_setup/compass/utility_scripts/compare_fields.py', '-q', '-1', 'forward/output.nc', '-2', '/lustre/scratch3/turquoise/mpeterse/runs/t53r/ocean/global_ocean/QU240/performance_test/forward/output.nc', '-v', 'temperature', '--l1', '0.0', '--l2', '0.0', '--linf', '0.0'], env=os.environ.copy())
    print ' ** PASS Comparison of temperature between forward/output.nc and /lustre/scratch3/turquoise/mpeterse/runs/t53r/ocean/global_ocean/QU240/performance_test/forward/output.nc'
except:
    print ' ** FAIL Comparison of temperature between forward/output.nc and /lustre/scratch3/turquoise/mpeterse/runs/t53r/ocean/global_ocean/QU240/performance_test/forward/output.nc'
    error = True

for every variable you compare. The variable above is temperature.
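
For orientation, the `--l1`, `--l2`, and `--linf` thresholds in the call above bound how far a field may differ between the two runs, so a threshold of 0.0 presumably requires exact agreement. The sketch below shows one common way to define such norms on the element-wise difference of two flattened fields. It is an assumption for illustration, not the implementation in `compare_fields.py`, whose exact normalization is not shown in this thread; `difference_norms` and `within_tolerance` are hypothetical names.

```python
import math


def difference_norms(field1, field2):
    """Return (l1, l2, linf) norms of the element-wise difference."""
    if len(field1) != len(field2):
        raise ValueError("fields must have the same number of values")
    diffs = [abs(a - b) for a, b in zip(field1, field2)]
    l1 = sum(diffs)
    l2 = math.sqrt(sum(d * d for d in diffs))
    linf = max(diffs) if diffs else 0.0
    return l1, l2, linf


def within_tolerance(field1, field2, l1=0.0, l2=0.0, linf=0.0):
    """True if every norm of the difference is at or below its threshold."""
    n1, n2, ninf = difference_norms(field1, field2)
    return n1 <= l1 and n2 <= l2 and ninf <= linf


if __name__ == "__main__":
    baseline = [10.0, 12.5, 15.0]  # made-up values, not model output
    print(within_tolerance(baseline, list(baseline)))        # identical fields
    print(within_tolerance(baseline, [10.0, 12.5, 15.001]))  # small difference
```

With all thresholds at their 0.0 defaults, any nonzero difference fails the check, which is the behavior a regression baseline usually wants.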

pwolfram commented 6 years ago

Testing

Setup

./manage_regression_suite.py -t ocean/regression_suites/light.xml -s -c --work_dir=tmp_light -f general.config.ocean.wf -v produces something like

WARNING: No model runtime specified. Using the default of /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/runtime_definitions/mpirun.xml
Cleaning Test Cases:
   -- Cleaned case 'Periodic Planar 20km - LIGHT particle general test': -o ocean -c periodic_planar -r 20km -t default_light
   -- Cleaned case 'Periodic Planar 20km - LIGHT particle region reset test': -o ocean -c periodic_planar -r 20km -t region_reset_light_test
   -- Cleaned case 'Periodic Planar 20km - LIGHT particle time reset test': -o ocean -c periodic_planar -r 20km -t time_reset_light_test
   -- Cleaned case 'SOMA 32km - LIGHT particle time general test': -o ocean -c soma -r 32km -t default
   -- Cleaned case 'ZISO 20km - LIGHT particle time general test': -o ocean -c ziso -r 20km -t default

Setting Up Test Cases:
     Script setup outputs to /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/manage_regression_suite.py.out
   -- Setup case 'Periodic Planar 20km - LIGHT particle general test': -o ocean -c periodic_planar -r 20km -t default_light
     Script setup outputs to /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/manage_regression_suite.py.out
   -- Setup case 'Periodic Planar 20km - LIGHT particle region reset test': -o ocean -c periodic_planar -r 20km -t region_reset_light_test
     Script setup outputs to /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/manage_regression_suite.py.out
   -- Setup case 'Periodic Planar 20km - LIGHT particle time reset test': -o ocean -c periodic_planar -r 20km -t time_reset_light_test
     Script setup outputs to /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/manage_regression_suite.py.out
   -- Setup case 'SOMA 32km - LIGHT particle time general test': -o ocean -c soma -r 32km -t default
     Script setup outputs to /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/manage_regression_suite.py.out
   -- Setup case 'ZISO 20km - LIGHT particle time general test': -o ocean -c ziso -r 20km -t default

 Summary of test cases:
      Maximum MPI tasks used: 6
      Maximum OpenMP threads used: 1
      Maximum Total Cores used: 6

Case setup output:

 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/periodic_planar/20km/default_light/init_step2
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/periodic_planar/20km/default_light/forward
 -- Set up driver script in /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/periodic_planar/20km/default_light
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/periodic_planar/20km/default_light/init_step1
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/periodic_planar/20km/region_reset_light_test/init_step2
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/periodic_planar/20km/region_reset_light_test/forward
 -- Set up driver script in /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/periodic_planar/20km/region_reset_light_test
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/periodic_planar/20km/region_reset_light_test/init_step1
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/periodic_planar/20km/time_reset_light_test/init_step2
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/periodic_planar/20km/time_reset_light_test/forward
 -- Set up driver script in /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/periodic_planar/20km/time_reset_light_test
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/periodic_planar/20km/time_reset_light_test/init_step1
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/soma/32km/default/init_step2
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/soma/32km/default/forward
 -- Set up driver script in /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/soma/32km/default
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/soma/32km/default/init_step1
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/ziso/20km/default/init_step2
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/ziso/20km/default/forward
 -- Set up driver script in /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/ziso/20km/default
 -- Set up case: /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light/ocean/ziso/20km/default/init_step1

macOS

└─▪ ./nightly_ocean_test_suite.py 
 ** Running case Periodic Planar 20km - LIGHT particle general test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle region reset test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle time reset test
      PASS
 ** Running case SOMA 32km - LIGHT particle time general test
      PASS
 ** Running case ZISO 20km - LIGHT particle time general test
      PASS
TEST RUNTIMES:
0:15 Periodic_Planar_20km_-_LIGHT_particle_general_test
0:14 Periodic_Planar_20km_-_LIGHT_particle_region_reset_test
0:22 Periodic_Planar_20km_-_LIGHT_particle_time_reset_test
0:31 SOMA_32km_-_LIGHT_particle_time_general_test
0:19 ZISO_20km_-_LIGHT_particle_time_general_test
Total runtime 1:41

wolf

┌─[pwolfram][wf400][/lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light][09:27][±][improve_LIGHT_testing ✗]
└─▪ ./nightly_ocean_test_suite.py 
 ** Running case Periodic Planar 20km - LIGHT particle general test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle region reset test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle time reset test
      PASS
 ** Running case SOMA 32km - LIGHT particle time general test
      PASS
 ** Running case ZISO 20km - LIGHT particle time general test
      PASS
TEST RUNTIMES:
0:24 Periodic_Planar_20km_-_LIGHT_particle_general_test
0:22 Periodic_Planar_20km_-_LIGHT_particle_region_reset_test
0:36 Periodic_Planar_20km_-_LIGHT_particle_time_reset_test
0:46 SOMA_32km_-_LIGHT_particle_time_general_test
0:27 ZISO_20km_-_LIGHT_particle_time_general_test
Total runtime 2:35

grizzly

┌─[pwolfram][gr0530][/lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_light_gr][09:41][±][improve_LIGHT_testing ✗]
└─▪ ./nightly_ocean_test_suite.py 
 ** Running case Periodic Planar 20km - LIGHT particle general test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle region reset test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle time reset test
      PASS
 ** Running case SOMA 32km - LIGHT particle time general test
      PASS
 ** Running case ZISO 20km - LIGHT particle time general test
      PASS
TEST RUNTIMES:
0:17 Periodic_Planar_20km_-_LIGHT_particle_general_test
0:15 Periodic_Planar_20km_-_LIGHT_particle_region_reset_test
0:26 Periodic_Planar_20km_-_LIGHT_particle_time_reset_test
0:36 SOMA_32km_-_LIGHT_particle_time_general_test
0:21 ZISO_20km_-_LIGHT_particle_time_general_test
Total runtime 1:55
pwolfram commented 6 years ago

@mark-petersen, I don't know why your baseline tests are so slow but this is inconsistent with my testing across three platforms. All the LIGHT tests should take about 2 min total. Something seems strange and I think we should try to get to the bottom of this if possible.

I'll run the full baseline and comparison (nightly.xml) from scratch on grizzly and report back in a few.

mark-petersen commented 6 years ago

It's good to see that your tests are fast. Could you try your test on /lustre/scratch3/turquoise? Might be slow i/o.

pwolfram commented 6 years ago

@mark-petersen, here are the results from my end-to-end nightly regression suite tests on grizzly. Note the quick computation, under about 6 min, relative to roughly 20-30 min before. I'm happy to sort out the discrepancy but believe that this PR is now ready to merge since I've squashed commits and addressed the concerns above.

┌─[pwolfram][gr0530][/lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass][09:49][±][improve_LIGHT_testing ✗]
└─▪ ./manage_regression_suite.py -t ocean/regression_suites/nightly.xml -s -c --work_dir=tmp_baseline -f general.config.ocean.wf 
WARNING: No model runtime specified. Using the default of /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/runtime_definitions/mpirun.xml
Cleaning Test Cases:
   -- Cleaned case 'Global Ocean 240km - Performance Test': -o ocean -c global_ocean -r QU240 -t performance_test
   -- Cleaned case 'Global Ocean 240km - Restart Test': -o ocean -c global_ocean -r QU240 -t restart_test
   -- Cleaned case 'Global Ocean 240km - SE Blocks Test': -o ocean -c global_ocean -r QU240 -t se_blocks_test
   -- Cleaned case 'Global Ocean 240km - RK4 Blocks Test': -o ocean -c global_ocean -r QU240 -t rk4_blocks_test
   -- Cleaned case 'Global Ocean 240km - Analysis Test': -o ocean -c global_ocean -r QU240 -t analysis_test
   -- Cleaned case 'ZISO 20km - Smoke Test': -o ocean -c ziso -r 20km -t default
   -- Cleaned case 'ZISO 20km - Smoke Test with frazil': -o ocean -c ziso -r 20km -t with_frazil
   -- Cleaned case 'Baroclinic Channel 10km - Thread Test': -o ocean -c baroclinic_channel -r 10km -t threads_test
   -- Cleaned case 'Baroclinic Channel 10km - Decomp Test': -o ocean -c baroclinic_channel -r 10km -t decomp_test
   -- Cleaned case 'Baroclinic Channel 10km - Restart Test': -o ocean -c baroclinic_channel -r 10km -t restart_test
   -- Cleaned case 'Global Ocean 240km - Smoke Test with land ice': -o ocean -c global_ocean -r QU240 -t with_land_ice
   -- Cleaned case 'sub-ice-shelf 2D - restart test': -o ocean -c sub_ice_shelf_2D -r 5km -t restart_test
   -- Cleaned case 'Periodic Planar 20km - LIGHT particle region reset test': -o ocean -c periodic_planar -r 20km -t region_reset_light_test
   -- Cleaned case 'Periodic Planar 20km - LIGHT particle time reset test': -o ocean -c periodic_planar -r 20km -t time_reset_light_test

Setting Up Test Cases:
   -- Setup case 'Global Ocean 240km - Performance Test': -o ocean -c global_ocean -r QU240 -t performance_test
   -- Setup case 'Global Ocean 240km - Restart Test': -o ocean -c global_ocean -r QU240 -t restart_test
   -- Setup case 'Global Ocean 240km - SE Blocks Test': -o ocean -c global_ocean -r QU240 -t se_blocks_test
   -- Setup case 'Global Ocean 240km - RK4 Blocks Test': -o ocean -c global_ocean -r QU240 -t rk4_blocks_test
   -- Setup case 'Global Ocean 240km - Analysis Test': -o ocean -c global_ocean -r QU240 -t analysis_test
   -- Setup case 'ZISO 20km - Smoke Test': -o ocean -c ziso -r 20km -t default
   -- Setup case 'ZISO 20km - Smoke Test with frazil': -o ocean -c ziso -r 20km -t with_frazil
   -- Setup case 'Baroclinic Channel 10km - Thread Test': -o ocean -c baroclinic_channel -r 10km -t threads_test
   -- Setup case 'Baroclinic Channel 10km - Decomp Test': -o ocean -c baroclinic_channel -r 10km -t decomp_test
   -- Setup case 'Baroclinic Channel 10km - Restart Test': -o ocean -c baroclinic_channel -r 10km -t restart_test
   -- Setup case 'Global Ocean 240km - Smoke Test with land ice': -o ocean -c global_ocean -r QU240 -t with_land_ice
   -- Setup case 'sub-ice-shelf 2D - restart test': -o ocean -c sub_ice_shelf_2D -r 5km -t restart_test
   -- Setup case 'Periodic Planar 20km - LIGHT particle region reset test': -o ocean -c periodic_planar -r 20km -t region_reset_light_test
   -- Setup case 'Periodic Planar 20km - LIGHT particle time reset test': -o ocean -c periodic_planar -r 20km -t time_reset_light_test

 Summary of test cases:
      Maximum MPI tasks used: 16
      Maximum OpenMP threads used: 2
      Maximum Total Cores used: 16
┌─[pwolfram][gr0530][/lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass][09:49][±][improve_LIGHT_testing ✗]
└─▪ cd tmp_baseline/
┌─[pwolfram][gr0530][/lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_baseline][09:50][±][improve_LIGHT_testing ✗]
└─▪ ./nightly_ocean_test_suite.py 
 ** Running case Global Ocean 240km - Performance Test
      PASS
 ** Running case Global Ocean 240km - Restart Test
      PASS
 ** Running case Global Ocean 240km - SE Blocks Test
      PASS
 ** Running case Global Ocean 240km - RK4 Blocks Test
      PASS
 ** Running case Global Ocean 240km - Analysis Test
      PASS
 ** Running case ZISO 20km - Smoke Test
      PASS
 ** Running case ZISO 20km - Smoke Test with frazil
      PASS
 ** Running case Baroclinic Channel 10km - Thread Test
      PASS
 ** Running case Baroclinic Channel 10km - Decomp Test
      PASS
 ** Running case Baroclinic Channel 10km - Restart Test
      PASS
 ** Running case Global Ocean 240km - Smoke Test with land ice
      PASS
 ** Running case sub-ice-shelf 2D - restart test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle region reset test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle time reset test
      PASS
TEST RUNTIMES:
00:05 Baroclinic_Channel_10km_-_Decomp_Test
00:05 Baroclinic_Channel_10km_-_Restart_Test
00:08 Baroclinic_Channel_10km_-_Thread_Test
00:28 Global_Ocean_240km_-_Analysis_Test
00:39 Global_Ocean_240km_-_Performance_Test
00:35 Global_Ocean_240km_-_RK4_Blocks_Test
00:31 Global_Ocean_240km_-_Restart_Test
00:30 Global_Ocean_240km_-_SE_Blocks_Test
00:34 Global_Ocean_240km_-_Smoke_Test_with_land_ice
00:19 Periodic_Planar_20km_-_LIGHT_particle_region_reset_test
00:20 Periodic_Planar_20km_-_LIGHT_particle_time_reset_test
00:19 ZISO_20km_-_Smoke_Test
00:09 ZISO_20km_-_Smoke_Test_with_frazil
00:16 sub-ice-shelf_2D_-_restart_test
Total runtime 04:58

┌─[pwolfram][gr0530][/lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass][10:00][±][improve_LIGHT_testing ✗]
└─▪ ./manage_regression_suite.py -t ocean/regression_suites/nightly.xml -s -c --work_dir=tmp_verify -f general.config.ocean.wf --b ${PWD}/tmp_baseline/
WARNING: No model runtime specified. Using the default of /lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/runtime_definitions/mpirun.xml
Cleaning Test Cases:
   -- Cleaned case 'Global Ocean 240km - Performance Test': -o ocean -c global_ocean -r QU240 -t performance_test
   -- Cleaned case 'Global Ocean 240km - Restart Test': -o ocean -c global_ocean -r QU240 -t restart_test
   -- Cleaned case 'Global Ocean 240km - SE Blocks Test': -o ocean -c global_ocean -r QU240 -t se_blocks_test
   -- Cleaned case 'Global Ocean 240km - RK4 Blocks Test': -o ocean -c global_ocean -r QU240 -t rk4_blocks_test
   -- Cleaned case 'Global Ocean 240km - Analysis Test': -o ocean -c global_ocean -r QU240 -t analysis_test
   -- Cleaned case 'ZISO 20km - Smoke Test': -o ocean -c ziso -r 20km -t default
   -- Cleaned case 'ZISO 20km - Smoke Test with frazil': -o ocean -c ziso -r 20km -t with_frazil
   -- Cleaned case 'Baroclinic Channel 10km - Thread Test': -o ocean -c baroclinic_channel -r 10km -t threads_test
   -- Cleaned case 'Baroclinic Channel 10km - Decomp Test': -o ocean -c baroclinic_channel -r 10km -t decomp_test
   -- Cleaned case 'Baroclinic Channel 10km - Restart Test': -o ocean -c baroclinic_channel -r 10km -t restart_test
   -- Cleaned case 'Global Ocean 240km - Smoke Test with land ice': -o ocean -c global_ocean -r QU240 -t with_land_ice
   -- Cleaned case 'sub-ice-shelf 2D - restart test': -o ocean -c sub_ice_shelf_2D -r 5km -t restart_test
   -- Cleaned case 'Periodic Planar 20km - LIGHT particle region reset test': -o ocean -c periodic_planar -r 20km -t region_reset_light_test
   -- Cleaned case 'Periodic Planar 20km - LIGHT particle time reset test': -o ocean -c periodic_planar -r 20km -t time_reset_light_test

Setting Up Test Cases:
   -- Setup case 'Global Ocean 240km - Performance Test': -o ocean -c global_ocean -r QU240 -t performance_test
   -- Setup case 'Global Ocean 240km - Restart Test': -o ocean -c global_ocean -r QU240 -t restart_test
   -- Setup case 'Global Ocean 240km - SE Blocks Test': -o ocean -c global_ocean -r QU240 -t se_blocks_test
   -- Setup case 'Global Ocean 240km - RK4 Blocks Test': -o ocean -c global_ocean -r QU240 -t rk4_blocks_test
   -- Setup case 'Global Ocean 240km - Analysis Test': -o ocean -c global_ocean -r QU240 -t analysis_test
   -- Setup case 'ZISO 20km - Smoke Test': -o ocean -c ziso -r 20km -t default
   -- Setup case 'ZISO 20km - Smoke Test with frazil': -o ocean -c ziso -r 20km -t with_frazil
   -- Setup case 'Baroclinic Channel 10km - Thread Test': -o ocean -c baroclinic_channel -r 10km -t threads_test
   -- Setup case 'Baroclinic Channel 10km - Decomp Test': -o ocean -c baroclinic_channel -r 10km -t decomp_test
   -- Setup case 'Baroclinic Channel 10km - Restart Test': -o ocean -c baroclinic_channel -r 10km -t restart_test
   -- Setup case 'Global Ocean 240km - Smoke Test with land ice': -o ocean -c global_ocean -r QU240 -t with_land_ice
   -- Setup case 'sub-ice-shelf 2D - restart test': -o ocean -c sub_ice_shelf_2D -r 5km -t restart_test
   -- Setup case 'Periodic Planar 20km - LIGHT particle region reset test': -o ocean -c periodic_planar -r 20km -t region_reset_light_test
   -- Setup case 'Periodic Planar 20km - LIGHT particle time reset test': -o ocean -c periodic_planar -r 20km -t time_reset_light_test

 Summary of test cases:
      Maximum MPI tasks used: 16
      Maximum OpenMP threads used: 2
      Maximum Total Cores used: 16
┌─[pwolfram][gr0530][/lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass][10:00][±][improve_LIGHT_testing ✗]
└─▪ cd tmp_verify/
┌─[pwolfram][gr0530][/lustre/scratch2/turquoise/pwolfram/LIGHT_testing/MPAS_LIGHT_compass/testing_and_setup/compass/tmp_verify][10:00][±][improve_LIGHT_testing ✗]
└─▪ ./nightly_ocean_test_suite.py 
 ** Running case Global Ocean 240km - Performance Test
      PASS
 ** Running case Global Ocean 240km - Restart Test
      PASS
 ** Running case Global Ocean 240km - SE Blocks Test
      PASS
 ** Running case Global Ocean 240km - RK4 Blocks Test
      PASS
 ** Running case Global Ocean 240km - Analysis Test
      PASS
 ** Running case ZISO 20km - Smoke Test
      PASS
 ** Running case ZISO 20km - Smoke Test with frazil
      PASS
 ** Running case Baroclinic Channel 10km - Thread Test
      PASS
 ** Running case Baroclinic Channel 10km - Decomp Test
      PASS
 ** Running case Baroclinic Channel 10km - Restart Test
      PASS
 ** Running case Global Ocean 240km - Smoke Test with land ice
      PASS
 ** Running case sub-ice-shelf 2D - restart test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle region reset test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle time reset test
      PASS
TEST RUNTIMES:
00:13 Baroclinic_Channel_10km_-_Decomp_Test
00:08 Baroclinic_Channel_10km_-_Restart_Test
00:07 Baroclinic_Channel_10km_-_Thread_Test
00:41 Global_Ocean_240km_-_Analysis_Test
00:27 Global_Ocean_240km_-_Performance_Test
00:42 Global_Ocean_240km_-_RK4_Blocks_Test
00:38 Global_Ocean_240km_-_Restart_Test
00:31 Global_Ocean_240km_-_SE_Blocks_Test
00:45 Global_Ocean_240km_-_Smoke_Test_with_land_ice
00:22 Periodic_Planar_20km_-_LIGHT_particle_region_reset_test
00:30 Periodic_Planar_20km_-_LIGHT_particle_time_reset_test
00:25 ZISO_20km_-_Smoke_Test
00:14 ZISO_20km_-_Smoke_Test_with_frazil
00:27 sub-ice-shelf_2D_-_restart_test
Total runtime 06:10
pwolfram commented 6 years ago

@mark-petersen, my understanding is that /lustre/scratch3/ is not the best scratch space to still use. I'll have a quick look to see if that is the issue via a fresh retest. Note, the fresh push was to fix a "Comparision" typo.

pwolfram commented 6 years ago

Looks like /lustre/scratch3/ is the problem. We should probably migrate away from that disk space based on these 3X slower timings:

┌─[pwolfram][gr0530][/lustre/scratch3/turquoise/pwolfram/MPAS_test/MPAS/testing_and_setup/compass/tmp_baseline][10:29][±][improve_LIGHT_testing ✗]
└─▪ ./nightly_ocean_test_suite.py 
 ** Running case Global Ocean 240km - Performance Test
      PASS
 ** Running case Global Ocean 240km - Restart Test
      PASS
 ** Running case Global Ocean 240km - SE Blocks Test
      PASS
 ** Running case Global Ocean 240km - RK4 Blocks Test
      PASS
 ** Running case Global Ocean 240km - Analysis Test
      PASS
 ** Running case ZISO 20km - Smoke Test
      PASS
 ** Running case ZISO 20km - Smoke Test with frazil
      PASS
 ** Running case Baroclinic Channel 10km - Thread Test
      PASS
 ** Running case Baroclinic Channel 10km - Decomp Test
      PASS
 ** Running case Baroclinic Channel 10km - Restart Test
      PASS
 ** Running case Global Ocean 240km - Smoke Test with land ice
      PASS
 ** Running case sub-ice-shelf 2D - restart test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle region reset test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle time reset test
      PASS
TEST RUNTIMES:
00:27 Baroclinic_Channel_10km_-_Decomp_Test
00:23 Baroclinic_Channel_10km_-_Restart_Test
00:27 Baroclinic_Channel_10km_-_Thread_Test
01:35 Global_Ocean_240km_-_Analysis_Test
00:42 Global_Ocean_240km_-_Performance_Test
01:05 Global_Ocean_240km_-_RK4_Blocks_Test
01:06 Global_Ocean_240km_-_Restart_Test
00:56 Global_Ocean_240km_-_SE_Blocks_Test
01:37 Global_Ocean_240km_-_Smoke_Test_with_land_ice
02:03 Periodic_Planar_20km_-_LIGHT_particle_region_reset_test
03:25 Periodic_Planar_20km_-_LIGHT_particle_time_reset_test
00:48 ZISO_20km_-_Smoke_Test
00:26 ZISO_20km_-_Smoke_Test_with_frazil
01:20 sub-ice-shelf_2D_-_restart_test
Total runtime 16:20
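
As a quick check on the slowdown, the two grizzly totals reported in this thread (04:58 for the baseline run on scratch2, 16:20 for the same suite on scratch3) can be converted to seconds and divided. This is only a small arithmetic sketch; `to_seconds` is a hypothetical helper.

```python
def to_seconds(stamp):
    """Convert a 'MM:SS' string to a number of seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


scratch2_total = to_seconds("04:58")  # baseline run on /lustre/scratch2
scratch3_total = to_seconds("16:20")  # same suite on /lustre/scratch3

# float() keeps the division exact under both Python 2 and Python 3.
slowdown = float(scratch3_total) / scratch2_total

print("scratch3 / scratch2 = {:.1f}x".format(slowdown))
```

The totals are 298 s and 980 s, a ratio of about 3.3. The two LIGHT reset tests account for a large share of the gap: 02:03 and 03:25 on scratch3 versus 00:19 and 00:20 on scratch2.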
pwolfram commented 6 years ago

@mark-petersen, I think this is now ready to merge following any additional testing you might have left.

pwolfram commented 6 years ago

Note, the /lustre/scratch2 vs /lustre/scratch3 choice may be somewhat of a moving target depending upon others' use, so I'm not sure exactly what to recommend here. I've generally gotten better results on scratch2 than on scratch3 but haven't rigorously tested it.

pwolfram commented 6 years ago

From what I understand, scratch3 also used to be better until about 6 months to 1 year ago.

pwolfram commented 6 years ago

@mark-petersen, do you see anything here I need to change before the merge? I can review / merge #1474 after this PR.

xylar commented 6 years ago

@pwolfram and @mark-petersen, the same issue as in #1474 applies here. The COMPASS scripts should not have been modified in ocean/develop. These changes need to be reverted and made in develop instead. Presumably, this means that all changes in this PR should be reverted, and those not in the COMPASS scripts should be re-applied once develop has been merged back into ocean/develop.

xylar commented 6 years ago

#1479 is intended to add the script changes to framework.