Refactor system test scripts to make it easier to add new tests that compare two runs

billsacks commented 8 years ago

Most of our system tests do two runs that differ in some way, then compare them to make sure they are identical. There is a desire - from myself and others - to add many more tests along these lines that tweak small things and ensure that the simulation isn't affected (e.g., turn carbon isotopes on in CLM and ensure that this doesn't change the evolution of the bulk carbon pools).

The new python-based system tests are a big step forward from the old shell scripts - thank you to everyone who worked on that! However, to make the test scripts more maintainable, and to make it easier to add new system tests, I'd suggest even more aggressive removal of duplication between these tests. Here I'm using the LII test as an example.

There are three main things I'd like to see:

Introduce a base class for doing two runs
Introduce a pre-build stage, so I can move the namelist setup out of the build stage

The main point would be to avoid needing to check if things have already been done and avoid doing them again. This would also make the whole test flow logic more clear.
Move this to some common place:

        self._case.set_value("CONTINUE_RUN",False)
        self._case.set_value("REST_OPTION","none")
        self._case.set_value("HIST_OPTION","$STOP_OPTION")
        self._case.set_value("HIST_N","$STOP_N")

I envision something like the following prototype (which is probably buggy, but hopefully conveys what I envision well enough):

class SystemTestsCompareTwo(SystemTestsCommon):

    def __init__(self, case):
        SystemTestsCommon.__init__(self, case)
        self._test_suffix = 'test'  # will typically be overridden in derived classes

    def run(self):
        self._setup_first_phase()
        success = SystemTestsCommon._run(self)

        if success:
            self._setup_second_phase()
            success = SystemTestsCommon._run(self, self._test_suffix)

        if success:
            success = self._component_compare_test("base", self._test_suffix)

        return success

    def _setup_first_phase(self):
        pass

    def _setup_second_phase(self):
        pass

class LII(SystemTestsCompareTwo):
    def __init__(self, case):
        SystemTestsCompareTwo.__init__(self, case)
        self._test_suffix = 'init_interp_on'

    # This is a new method that does not currently exist in
    # SystemTestsCommon. For many tests it won't do anything, so the default
    # implementation in SystemTestsCommon will just be 'pass'. However, it can
    # be overridden to do something useful in tests where something is needed
    # prior to the build. The advantages of having this in a separate stage
    # rather than as part of the build are:
    #
    # (a) This is just called once, right after test creation, even if the build
    # is called multiple times - so you don't need logic in here ensuring that
    # it's only done once
    #
    # (b) For tests that need this setup but don't need to override the standard
    # build method, the method signature can remain simpler, and you don't need
    # to explicitly call SystemTestsCommon.build(...)
    #
    # (c) Separating things into when they actually need to be done yields
    # better communication about the actual process of setting up the test
    #
    # I haven't looked through the higher-level scripts to determine where this
    # will be called from.
    def initial_setup(self):
        # Note: If copying user_nl files and making some small tweaks to them
        # for the two different runs is a common operation, then this operation
        # could be moved into the SystemTestsCompareTwo class, and called from
        # each derived class that needs it.

        os.makedirs("user_nl_nointerp")
        for filename in glob.glob(r'user_nl_clm*'):
            shutil.copy(filename, os.path.join("user_nl_nointerp",filename))
            with open(os.path.join("user_nl_nointerp",filename), "a") as newfile:
                newfile.write("use_init_interp = .false.")

        os.makedirs("user_nl_interp")
        for filename in glob.glob(r'user_nl_clm*'):
            shutil.copy(filename, os.path.join("user_nl_interp",filename))
            with open(os.path.join("user_nl_interp",filename), "a") as newfile:
                newfile.write("use_init_interp = .false.")

    def _setup_first_phase(self):
        caseroot = self._case.get_value("CASEROOT")
        for filename in glob.glob(r'user_nl_nointerp/*'):
            shutil.copy(filename, os.path.join(caseroot,filename))

    def _setup_second_phase(self):
        caseroot = self._case.get_value("CASEROOT")
        for filename in glob.glob(r'user_nl_interp/*'):
            shutil.copy(filename, os.path.join(caseroot,filename))

The intent here is to make the specific test class (LII above) contain only what is unique to that test. Once that's the case, it will become much easier to create and maintain a wide variety of tests.

jedwards4b commented 8 years ago

A lot of what you suggest has already been done. For example look at the ers.py and err.py

billsacks commented 8 years ago

@jedwards4b : I agree that the new python scripts are a big step forward from the shell scripts in this respect! And I did look at ers.py before writing this issue. What I'm suggesting amounts to minor tweaks on top of what is already in place to remove the remaining duplication. I think that this is especially important because the (small) bits of duplication are somewhat cryptic to someone who doesn't understand the whole set of testing scripts, which means that people (myself included) will inevitably copy and paste this without really understanding it... which in my experience leads to bugs and maintainability problems. There is also a greater chance that a critical line will accidentally be left out.

Specifically, I am suggesting moving the run method:

    def run(self):
        success = self._ers_first_phase()

        if success:
            return self._ers_second_phase()
        else:
            return False

as well as this logic (in _ers_second_phase):

        # Compare restart file
        if success:
            return self._component_compare_test("base", "rest")
        else:
            return False

into a shared location, so it doesn't need to be duplicated for each test.

I believe that this further refactoring would have reduced the chances of introducing one of the LII bugs documented in issue #291 , as well as the ERP bug documented in issue #295.

billsacks commented 8 years ago

This issue is a revamped version of https://github.com/CESM-Development/cime/issues/146, taking into account the new (and much easier to work with!) python-based system tests. That older issue had some additional ideas; the main ones were:

Allow a test to specify two testmods directories. Then the test system does all the work of setting up and running cases with each of those testmods, and comparing them for equality.
Allow tests to have a custom comparison script, so that rather than comparing for equality, you can say something like, "if irrigation is turned on in CLM, global latent heat flux should be higher" (to confirm that irrigation is doing something that is at least correct to first (zeroth?) order).

I feel that those additional things would be nice to have, but not essential.

ESMCI / cime

Refactor system test scripts to make it easier to add new tests that compare two runs #292