Open billsacks opened 8 years ago
@billsacks the handling of COMPARE phases in CIME is already awkward and having another phase like COMPARE is not something I'm excited about. If you can describe a specific need/capability that would require this change, would you add a comment hear describing it? I'm going to tentatively close this for now. Feel free to reopen if you think we should reconsider.
We do need to distinguish between "compare with baseline fail" and "compare within a test fail". The latter is much more serious. In fact, the test should just stop and not bother with baseline compare if the primary test compare failed.
@rljacob I have a different PR going in soon that will mark a test with an "internal" diff (non-baseline) as FAIL instead of DIFF.
I agree with @rljacob here: These phases need to be more clearly distinguished. I'm going to reopen this issue. Yes, they both do "comparisons", but they are doing fundamentally different things, and I feel it's important to clearly distinguish between them in both the code and the test reporting, rather than treating them as flavors of the same thing. Distinguishing between these is just as important as distinguishing between (say) a RUN failure and a BUILD failure.
Maybe I'm missing something, though: I don't understand what difficulty it would cause to have two clearly-distinguished phases, something like BASELINE_COMPARE and TEST_COMPARE.
If you have too much on your plate and don't feel this is at a high level of priority, I can understand that. But I would like to keep this issue open until it is resolved.
@billsacks I just realized that a test has at most one baseline compare phase. That makes this easier than I thought.
Reopening this because I still don't think there is enough distinction between test comparison and baseline comparisons in the TestLog output.
Example: A recent PR caused a PET test to fail in our nightly testing. It was reported as FAIL instead of diff on the dashboard, which is good, but looking at the TestStatus.log which is posted to our cdash board is still a mess. https://my.cdash.org/testDetails.php?test=48339687&build=1673112
As the PR author noted "this looks like a normal baseline diff". And he's right. Here's an abbreviated summary from the log: 2019-07-04 03:19:35: NLCOMP (compare with baselines) 2019-07-04 05:55:00: Comparing hists (compare first run with baselines) FAIL 2019-07-04 05:56:52: Comparing hists (the "base" and "single threaded" compare) FAIL 2019-07-04 05:56:59: Comparing hists (compare second run with baselines) FAIL
The NLCOMP and 2 baseline compares should not be done at all if the PET test itself fails. If the test bases, the baselines compares should should all appear AFTER the PET compares in the log.
Also the 2 hist compares with baselines are, I assume, comparing each of the two runs in the PET test with their corresponding baselines but that also isn't clear in the output.
@rljacob part of the mess you're seeing may be due to #2916 . I believe the intent is to only do a single baseline comparison at the end, but there's a bug (in multisubmit tests?) that causes this to be done twice.
That said, I won't argue with the general request for additional cleanup here - I just want to make sure that a distinction is made between general cleanup that's needed for all tests, and problems that are really due to #2916 .
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
As someone who looks at TestStatus.log output a lot (on our cdash page), would still like this.
Inspired by @gold2718 's comment here:
https://github.com/ESMCI/cime/pull/392#issuecomment-239626725
cime4 test reporting made a clear distinction between these, as:
and I feel it would be good to maintain that clear distinction in cime5.
@jgfouca made some steps towards this, by adding suffixes to the single 'COMPARE' phase. But it seems like it would be good to explicitly split this into two phases - a TEST_COMPARE phase and a BASELINE_COMPARE phase - both in the code and in the test reporting. In the test reporting, this will make it easier to do greps to include / exclude particular aspects of the tests. In the code, this will make it easier to add new functionality, such as the functionality proposed by @gold2718 .