CICE-Consortium / CICE

Development repository for the CICE sea-ice model

Posting test results #71

Closed. eclare108213 closed this issue 6 years ago.

eclare108213 commented 6 years ago

CDash or github wiki? Let's decide. Examples:
http://my.cdash.org/index.php?project=myCICE
http://my.cdash.org/index.php?project=myIcepack
https://github.com/CICE-Consortium/Icepack/wiki/ff9ef4a8957a79620f96600f3e70212c0f6ad0ee
See also @apcraig's PR https://github.com/CICE-Consortium/Icepack/pull/94

We need to write down requirements and how they can be addressed, pros and cons, etc. -- essentially a design document for posting test results. Some issues that have already come up in discussions (random order):

eclare108213 commented 6 years ago

Capturing some info from emails:

@dabail10: CESM uses an in-house-developed test database. Alice Bertini wrote scripts that could be hacked to write rst for posting on github.

@apcraig: ESMF hosts a website on which they post results that have been sent via email and converted to html.

@eclare108213: ACME/E3SM only tracks nightly regression tests, using CDash, not collecting results.

eclare108213 commented 6 years ago

My thoughts on the test reporting to wiki pages. Some suggestions are taken from the conversation on CICE-Consortium/Icepack#94 and stuff that Tony has already done. I'm trying to get it all in one place.

I suggest that we create a separate repo for test results, rather than hanging them off our code repositories. This is mainly to allow the permissions to be opened up so that more people can write their results to the wiki; only a few people have write permission to our code repo wikis. I think we need a test-results repo for Icepack and a separate one for CICE, but they could be combined.

In Tony's initial sample, the main test results wiki page https://github.com/CICE-Consortium/Icepack/wiki/Test-Results links to new pages, which contain the stoplight chart for each test. The main page needs summary information on it, so that we don't have to look at all of the stoplight pages if the tests are running well. Suggestions: date (with time zone), version number (how do we assign this?), branch name, hash, and summary information for the tests.

The stoplight chart pages should include all of that information at the top, plus machine/OS, compiler, and user. Columns need to include build, run, compare, and restart. Can we record which hash we're comparing against for the regression tests?

We also need to include deltas for timings. Let's do the TimeLoop timer at least, and if it's easy, also the Dynamics and Column timers. At the moment, timers are only available for CICE, not Icepack. If certain (TBD) test timings differ by more than, say, 10%, that should be flagged in the summary info. We could develop statistics that take into account typical machine variation...
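To illustrate the kind of check being proposed (the log file names, the timer label pattern, and the 10% threshold below are placeholders, not anything we've agreed on), a csh snippet in the test scripts could do something like:

# hypothetical: pull the TimeLoop timer (seconds) out of the baseline and test run logs
set base = `grep 'Timer.*TimeLoop' baseline.log | awk '{print $NF}'`
set new  = `grep 'Timer.*TimeLoop' case.log | awk '{print $NF}'`
# flag the test in the summary if the relative difference exceeds 10%
echo $base $new | awk '{d = ($2-$1)/$1; if (d < 0) d = -d; if (d > 0.10) print "TIMING: FLAG"; else print "TIMING: OK"}'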

I would put the test name on the far right-hand-side of the stoplight chart, since we generally won't need to look at it unless a test fails. Better to be able to see the colors at a glance, and only scroll sideways for more info as necessary.

Questions: Separate or combined repos for CICE and Icepack test results? How do we assign version numbers, for compilation of test results? How do we compile test results across multiple hashtags, machines, etc? At what point do we remove results from the main page (they will still be in the repo, in earlier versions)?

apcraig commented 6 years ago

I agree that a separate test-results repo makes sense for several reasons, and I also agree with the overall suggestions proposed so far. With regard to the questions, I suggest we have CICE and Icepack in the same repo but separated at the top level. That way we can see both sets of results quickly without having to change repos.

I will have to think about version numbers; that's a real challenge. One thing we could do is have a file in icepack and in cice, like a README or a ChangeLog or just a file with a version number in it. It would be up to each developer to update that file before a PR is pushed. We could then grab that string out of the file to identify the tag. That is what we're doing with CESM. The other idea is to add a command line argument to 'create.case -ts' that would require the user running the test suite to specify the version, e.g. '-version cice6.0.2.10'. But maybe there are other ideas?
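As a rough sketch of the version-file idea (the file name, location, and format here are assumptions, not something we've settled on), the test scripts could grab the string with something like:

# hypothetical: cicecore/version.txt holds a single line such as "cice6.0.2.10"
set version = `head -1 cicecore/version.txt`
echo "Reporting results for version ${version}"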

The current implementation aggregates test results across multiple machines for a hash. So whatever identifier we use to separate pages, maybe a hash or maybe a version number, we can aggregate results that way. I don't think we'll be able to display arbitrary slices; that would take a database. In other words, we can aggregate results by machine, by hash, or by whatever we like, but we cannot reformat to other "slices" on the fly.

Let's see how this goes and then decide what our "clean up" strategy should be. For now, let's just accumulate. If we can automate the clean up, that's great. Otherwise it might be something that we do periodically and manually. It's probably 10 minutes of work to manually update the wiki and remove old results.

eclare108213 commented 6 years ago

Currently MPAS-seaice adds a file with the (svn) revision number of its version of the column package. I like the idea of having a file with the version number, just because it makes it trivial to identify which version of the code someone has if the code has gotten separated from the rest of the repo info. For example, the copyright file in CICE v5 and earlier is in drivers/cice, but that hasn't always been propagated for cesm and hadgem. If we have to change it manually, we'll need a reminder to do it (not to mention guidance for when and how). For Icepack, it would obviously go in the directory with the F90 modules. For CICE, there are multiple such directories, but it could go in cicecore/.

It would make sense to accumulate test results "across slices" when we increment to a new version number. We still might have results coming in from past versions, though, even a year or more later (e.g. from operational centers), so making the test reporting both as flexible and as automated as possible will be a challenge.

apcraig commented 6 years ago

I will try to make some progress on an updated prototype based on these requirements/suggestions. We could potentially save the results both by version and by machine, on two different pages of test results. It would mean duplication of results, but it could be handy. I will try that and see how it works.

Another idea I might try is to use the repo itself, rather than the wiki, to post results. If the test results are going to be in a separate repo, we could actually use the repo to save and view the results. There are several formats that should work, and it should work as well as or better than the wiki. I will try that first and see how it goes.

apcraig commented 6 years ago

There is a new repo, https://github.com/CICE-Consortium/Test-Results, and I have updated the scripts in icepack, https://github.com/CICE-Consortium/Icepack/pull/94. I have tried to address most suggestions, adding an extra layer to provide a summary and cutting the data a few ways. ONLY icepack results are working right now. The -report option now pushes the results to the github wiki page, but they can also be pushed manually by executing report_results.csh in a base_suite directory. We should continue to refine the reporting implementation; we can change many things (but not everything). For the automation to fully work, you will need to add

[credential "https://github.com/CICE-Consortium/Test-Results.wiki.git"]
    helper = store

to your ~/.gitconfig file on each machine. Feel free to test all this if you like. Something like

./icepack.create.case -ts base_suite -testid tr115 -bg icepack.tr115 -bc icepack.tr101 -m conrad -report

should activate it automatically.
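For the manual path, something along these lines should work (the suite directory name below is just an example for testid tr115, not a fixed layout):

cd base_suite.tr115
./report_results.csh

Note that with the 'store' credential helper, git saves the username/password in plain text in ~/.git-credentials after the first successful push, so subsequent -report runs should not prompt again.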

duvivier commented 6 years ago

Tony, Elizabeth,

I agree with what you've done to make the separate repo that can be more open to others. I also think the wiki page Tony designed there looks really good. I think it makes sense to have test results organized by hash, version, and machine so that folks can easily see what's relevant to them. As far as I understand, it's just a matter of the same test results being reported in all three places, right?

The only change I might suggest is a legend or description of the colors. As far as I can tell, yellow means that some tests passed and some failed, correct? It looks like when you click on a link we get green and red colors for pass/fail, but the yellow is a bit confusing in my opinion if you're expecting a yes/no answer.

Finally, it appears the links for pass, fail, and total tests all lead to the same table. Is this correct? Is this what we want? I'd think maybe clicking on pass would take you to a list of the passed tests while fail would take you to the failed ones. If this isn't possible, which is fine, maybe just have the "total" column link to the table?

eclare108213 commented 6 years ago

Yes, a legend for the colors would be helpful, and I think it would be better not to have multiple links to the same table (pass, fail, total). I'd also rather there be one table, not different ones for pass, fail, etc. I'd also prefer that there be just one color square. Could that be put in its own column, maybe with a "details" clickable link to the table, and just have the numbers of passes, failures, etc. in the other columns?

If one of us wanted to include some sort of commentary on a particular wiki page for one of the tests, would that cause problems, or get written over? It looks like the tests continue to accumulate once the page is initially created. The commentary might be that the cause of the failures is known and is being addressed in issue #N and PR #M... We could include a "Comments" line when the page is first created, which could be filled in later, or not.

apcraig commented 6 years ago

I will take a shot at updating the wiki based on feedback. Lots of good ideas there; keep them coming.

I don't know if we'll be able to easily add commentary. I could add a column for comments and leave it blank by default, but to update it, one would either have to edit the wiki page directly or check out the wiki, edit, and push it back, and the wiki table is not that pretty to edit manually. We could instead just leave a space at the top of the table that allows users to edit the wiki and add comments. I think that would be better for the table and easier on the user. It still involves manually editing the wiki, which I suspect we won't do too often.

Rather than focus on comments on the results page, we should be opening issues if there are problems. Those issues can point to test results, but mostly that would make the problems a lot more visible, and it would allow the problems to span multiple machines, commits, and so forth. A comment added to a single test page quickly gets lost, in my experience.
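For reference, the check-out-and-edit route is just the standard wiki-as-a-git-repo workflow; roughly (the page edited and the commit message are placeholders):

git clone https://github.com/CICE-Consortium/Test-Results.wiki.git
cd Test-Results.wiki
# edit the relevant results page (a markdown file), then:
git commit -a -m "add comment about a known failure"
git push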

eclare108213 commented 6 years ago

I was thinking the comment could be in the 'general info' at the top of the page, not in the table itself. But perhaps the results page isn't the right place.

apcraig commented 6 years ago

OK, I updated things: https://github.com/CICE-Consortium/Test-Results/wiki. I have split the consortium results from "other forks". There is a legend page on the sidebar. I have added a new cut of the data, "by branch". I have also updated the linking on the overview page and the color, so the color and link are only in the "total" column. We could put the color in a separate column, but this saves a little space. And again, there is only data for "Other Forks" and Icepack right now. As soon as we are happy and the PR is merged, I can start work on the CICE equivalent (it should not take long). I'm happy to continue to iterate on the format, though, and we can change it in the future once we see what works and what doesn't. Personally, I think "by version, by machine, by hash, and by branch" is too many similar cuts, but we can see which ones we really do end up using and then remove/add other "cuts" of the data as needed.

duvivier commented 6 years ago

See the comment on PR #94 in the Icepack repo.

apcraig commented 6 years ago

One followup. It turns out the Test-Results wiki can be edited directly by anyone, but if you clone it and want to push, that is limited to collaborators. The two routes accomplish the same thing; it's just that the process, and the permissions, are different. It's unfortunate the permissions are inconsistent. I have done some googling, and there is no easy way to grant push permission to the wiki to "all".

So we have to decide what we want. If we want to limit push permission, which maybe isn't a horrible thing, we can just keep adding collaborators as the group of developers expands. That's easy to do quickly, right on the Test-Results page. If we truly want to grant the world push permission, then we could try setting up a shared github user name, like CTestResults, and expose that username and password in our scripts. So when a user goes to clone/push the wiki in our scripts, it automatically uses that username/password. That username/password would obviously be exposed, but that's probably OK.
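A sketch of how the shared-credential idea could look in the scripts (CTestResults is the hypothetical account mentioned above, and the password would be visible to anyone reading the script):

# hypothetical: embed the shared account in the clone/push URL
git clone https://CTestResults:SHARED_PASSWORD@github.com/CICE-Consortium/Test-Results.wiki.git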

I think my preference is to limit push permissions for now to a growing list of collaborators, and if that becomes too difficult, we can try something else. The list of Test-Results collaborators is managed separately from CICE and Icepack, so that's probably good. My feeling is that, in practice, there won't be a huge number of people running test suites who also want to publish the results on the wiki. A lot of that will be administrative testing, independent pre-PR testing, and internal testing. But I'm open. We had originally said we wanted something completely open. If that's still the case, we can probably make that happen with a bit of work.

eclare108213 commented 6 years ago

I agree with this. Let's add collaborators for now, and then we can change it later if we need to. Also, since we are limiting 'push' permission, let's also limit direct editing of the wiki in the same way, for consistency.

apcraig commented 6 years ago

Sounds good. For now, I have turned off "open" editing and have added about 10 of us as collaborators.

eclare108213 commented 6 years ago

@apcraig can we consider this issue complete? There may be things still to tweak about how the results are reported, but we could make new issues for those, as needed.

apcraig commented 6 years ago

I agree we can close this issue. There may be more changes regarding the results presentation, but what we have now seems to be working.