Closed bbye closed 8 years ago
Developer tests passed with the exception of the IOP tests and the PEA (mpi vs mpi-serial). The IOP failed during the base file comparison, but a manual check showed the files were identical. mpi-serial doesn't run on Edison, so I couldn't check it.
Out of curiosity, why doesn't mpi-serial run on Edison? Is any of your new code being tested?
If you give me an example of what it means that "mpi-serial doesn't run on Edison," I can try to change the machine files so that it works.
It can't compile. My guess is that it's because I'm working off an old master. It fails when compiling pio, and I don't really have a good feel for the error. The file ends with:
gmake[2]: Leaving directory `/scratch2/scratchdirs/bbye/sharedlibroot.harv03/intel/mpi-serial/nodebug/nothreads/pio'
/global/common/edison/usg/cmake/2.8.11.2/bin/cmake -E cmake_progress_report /scratch2/scratchdirs/bbye/sharedlibroot.harv03/intel/mpi-serial/nodebug/nothreads/pio/CMakeFiles 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
[100%] Built target pio
gmake[1]: Leaving directory `/scratch2/scratchdirs/bbye/sharedlibroot.harv03/intel/mpi-serial/nodebug/nothreads/pio'
gmake: *** [all] Error 2
exit 2
Do you mean if I run the acme_developer test suite on Edison in your branch, I will see the error you have written above?
Yes
This branch was created off of master as of PR #65, which predates the fixes to the Edison machine files. @jedbrown, do you recommend that we attempt to rebase this against a more recent master?
It looks like this is a single commit so it can be rebased without invalidating intermediate testing. Alternatively, you can do a throw-away merge and test that mpi-serial works. I'm not too concerned either way as long as that case is tested in 'next' before graduation to 'master'.
@daliwang, it's your decision as to how you proceed:
Let me know if you need any assistance.
@jnjohnsonlbl and @bbye, sorry for the late response; I was occupied with other projects and an international trip. I would prefer the approach of rebasing onto master. To make sure the code is consistent with all the current ALMV1 development, please test your code using this compset (I1850CLM45CN) at these resolutions (f09_f09, f19_f19, hcru_hcru) on your local machine (Edison) and on one of the OLCF or ALCF computers (Titan or Mira).
Is there an actual test case, or am I just checking that the model will run over those compset / resolution combos? Incidentally, the I1850CLM45BGC at hcru_hcru is missing the data file: clmi.I1850CRUCLM45BGC.0241-01-01.360x720cru_hcru_simyr1850_c140111.nc
@bbye, to my understanding, I1850CLM45CN, not BGC, will be the default compset for ALMV1, so please test your code on it. If some data are missing, you can contact @dmriccuito (OIC) or Xiaoying Shi (Titan). You can also contact Peter if you have questions related to the ALMV1 compset. Best
For development I think there was an earlier discussion that land model testing should be done using cold starts since we don't want to generate new initial files every time there is a PR that changes the restart file variables (there will be several I think). @bbye, do you need initial files in order to test the crop model? I think you mentioned this earlier - in that case we'll need to maintain an initial file, at least for one configuration/compset.
@daliwang If I only need to test the CN version, I have all the data to do that. Is there a specific test you want me to do (e.g. a short run of days, months, or years, a restart, etc.)? Is there an easier way to run on mira without using all those processors?
@dmricciuto , it would be useful (but not critical) to have an initial file at least during version 1 development (I actually modified it already by adding a couple new variables). So far I've been able to get by with older initialized files and I can probably continue that way. For version 2, it won't be needed since I will be modifying the number of pfts which will require a new spinup.
That test should be added to all platforms that have acme_developer.
I don't have access to titan, only mira and Edison.
@rljacob Should I simply add the I1850CLM45CN test for all machines listed on the Configuration+Management page?
@bishtgautam, yes, please do. We very much want to get away from having different tests on different machines on our core test suites.
@jgfouca In PR #225, I added all machines except Hopper and OIC.
The Configuration Management page is not the best authority. Add it to all the machines that have acme_developer according to testlist.xml.
@bbye:
lnd/clm2/paramdata/clm_params.crop.c150330.nc file be added to https://acme-svn2.ornl.gov/acme-repo/acme/inputdata/lnd/clm2/paramdata?
create_test for acme_developer test suite with -generate option instead of -compare?
Yes, I'm actually in the process of adding the parameter file to the acme svn repo, but I'm going to update the PR first to reflect using this parameter file as the default (not just when crops are active).
The reason the tests passed is your first answer: none of those compsets have crops active. To be honest, though, I don't remember if I used the -compare option when I ran the test suite (I did not use -generate). I'll rerun them to be sure.
The README.case file for any of the tests will tell you which arguments you passed to create_test.
I loaded the parameter file (lnd/clm2/paramdata/clm_params.c150330.nc) to the acme svn repository; note the change in name to align with the file name conventions. I also updated the namelist to use this as the default file, but I can't get the model to load the file to Edison or mira (no permissions). It works fine if I list it in the user_nl_clm file. Should I update the commit using --amend or issue a new commit?
I reran the acme developer tests with the -compare option. There are several BFAIL from missing baselines. The ERS.f45_g37.B1850C5 and the ERS_D.f45_g37.B1850C5 failed because the atm_in files differed (that wasn't from me - I assume the baseline was updated after I checked out my code). Also, similar to my previous run, the IOP4c and IOP4p failed during the history comparisons, but when I manually checked the files they were the same. Finally, the PEA_P1_M.f45_g37_rx1.A still doesn't run on Edison.
I also ran the I1850CLM45CN for f19_f19 and f09_f09 resolutions on both Edison and mira for 5 days and they run fine, which is not surprising since crops aren't active in those compsets.
Hi @bbye,
clm_params.crop.c150330.nc to clm_params.c150330.nc: models/lnd/clm/bld/namelist_files/namelist_defaults_clm4_5.xml
The following gist summarizes the list of commands to do the above steps https://gist.github.com/bishtgautam/9a91ee96563516e0f9b6
In regards to the permission issue for lnd/clm2/paramdata/clm_params.c150330.nc on Edison and Mira:
If clm_params.crop.c150330.nc will be contributed to NCAR's svn, I believe it would be straightforward to update CESM's inputdata directories on Edison and Mira. But if clm_params.crop.c150330.nc will not be contributed to NCAR, then I would suggest the following two directories where clm_params.crop.c150330.nc should be copied: /project/projectdirs/acme/data/inputdata/lnd/clm2/paramdata
(Btw, @mt5555, @drhansj, others, may have an opinion on what is the best location to save ACME generated inputdata that can't be saved in CCSM inputdata directory).
BFAIL. I believe #225 is going to be merged into 'master' today. After that, I will send you commands to rebase your branch against master. Then you will be able to compare all tests within the acme_developer test suite against baselines I previously generated.
Following is the set of commands to rebase bbye/clm/crop-yield onto master and compare results for the acme_developer test suite against the baselines on Edison available at /project/projectdirs/acme/gbisht/baselines/679d6b8-acme_developers
This is really helpful - Thanks! Should I push the commit before I perform the rebase?
Order of operations doesn't matter. At the end, you should have a single commit in your branch, and your branch should be rebased onto a recent commit in master.
The following tests did not pass:
FAIL ERS.f19_g16_rx1.A.edison_intel.tputcomp.679d6b8-acme_developer COMMENT tput_decr = 109.116 tput_percent_decr = 60.8
FAIL ERS.ne30_g16_rx1.A.edison_intel.tputcomp.679d6b8-acme_developer COMMENT tput_decr = 87.949 tput_percent_decr = 73.4
FAIL ERS_IOP4c.f19_g16_rx1.A.edison_intel.tputcomp.679d6b8-acme_developer COMMENT tput_decr = 10.268 tput_percent_decr = 7.56
FAIL ERS_IOP4c.ne30_g16_rx1.A.edison_intel.tputcomp.679d6b8-acme_developer COMMENT tput_decr = 4.28 tput_percent_decr = 3.59
RUN PEA_P1_M.f45_g37_rx1.A.edison_intel.C.acme_dev_ba1546f
The last one is the serial issue, but the other four are from a change in throughput - I can't explain this because crops aren't active in any of those compsets.
@bbye: Usually the tputcomp and memcomp failures are ignored.
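As a rough illustration of that convention, the status lines from the Edison run can be filtered so that tputcomp and memcomp entries are set aside. This is only a sketch: the helper name `non_ignored_failures` is hypothetical, and the assumption that the comparison type shows up as a dot-separated component of the test name is inferred from the lines quoted in this thread, not from the create_test scripts.

```python
# Comparison types that are usually ignored, per the comment above.
IGNORED_COMPONENTS = {"tputcomp", "memcomp"}


def non_ignored_failures(status_lines):
    """Return (status, test_name) pairs that did not pass and are not
    throughput (tputcomp) or memory (memcomp) comparisons."""
    results = []
    for line in status_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        status, test_name = fields[0], fields[1]
        if status == "PASS":
            continue
        # Assumption: the comparison type is one dot-separated component.
        if IGNORED_COMPONENTS & set(test_name.split(".")):
            continue
        results.append((status, test_name))
    return results


status_lines = [
    "FAIL ERS.f19_g16_rx1.A.edison_intel.tputcomp.679d6b8-acme_developer COMMENT tput_decr = 109.116 tput_percent_decr = 60.8",
    "FAIL ERS.ne30_g16_rx1.A.edison_intel.tputcomp.679d6b8-acme_developer COMMENT tput_decr = 87.949 tput_percent_decr = 73.4",
    "FAIL ERS_IOP4c.f19_g16_rx1.A.edison_intel.tputcomp.679d6b8-acme_developer COMMENT tput_decr = 10.268 tput_percent_decr = 7.56",
    "FAIL ERS_IOP4c.ne30_g16_rx1.A.edison_intel.tputcomp.679d6b8-acme_developer COMMENT tput_decr = 4.28 tput_percent_decr = 3.59",
    "RUN PEA_P1_M.f45_g37_rx1.A.edison_intel.C.acme_dev_ba1546f",
]

remaining = non_ignored_failures(status_lines)
for status, name in remaining:
    print(status, name)
```

With the five lines reported above, only the PEA_P1_M entry (the mpi-serial case) is left to look at.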
@daliwang : This PR is ready to be merged in next.
@bbye This branch is in next now.
@bbye, the nightly test fails on skybridge with the test case SMS.f19_f19.I1850CLM45CN.skybridge_intel/gnu.
http://my.cdash.org/viewTest.php?buildid=778559
I will try to get more information on this.
@jgfouca, I found that SMS.f19_f19.I1850CLM45CN.skybridge_intel/gnu failed: http://my.cdash.org/viewTest.php?buildid=778559. Where or how can I get a more detailed error message?
If you click on the "Failed", it will take you to the TestStatus.out. Is this a new test?
The test failed because the baseline was missing on skybridge. http://my.cdash.org/testDetails.php?test=19705953&build=778559
@jgfouca : The SMS.f19_f19.I1850CLM45CN test was recently added to next and turned on for all ACME machines (1e6bc09).
That was merged a couple weeks ago though, right? I'm wondering why baselines are missing.
On skybridge:
2015-06-09: SMS.f19_f19.I1850CLM45CN passed
2015-06-10: SMS.f19_f19.I1850CLM45CN failed
For comparison, the 2015-06-09 test used baselines in the scratch_directory, while 2015-06-10 test attempted to use baselines in the archive directory
2015-06-09: /gscratch/hudson/acme_scratch/skybridge/SMS.f19_f19.I1850CLM45CN.skybridge_intel.C.jenkins_testid_20150609_050148/
2015-06-10: /gscratch/hudson/acme_scratch/skybridge/archive/SMS.f19_f19.I1850CLM45CN.skybridge_intel.C.jenkins_testid_20150610_050253
Good catch, I wonder why that changed?
If that's an expected change, I can move the baselines to the new location where they are now expected to be.
Let me confirm that it was this commit that caused the problem.
So this is the most recent merge to next. I checked out the previous merge and ran the test, and it passed, so it's definitely this commit that broke things.
Can you paste the output of README.case for the SMS.f19_f19.I1850CLM45CN test done on 2015-06-09 and 2015-06-10 for either Skybridge or Melvin?
create_test -input_list /tmp/tmpro5KSn -mach melvin -testroot /home/jenkins/acme/scratch/jenkins -baselineroot /home/jgfouca/acme/baselines -compare next -project ignore -testid jenkins_testid_20150610_073103 -autosubmit off -nobatch on
test created with the following options:
basecmp_case: next/SMS.f19_f19.I1850CLM45CN.melvin_gnu
baselineroot: /home/jgfouca/acme/baselines
case: SMS.f19_f19.I1850CLM45CN.melvin_gnu.C.jenkins_testid_20150610_073103
casebaseid: SMS.f19_f19.I1850CLM45CN.melvin_gnu
compiler: gnu
compset: I1850CLM45CN
fullname: SMS
grid: f19_f19
mach: melvin
test_argv: -testname SMS.f19_f19.I1850CLM45CN.melvin_gnu -testroot /home/jenkins/acme/scratch/jenkins -compare next
testname: SMS
create_newcase -case /home/jenkins/acme/scratch/jenkins/SMS.f19_f19.I1850CLM45CN.melvin_gnu.C.jenkins_testid_20150610_073103 -res f19_f19 -mach melvin -compset I1850CLM45CN -testname SMS -nosavetiming -mach_dir /home/jenkins/slave/workspace/ACME_Basic_next/ACME_Climate/scripts/ccsm_utils/Machines -compiler gnu -sharedlibroot /home/jenkins/acme/scratch/sharedlibroot.jenkins_testid_20150610_073103 -project ignore
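For anyone checking whether a given run used -compare or -generate, the create_test line recorded in README.case can be split into option/value pairs. The following is a minimal sketch, not part of the ACME scripts: `parse_single_dash_options` is a hypothetical helper, and it assumes every option starts with a single dash and takes at most one value, which holds for the command pasted above but may not hold in general.

```python
import shlex


def parse_single_dash_options(command):
    """Split a recorded command line into {option: value}.

    Assumption: options look like "-name value"; an option followed by
    another option (or by nothing) is stored as True.
    """
    tokens = shlex.split(command)
    options = {}
    i = 1  # tokens[0] is the program name
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-"):
            name = token.lstrip("-")
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
            if has_value:
                options[name] = tokens[i + 1]
                i += 2
            else:
                options[name] = True
                i += 1
        else:
            i += 1  # stray positional token, ignored in this sketch
    return options


command = (
    "create_test -input_list /tmp/tmpro5KSn -mach melvin "
    "-testroot /home/jenkins/acme/scratch/jenkins "
    "-baselineroot /home/jgfouca/acme/baselines -compare next "
    "-project ignore -testid jenkins_testid_20150610_073103 "
    "-autosubmit off -nobatch on"
)

options = parse_single_dash_options(command)
print("compare against:", options.get("compare"))
print("generate requested:", "generate" in options)
print("baseline root:", options.get("baselineroot"))
```

Run against the Melvin command above, this shows a comparison against the baseline named next under the recorded baseline root, with no -generate option present.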
Are the nightly tests being run by scripts/acme/create_test or scripts/create_test?
scripts/acme/create_test
These code modifications include yield calculation in bu/acre and dry matter (t/ha), and allow harvest of grain to a product pool analogous to the wood product pools, but with a 1-yr turnover. An option to harvest a percentage of the residue is available, but residue harvest is currently set to 0.
Three new parameters are in the clm_params file for the yield and residue harvest. The new parameter file (clm_params.crop.c150330.nc) will be used only when the crop model is active.
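To make the units concrete, here is a minimal sketch of the t/ha to bu/acre conversion and of a product pool with a 1-year turnover. It is not the code from this PR: the function names are hypothetical, the bushel weights are the standard US values (56 lb/bu for corn, 60 lb/bu for soybean and wheat), the optional moisture adjustment is an illustration only, and first-order decay is an assumed form for the pool turnover.

```python
import math

# Standard unit conversion constants.
KG_PER_TONNE = 1000.0
LB_PER_KG = 2.2046226218
ACRES_PER_HA = 2.4710538147

# Standard US bushel weights in lb/bu (illustrative table, not from the PR).
LB_PER_BUSHEL = {"corn": 56.0, "soybean": 60.0, "wheat": 60.0}


def t_per_ha_to_bu_per_acre(yield_t_ha, crop, moisture_fraction=0.0):
    """Convert a dry-matter yield in t/ha to bu/acre.

    moisture_fraction optionally rescales dry matter to grain at a given
    moisture content (wet mass = dry mass / (1 - moisture)); the default
    of 0.0 leaves the dry-matter mass unchanged.
    """
    wet_t_ha = yield_t_ha / (1.0 - moisture_fraction)
    lb_per_acre = wet_t_ha * KG_PER_TONNE * LB_PER_KG / ACRES_PER_HA
    return lb_per_acre / LB_PER_BUSHEL[crop]


def step_product_pool(pool, harvest_input, dt_years, turnover_years=1.0):
    """Advance a product pool by dt_years.

    Assumption: the existing pool decays first-order with the given
    turnover time, and the new harvest is added at the end of the step.
    """
    return pool * math.exp(-dt_years / turnover_years) + harvest_input


corn_bu_acre = t_per_ha_to_bu_per_acre(1.0, "corn")
pool_after_one_year = step_product_pool(100.0, 0.0, 1.0)
print(round(corn_bu_acre, 2), round(pool_after_one_year, 2))
```

Under these assumptions, a pool with a 1-year turnover retains a fraction exp(-1) of its contents after one year with no new harvest, in contrast to the much slower wood product pools.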
[CC]
LG-71