Open har917 opened 2 months ago
@har917 not sure, but I think the issue is that config file name should be config.yaml instead of benchcab.yaml (also the namelist files have been updated in bench_example and cable so no patch is needed - this is why the namelist file contents could be correct).
Perhaps to add more detail.
This testing is based off a fresh git clone (as of today) - the cable.nml that is downloaded into the benchcab_example/namelists directory has the old check%ranges = .false. line
In the above (and this wasn't clear - apologies) the file that I refer to as benchcab.yaml is the config.yaml file that the user edits in the benchcab_example root directory (I named it that because there are other config.yaml files created elsewhere in the structure)
Is there supposed to be a config.yaml file created in the benchcab_example/runs/fluxsite/ directory (equivalent to the .yaml files created in the spatial/crujraaccess* directories)?
I see, regarding bench_example we still have to merge the (approved) PR https://github.com/CABLE-LSM/bench_example/pull/23 (~so it will be done soon~ edit: we still need to see how to manage namelist compatibility)
Now, the following set of commands seems to work for me
$ git clone git@github.com:CABLE-LSM/bench_example.git
$ cd bench_example
$ vim config.yaml
# Following lines go in this file
realisations:
  - repo:
      git:
        branch: main
    patch:
      cable:
        check:
          ranges: 0
  - repo:
      git:
        branch: 335-facilitate-output-of-potential-evaporation-directly-from-the-offline-code-base
    patch:
      cable:
        check:
          ranges: 0
modules: [
  intel-compiler/2021.1.1,
  netcdf/4.7.4,
  openmpi/4.1.0
]
$ benchcab run -v
By any chance, was benchcab run from another directory?
Also, there shouldn't be any config.yaml file in the benchcab_example/runs/fluxsite directory
> By any chance, was benchcab run from another directory?
I don't think so though - there is a possibility that I ran it from one layer too high but I thought it would completely fail if I did that (I have a /benchcab directory on scratch into which git clone creates the /benchcab_example directory and think I ran from /benchcab_example). All this was run via a VS code terminal.
I didn't use the -v option - is that important?
> I didn't use the -v option - is that important?
Not really, it is for verbose output (just to check whether there were any warnings/issues before submitting job)
Maybe it detected a config.yaml on top/environment path (little chance but just in case). It seems to work well for me, but maybe somebody else (@SeanBryan51 @ccarouge) can recreate this issue. Meanwhile @har917 maybe run the above set of commands from /scratch and if you could recheck that'd be great.
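One way to rule out a stray config.yaml is to list every file of that name in the launch directory and in each directory above it. The Python sketch below does only that. It is a diagnostic aid, not a description of how benchcab locates its configuration, and the function name `configs_on_path` is made up for illustration.

```python
from pathlib import Path


def configs_on_path(start, name="config.yaml"):
    """Return every file called `name` in `start` or any of its parent directories."""
    start = Path(start).resolve()
    found = []
    for directory in [start, *start.parents]:
        candidate = directory / name
        if candidate.is_file():
            found.append(candidate)
    return found


if __name__ == "__main__":
    # Run this from the directory where `benchcab run` is invoked.
    for path in configs_on_path(Path.cwd()):
        print(path)
```

If this prints more than the one config.yaml in the bench_example root, the extra files are worth a look before rerunning.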
I've just completed a completely fresh run using the commands above - with the only thing different being that I used the VS code editor not vim (since I'm not a vim user).
It's failed in the same way - it's
Likely contradicting my earlier thinking - I'm not sure it's done anything in the payu section in that there's nothing in the work directory (only in the archive directory).
One thing I've just thought of - is there a project dependence somewhere in here? I've been running these tests from p66 - should I try from a different project (e.g. x45, rp23)?
@AlisonBennett Could you have a go at following the instructions (4^) from @abhaasgoyal above to see whether you can get this to run?
Just trying to figure out whether this is at my end or somewhere else.
ps. you'll get to see how quick the updated compilation/build is - only takes a couple of minutes in contrast to 15+ with BLAZE_9814
@har917 yes - I have done this and it seems to have run (i.e. it built some stuff and then submitted a pbs job which took a while to run through and now there is a bunch of extra stuff in some new directories). I'm not really sure what output to expect though, so perhaps it's best for you to have a look at scratch/x45/ab7412/benchcab_test to see if that is what it is meant to do.
There were a few errors before I got this far. To overcome these, I had to:
a) copy @abhaasgoyal's code for the .yaml file into my .yaml file (before I did that I got lots of errors very similar to your initial post). I think the yaml syntax is very fussy. You could try taking a copy of my .yaml file to see if that solves your problem?
b) follow instructions to load benchcab modules here (before I did that my environment didn't know about benchcab)
c) start a new arc session with adding access to both projects gdata/hh5 and gdata/ks32 (before I did that it said it didn't have access to the meteorology for one of the flux sites).
Hope this helps.
@har917 mind sharing the path where you are running from?
Actually @har917 what's the -l storage line in the qsub job and where do you run? Are you running from /g/data/p66 and it isn't in the -l storage line for example?
@ccarouge I've been running from /scratch/x45 but likely submitted the job under p66 (as that's my default project).
@AlisonBennett has successfully run the regression test (under x45) this morning.
I'm trying again (but ensuring that I'm under x45) - and this is certainly behaving differently (in that it's produced fluxsite outputs). However, it hasn't produced a benchmark_cable_qsub.sh.o*** file even though the job has apparently finished (according to qstat).
The -l storage line in both sets of runs is #PBS -l storage=gdata/ks32+gdata/hh5+gdata/wd9
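To compare the storage request across runs without reading each job script by eye, the directive can be pulled out programmatically. This is a minimal Python sketch; the helper name `pbs_storage` is hypothetical, and the sample script text simply reuses the directive quoted above.

```python
import re


def pbs_storage(script_text):
    """Return the entries of the first `#PBS -l storage=` directive, or [] if absent."""
    match = re.search(r"^#PBS\s+-l\s+storage=(\S+)", script_text, flags=re.MULTILINE)
    return match.group(1).split("+") if match else []


# Sample job-script text; a real script would be read from disk instead.
script = "#!/bin/bash\n#PBS -l storage=gdata/ks32+gdata/hh5+gdata/wd9\n"
print(pbs_storage(script))  # ['gdata/ks32', 'gdata/hh5', 'gdata/wd9']
```

Note that no scratch entry appears in this list, so access to scratch depends entirely on which project the job runs under.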
Basically I think the problem is that I've been essentially asking a job under p66 to write to scratch under x45 and it's said no (understandable) - but the error message is a bit odd.
On further thought - what's likely happened is that benchcab tries to copy the config.yaml file from its root directory to somewhere else as part of the workflow (that fails because of the Gadi permissions requirement), then benchcab tries to read the copy of the config.yaml file (which doesn't exist) and you get the error above.
Perhaps a note is needed in the benchcab 'how to' about matching the project with the PBS storage and/or matching the project with the calling point.
EDIT: it's now produced a .o*** file so all good.
@har917 When you run the job using p66, Gadi will automatically mount /scratch/p66 but not /scratch/x45. If you run using x45 resources, Gadi will mount /scratch/x45 (and not /scratch/p66).
In config.yaml, it's possible to give additional projects to mount: https://benchcab.readthedocs.io/en/latest/user_guide/config_options/#+pbs.storage
You may want to add scratch/x45 so it works no matter what resources are used.
Edit: I'm assuming you run switchproj before running benchcab, since we haven't provided a way to run benchcab under a project other than the user's current project.
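The mounting rule described here can be written down as a small check. The Python sketch below models only what is stated in this thread: a job sees /scratch/<its own project> plus whatever is listed in the storage request. The function names `required_mount` and `is_reachable` are hypothetical, and any other Gadi mounting behaviour is deliberately left out.

```python
def required_mount(path):
    """Map a Gadi path to its storage name, e.g. /scratch/x45/... -> scratch/x45."""
    parts = [p for p in path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "scratch":
        return f"scratch/{parts[1]}"
    if len(parts) >= 3 and parts[:2] == ["g", "data"]:
        return f"gdata/{parts[2]}"
    return None  # not a project filesystem (e.g. /home)


def is_reachable(path, job_project, storage):
    """True if `path` is covered by the job project's own scratch or the storage list."""
    mount = required_mount(path)
    if mount is None:
        return True
    return mount == f"scratch/{job_project}" or mount in storage.split("+")


storage = "gdata/ks32+gdata/hh5+gdata/wd9"
run_dir = "/scratch/x45/ab7412/benchcab_test"
print(is_reachable(run_dir, "p66", storage))                   # False
print(is_reachable(run_dir, "x45", storage))                   # True
print(is_reachable(run_dir, "p66", storage + "+scratch/x45"))  # True
```

The first case matches the failing setup in this thread (job under p66, run directory on /scratch/x45); the last shows why adding scratch/x45 to the storage option makes the run independent of the project used.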
@ccarouge @SeanBryan51 @abhaasgoyal As of 17/7/2024 - I'm having difficulty getting benchcab to run (anything).
First issue - following recent updates to check%ranges, the current default namelist (so what is supposed to be used for regression testing) still has check%ranges = .false. when created via git clone (and so the runs fail).
Using
as the benchcab.yaml file appears to successfully create cable.nml files with the correct entries.
However benchcab then throws (in the qsub.sh.o*** file)
Interestingly the spatial runs appear to have completed successfully. I don't see a .yaml file in the runs/fluxsite directory which is consistent with the error message.
Any thoughts?