NordicESMhub / ctsm-docs

Local documentation for Community Terrestrial Systems Model
https://ctsm-docs.readthedocs.io/en/latest/
Creative Commons Attribution 4.0 International

Error when submitting case on Abel #2

Open ecaas opened 5 years ago

ecaas commented 5 years ago

When trying to run ./case.submit with modified .xml files (config_batch.xml, config_machines.xml, config_compilers.xml), we get the following error message:

ERROR: Command: 'sbatch .case.run --resubmit' failed with error '' from dir '/cluster/home/ecaas/ctsm_cases/fates_f19_g17'

Any suggestions on what may be the problem, or what we should do?

annefou commented 5 years ago

Did you restart from scratch, i.e. remove the ctsm directory?

hrn800 commented 5 years ago

I am trying the "out of the box run on abel" but have the same issue and get the same error message as described above. How was the issue resolved previously? Any suggestions?

annefou commented 5 years ago

Did you re-clone or update your local copies of both the ctsm and cime repositories? We have updated config_batch.xml:

<batch_system MACH="abel" type="slurm">
  <batch_submit>sbatch</batch_submit>
  <submit_args>
    <arg flag="--time" name="$JOB_WALLCLOCK_TIME"/>
    <arg flag="-p" name="$JOB_QUEUE"/>
    <arg flag="--account" name="$PROJECT"/>
  </submit_args>
  <directives>
    <directive>--mem-per-cpu=3936M</directive>
    <directive>--cpus-per-task=1</directive>
  </directives>
  <queues>
    <queue walltimemax="00:30:00" nodemin="1" nodemax="72">normal</queue>
    <queue walltimemax="00:30:00" nodemin="1" nodemax="72" default="true">normal</queue>
  </queues>
</batch_system>
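
To see which flags this entry should add to the submit command, here is a minimal Python sketch. It is only an illustration of how the submit_args map onto a command line, not how CIME itself builds the command. The function name submit_command and the inlined BATCH_XML string are assumptions made for the example; in practice you would read your local config_batch.xml instead.

```python
import xml.etree.ElementTree as ET

# Relevant part of the abel entry quoted above, inlined so the example
# is self-contained.
BATCH_XML = """
<batch_system MACH="abel" type="slurm">
  <batch_submit>sbatch</batch_submit>
  <submit_args>
    <arg flag="--time" name="$JOB_WALLCLOCK_TIME"/>
    <arg flag="-p" name="$JOB_QUEUE"/>
    <arg flag="--account" name="$PROJECT"/>
  </submit_args>
</batch_system>
"""


def submit_command(batch_xml):
    """Join the submit program and its flag/name pairs into one string.

    Hypothetical helper for illustration; variables such as
    $JOB_QUEUE are left unexpanded.
    """
    root = ET.fromstring(batch_xml)
    parts = [root.findtext("batch_submit").strip()]
    for arg in root.findall("./submit_args/arg"):
        parts.append(arg.get("flag"))
        parts.append(arg.get("name"))
    return " ".join(parts)


print(submit_command(BATCH_XML))
# prints: sbatch --time $JOB_WALLCLOCK_TIME -p $JOB_QUEUE --account $PROJECT
```

If the same lookup on your own copy of config_batch.xml finds no abel entry or no submit_args, the local file is probably not the updated one.
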
hrn800 commented 5 years ago

I have cloned ctsm following the instructions in "README.md": git clone -b release-clm5.0 https://github.com/NordicESMhub/ctsm.git

Should I also clone the cime repository in a similar way?

My config_batch.xml is updated as recommended.

annefou commented 5 years ago

Which case are you creating?

hrn800 commented 5 years ago

I2000Clm50BgcCruGs. This is the global example without FATES.

annefou commented 5 years ago

It is hard to investigate without a closer look, because I don't get this error and it works perfectly when I follow what is written in the README. If you still have sbatch .case.run --resubmit, it means that our updates are not being picked up, because you should have:

sbatch --time 0:20:00 -p normal --account geofag --dependency=afterok:25986496 case.st_archive --resubmit
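
As a quick way to compare the two command lines, here is a small Python sketch that reports which of the expected flags are missing from a submit command. The helper name missing_flags and the EXPECTED_FLAGS tuple are assumptions for this example, based only on the submit_args in the updated config_batch.xml.

```python
import shlex

# Flags that the updated abel entry in config_batch.xml should add.
EXPECTED_FLAGS = ("--time", "-p", "--account")


def missing_flags(command):
    """Return the expected flags that do not appear in the command."""
    tokens = shlex.split(command)
    return [flag for flag in EXPECTED_FLAGS if flag not in tokens]


failing = "sbatch .case.run --resubmit"
working = (
    "sbatch --time 0:20:00 -p normal --account geofag "
    "--dependency=afterok:25986496 case.st_archive --resubmit"
)

print(missing_flags(failing))  # ['--time', '-p', '--account']
print(missing_flags(working))  # []
```

A non-empty result for the command in your error message suggests the case was created from a config_batch.xml without the abel submit_args.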

hrn800 commented 5 years ago

OK thanks, you are right, I still have sbatch .case.run --resubmit. Could you please advise me on what to do next? Should I restart from scratch again, or can I update?