OK, the license manager is now installed on the UCL-wide license server (lm-server3.ucl.ac.uk). My first test from a Linux VM attached via Eduroam was successful. I now need to test serving licenses to Legion, which means I'm doing this install on there.
I've done a test installation in my home directory to work out what the build script needs to do. It looks like it has installed correctly, so I'm now producing a build script.
Build script produced and pulled to Legion. Installation as follows. Using ccspapp:
cd /shared/ucl/apps/build_scripts
./cmg-2017.101_install
installs into /shared/ucl/apps/CMG/2017.101.GU
Module file produced and pulled to Legion. Now running some simple tests.
Ran the following test using supplied example data:
cd Scratch/CMG_examples
module load cmg/2017.101
cp $CMG_HOME/imex/2017.10/tpl/spe/*.dat .
RunSim.sh imex 2017.10 mxspe001.dat -doms -parasol 16
This runs successfully on a compute node via a qrsh session using:
-pe smp 16
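For reference, a minimal sketch of the interactive session (the memory and wallclock values here are illustrative, not what was actually requested):
# Hypothetical qrsh invocation for the 16-core interactive test:
qrsh -pe smp 16 -l mem=2G,h_rt=2:0:0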
The log on the license server shows:
10/10 12:23 (cmgl) OUT: solve_university v2017.10 by ccaabaa@node-u07a-020 MX201710 12C 10 (20 licenses)
10/10 12:23 (cmgl) OUT: solve_parallel v2017.10 by ccaabaa@node-u07a-020 MX201710 12C 10 (60 licenses)
10/10 12:23 (cmgl) IN: solve_university v2017.10 by ccaabaa@node-u07a-020 MX201710 12C 10 (20 licenses)
10/10 12:23 (cmgl) IN: solve_parallel v2017.10 by ccaabaa@node-u07a-020 MX201710 12C 10 (60 licenses)
Output from the above example:
LD_LIBRARY_PATH="/shared/ucl/apps/CMG/2017.101.GU/imex/2017.10/linux_x64/lib/:/shared/ucl/apps/CMG/2017.101.GU/stars/2017.10/linux_x64/lib:/shared/ucl/apps/CMG/2017.101.GU/gem/2017.10/linux_x64/lib:/shared/ucl/apps/CMG/2017.101.GU/imex/2017.10/linux_x64/lib:/shared/ucl/apps/intel/2018.Update3/impi/2018.3.222/intel64/lib:/shared/ucl/apps/intel/2018.Update3/debugger_2018/libipt/intel64/lib:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/daal/../compiler/lib/intel64_lin:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/daal/../tbb/lib/intel64_lin/gcc4.4:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/daal/lib/intel64_lin:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/tbb/lib/intel64/gcc4.4:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/ipp/lib/intel64:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/ipp/../compiler/lib/intel64:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/mpi/intel64/lib:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/compiler/lib/intel64:/shared/ucl/apps/giflib/5.1.1/gnu-4.9.2/lib:/shared/ucl/apps/subversion/1.8.13/lib:/shared/ucl/apps/apr-util/1.5.4/lib:/shared/ucl/apps/apr/1.5.2/lib:/shared/ucl/apps/git/2.10.2/gnu-4.9.2/lib64:/shared/ucl/apps/flex/2.5.39/gnu-4.9.2/lib:/shared/ucl/apps/gcc/4.9.2/lib:/shared/ucl/apps/gcc/4.9.2/lib64"
/shared/ucl/apps/CMG/2017.101.GU/imex/2017.10/linux_x64/exe/mx201710.exe -f mxspe001.dat -doms -parasol 16
********************************************************************************
* *
* IMEX 2017.10 *
* Adaptive - Implicit Black Oil Simulator *
* General Release for Linux x64 *
* 2017-Oct-17 12:06:12 *
* *
* (c) Copyright 1977 - 2017 *
* Computer Modelling Group Ltd., Calgary, Canada *
* All Rights Reserved *
* *
********************************************************************************
Command-line Arguments: -f mxspe001.dat
-doms
-parasol 16
Date and Time of Start of Run: Oct 10, 2018 12:32:10
*** Input/Output files specification :
Opened data file on unit 10, filename is 'mxspe001.dat'
Opened Scratch file on unit 11
===================== WARNING (from subroutine: INDYNM) ======================
Parasol specified using -parasol on command line
aimsol was in effect - overriding using -parasol.
==============================================================================
===================== WARNING (from subroutine: INDYNM) ======================
INAC-WELL-SO not specifed with PARASOL in effect.
Defaulting to INAC-WELL-SO ON and continuing.
==============================================================================
Opened output file on unit 12, filename is 'mxspe001.out'
Opened SR3-OUT on unit 13, filename is 'mxspe001.sr3'
Opened INDEX-OUT on unit 14, filename is 'mxspe001.irf'
Opened MAIN-RESULTS-OUT on unit 15, filename is 'mxspe001.mrf'
Opened Restart-SR3 on unit 16, filename is 'mxspe001.rstr.sr3'
Opened Restart-NDX on unit 17, filename is 'mxspe001.rstr.irf'
Opened Restart-BIN on unit 18, filename is 'mxspe001.rstr.mrf'
Opened GRID scratchfile on unit 19
===================== WARNING (from subroutine: INCOMP) ======================
P/Z Output is disabled as Temperatures were not defined for all PVT Regions
Please review the *TRES Keyword
Line: 113 File Name: mxspe001.dat
==============================================================================
===================== WARNING (from subroutine: PVTCHK) ======================
PVT Table 1 Entry (i=) 9
fails compressibility check Bg(i+1)*dRs/Dp(i) must be > dBo/Dp(i)
==============================================================================
===================== WARNING (from subroutine: PVTCHK) ======================
IMEX has detected negative total compressibility
Please Correct Above Entries to avoid poor performance
==============================================================================
===================== WARNING (from subroutine: PSSLDS) ======================
Chosen direction has too few slabs containing active blocks (10) to support *AUTOPSLAB 16 ;
Treating the partition as *AUTOPSLAB 5 .
==============================================================================
===================== SUMMARY (from subroutine: INDATA) ======================
Reading of initial data is complete.
Simulation will stop if there were error messages.
4 Warning messages. 0 Error messages.
==============================================================================
===================== WARNING (from subroutine: WMCHCK) ======================
WELL Injector is an unweighted injector with XFLOW-MODEL FULLY-MIXED.
The current formulation does not allow this. Resetting to XFLOW-MODEL ZERO-FLOW.
==============================================================================
I M E X T I M E S T E P S U M M A R Y
First SPE Comparative Solution Project
(See Odeh, A.S., J.P.T., 33, pp.13-25, 1981.)
==================================================================================================================
Time Step Time Total Production Total Injection Total Max. Max. Change
---------------- ---------------- -------------------------------------- ----------------- PV Mat. ------------
C Oil Gas Water GOR. Wat. Gas Water Avg. bal. Satur. Pres.
Size U SCF/ Cut Pres. err. DSMAX DPMAX
No. Days IT T Days yy:mm:dd STB/D MCF/D STB/D STB % MCF/D STB/D psia % psia
------ ---- -- - ----- ---------- -------- -------- -------- ----- ----- -------- -------- ----- ---- ------ -----
1w 1.00 1 0 1.000 1986:04:23 20000.00 25400.00 1270 100000.0 4795 e-4g 2e-4o 505
2 1.98 1 0 2.979 1986:04:25 20000.00 25400.00 1270 100000.0 4798 e-3g 1e-4o 957
3 2.07 1 0 5.047 1986:04:27 20000.00 25400.00 1270 100000.0 4803 e-3g .087g 281
4 4.77 1 0 9.82 1986:05:02 20000.00 25400.00 1270 100000.0 4813 e-3g .115g 376
5 8.31 1 0 18.12 1986:05:10 20000.00 24894.84 1245 100000.0 4824 .04g -.044o 1154
6 7.20 1 0 25.32 1986:05:17 20000.00 24665.08 1233 100000.0 4842 .05g .102g 748
7 9.6 1 0 34.94 1986:05:27 20000.00 24403.16 1220 100000.0 4866 .05g .093g 600
8 16.0 2 0 50.96 1986:06:12 20000.00 24194.71 1210 100000.0 4907 .05g .120g 217
9 26.6 2 0 77.60 1986:07:09 20000.00 24318.05 1216 100000.0 4963 .05g .126g 1149
10 23.2 2 0 100.8 1986:08:01 20000.00 24421.49 1221 100000.0 5019 .05g .142g 884
11 26.2 2 0 127.0 1986:08:27 20000.00 24576.09 1229 100000.0 5081 .05g .132g 1021
12 25.7 2 0 152.7 1986:09:22 20000.00 24872.83 1244 100000.0 5138 .06g .113g 1028
13 25.0 2 0 177.7 1986:10:17 20000.00 25191.66 1260 100000.0 5195 .06g .119g 579
14 42.1 2 0 219.8 1986:11:28 20000.00 25471.18 1274 100000.0 5278 .09g .114g 1653
15 25.5 1 0 245.2 1986:12:23 20000.00 25429.72 1271 100000.0 5336 .09g .123g 657
16 38.7 2 0 284.0 1987:01:31 20000.00 25420.66 1271 100000.0 5413 .10g .132g 1258
17 30.8 1 0 314.7 1987:03:03 20000.00 25469.43 1273 100000.0 5480 .16g .116g 952
18 32.3 2 0 347.1 1987:04:04 20000.00 25550.60 1278 100000.0 5548 .16g .092g 880
19 36.7 2 0 383.8 1987:05:11 20000.00 25604.53 1280 100000.0 5623 .16g .101g 474
20 72.9 3 0 456.7 1987:07:23 20000.00 25616.16 1281 100000.0 5756 .17g .127g 1080
I M E X T I M E S T E P S U M M A R Y
First SPE Comparative Solution Project
(See Odeh, A.S., J.P.T., 33, pp.13-25, 1981.)
==================================================================================================================
Time Step Time Total Production Total Injection Total Max. Max. Change
---------------- ---------------- -------------------------------------- ----------------- PV Mat. ------------
C Oil Gas Water GOR. Wat. Gas Water Avg. bal. Satur. Pres.
Size U SCF/ Cut Pres. err. DSMAX DPMAX
No. Days IT T Days yy:mm:dd STB/D MCF/D STB/D STB % MCF/D STB/D psia % psia
------ ---- -- - ----- ---------- -------- -------- -------- ----- ----- -------- -------- ----- ---- ------ -----
21 67.5 2 0 524.2 1987:09:28 20000.00 25544.49 1277 100000.0 5880 .17g .126g 1500
22 45.0 1 0 569.3 1987:11:12 20000.00 25489.50 1274 100000.0 5963 .19g .136g 1168
23 38.5 1 0 607.8 1987:12:21 20000.00 25458.63 1273 100000.0 6032 .20g .119g 736
24 52.4 2 0 660.2 1988:02:11 20000.00 25434.24 1272 100000.0 6119 .19g .087g 661
25 79.2 2 0 739.4 1988:04:30 20000.00 25416.45 1271 100000.0 6243 .19g .108g 1156
26 68.5 3 0 807.9 1988:07:08 20000.00 25408.40 1270 100000.0 6347 .18g .108g 1422
27 48.2 3 0 856.1 1988:08:25 20000.00 25405.02 1270 100000.0 6418 .18g .109g 627
28 76.8 4 0 932.9 1988:11:10 20000.00 25402.42 1270 100000.0 6527 .18g .119g 1269
29 60.5 2 0 993.4 1989:01:09 20000.00 25401.37 1270 100000.0 6614 .18g .081g 576
30 105 6 0 1098 1989:04:24 20000.00 26718.62 1336 100000.0 6750 .16g .147g 1422
31 73.8 5 0 1172 1989:07:07 20000.00 28982.48 1449 100000.0 6838 .16g .097g 939
32 39.3 3 1 1212 1989:08:16 20000.00 31590.09 1580 100000.0 6880 .15g .072g 461
33 78.6 4 0 1290 1989:11:02 20000.00 52177.10 2609 100000.0 6909 .14o -.100o 680
34 116 3 0 1406 1990:02:26 20000.00 108404.5 5420 100000.0 6715 .14o -.077o 990
35 117 2 0 1522 1990:06:22 20000.00 141366.1 7068 100000.0 6369 .15o -.107o 770
36 152 5 0 1674 1990:11:21 20000.00 177732.3 8887 100000.0 5757 .15o -.108o -742
37 204 5 0 1878 1991:06:13 13946.74 132876.3 9527 100000.0 5370 .15o -.136o -469
38 300 6 0 2178 1992:04:08 11462.75 129304.9 11280 100000.0 4924 .16o -.142o -495
39 424 5 0 2602 1993:06:06 9623.877 123096.3 12791 100000.0 4509 .16o -.193o -457
40 438 5 0 3041 1994:08:19 8298.566 119126.4 14355 100000.0 4225 .15o -.146o -309
I M E X T I M E S T E P S U M M A R Y
First SPE Comparative Solution Project
(See Odeh, A.S., J.P.T., 33, pp.13-25, 1981.)
==================================================================================================================
Time Step Time Total Production Total Injection Total Max. Max. Change
---------------- ---------------- -------------------------------------- ----------------- PV Mat. ------------
C Oil Gas Water GOR. Wat. Gas Water Avg. bal. Satur. Pres.
Size U SCF/ Cut Pres. err. DSMAX DPMAX
No. Days IT T Days yy:mm:dd STB/D MCF/D STB/D STB % MCF/D STB/D psia % psia
------ ---- -- - ----- ---------- -------- -------- -------- ----- ----- -------- -------- ----- ---- ------ -----
41 609 6 0 3650 1996:04:19 6810.896 119149.3 17494 100000.0 3960 .14o -.174o -284
===================== SUMMARY (from subroutine: TSIO) ======================
Simulation run terminated. Stopping time reached.
5 Warning messages. 0 Error messages.
==============================================================================
Field Total Fluid
Oil Gas Water Solvent Polymer
------- ------- ------- ------- -------
(MSTB) (MMSCF) (MSTB) (MMSCF) (MLB)
Cumulative Production 51635 334363 0 NA NA
Cumulative Injection NA 365000 0 NA NA
Cumulative Gas Lift NA 0 NA NA NA
Cumulative Water Influx NA NA 0 NA NA
Current Fluids In Place 233109 392060 63382 NA NA
Production Rates 6.8109 119.15 0 NA NA
Injection Rates NA 100.00 0 NA NA
Timesteps: 41 Newton Cycles: 110 Cuts: 1 Solver Iterations: 1302
Average Implicitness : 0.305
Material Balances (owg ): 0.999 1.000 0.999
Average Active Blocks: 300 Average Non-BHP Active Wells: 2
Total Blocks : 300 Total Wells : 2
Active Blocks: 300 Non-BHP Active Wells: 1
Total lstpro/lstpar calls: 2
Time at end of simulation: 3650.00 (days)
Average reservoir pressure excluding water zone: 3960.361 (psi)
Total Number of Solver Failures: 0 Stalls: 0 ITERMAX Reached: 0
Jacobian Domains 3
Linear Solver: Parasol
Number of level 1 solver classes: 5
AUTOPSLAB Cutting Direction: J
Preconditioner Ordering REDBLACK
Preconditioner Degree aa, ab, as, bb, ba: 1 2 1 1 1
KMP_AFFINITY: Default
OMP_SCHEDULE: Default
Max Impl Blocks: 205 %Impl: 68.3% (TS,CUT,NCYC): ( 41, 0, 3 )
Max Solver Iterations (TS,CUT,NCYC): 26 ( 40, 0, 1 )
Number of threads set: 16
Total number of cpus: 16
Number of threads for grid module: 16
Memory Usage Peak: 39 MB on TS: 26 TS 1 Peak: 36 MB Average: 38 MB VM Size: 1251 MB
Memory Usage Final Size: 40 MB
Host computer: node-u07a-020
End of Simulation: Normal Termination
CPU Time: 16.18 seconds
Elapsed Time: 1.19 seconds
Date and Time of End of Run: Oct 10, 2018 12:32:11
Done RunSim.
Module file has been updated to use the DNS alias lic-cmg.ucl.ac.uk for accessing the license manager. I've rerun the test simulations to make sure everything is still working.
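For reference, a minimal sketch of what the module is expected to set, expressed as shell (the real modulefile is Tcl; the variable names and values are taken from settings quoted later in this thread):
# Assumed effect of 'module load cmg/2017.101':
export CMG_HOME=/shared/ucl/apps/CMG/2017.101.GU
export CMG_LIC_HOST=lic-cmg.ucl.ac.uk:2700   # DNS alias for the license manager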
Putting together an example job script; see the sketch below. I have a test job in the queue at the moment.
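A sketch of the kind of job script involved (SGE directives; the resource values are illustrative and the module name comes from the tests above):
#!/bin/bash -l
# Example CMG IMEX job, 16 threads on a single node (values illustrative)
#$ -l h_rt=1:0:0
#$ -l mem=2G
#$ -pe smp 16
#$ -cwd
module load cmg/2017.101
RunSim.sh imex 2017.10 mxspe001.dat -doms -parasol 16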
Ticket IN:03115615 requests remote job submission from the Windows Launcher:
To use CMG on Legion, I need to establish a connection from my PC to Legion and CMG will submit the job itself. I can submit single jobs, but I need to run an optimisation job, which requires many jobs communicating with the optimiser, which works via the Launcher.
A PDF about this is attached to the ticket.
The CMG-supplied documentation for remote job submission, "Accessing an Oracle GRID Engine Cluster", includes the following requirement:
I'm investigating whether this is possible in our environment. We also need to ask CMG if they support remote submission where the job directory is not shared between the local Windows and remote Linux systems (i.e. like MathWorks does for MATLAB).
We are going to experiment with using WinSshFS (https://github.com/feo-cz/win-sshfs) to export user Scratch to Windows 10. We will test using a Windows 10 VM on a Mac to have a similar environment to the users.
Note: must install CMG on Myriad before testing the Windows integration.
I've installed WinSshFS on my Windows 10 VM together with the recommended version of Dokan. I've not been able to make it work yet.
The error I was getting yesterday was:
using configuration:
Same error when using a real Windows 10 PC.
Made some progress this afternoon, using a different set of software at the Windows end:
1) Get this:
https://github.com/mhogomchungu/sirikali/releases/tag/1.3.6
sirikali-setup.exe
2) Get this:
https://github.com/billziss-gh/sshfs-win/releases/tag/v2.7.17334
I got the 64-bit MSI.
3) Get this:
https://github.com/billziss-gh/winfsp/releases/tag/v1.4
Just the MSI.
Installed them in that order. Then went to Explorer, mapped a network drive, and used:
\\sshfs\yourRemoteLogin@remoteComputer
as per
https://codeyarns.com/2018/05/03/how-to-mount-remote-directory-on-windows-using-sshfs-win/
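The same mount can also be made from a command prompt (the drive letter here is illustrative):
net use M: \\sshfs\yourRemoteLogin@remoteComputer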
We can now mount users' Myriad home directories on the Windows box but still cannot access Scratch. Thanks @uclWerner for sorting this out.
I've repeated the test on my Windows 10 VM running on my iMac and I can also mount my Myriad home directory on it.
We still need to sort out how to access Scratch, as the symbolic link ~/Scratch doesn't work at the Windows end. We will need to have more discussions about this issue next week.
We are going to try the following to get access to Lustre Scratch:
I'm getting an error when trying to log in to the ucspcmg account on Myriad:
Authentication service cannot retrieve authentication info
It works with the genpw password on Aristotle.
ucspcmg is now working for normal login; it needed to be added to the shadow files.
Progress - mounting works and files can be read and written from the Windows 10 VM and can be seen on Myriad login nodes.
Next thing:
A basic test job has been submitted:
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
566647 0.00000 CMG-SMP-jo ucspcmg qw 02/04/2019 15:18:34 12
I forgot to change the working directory to:
/scratch/scratch/ucspcmg/CMG_examples
Updated and resubmitted.
This time the job started but appears to have no modules loaded. Adding loading of the default modules and resubmitting.
The test CMG job worked. Now following CMG run.pdf to try to set up remote job submission from my VM.
The CMG-provided instructions (above) tell you to install the cmgsimrun command on the cluster, but it is not available in the Linux install archive. In fact, cmgsimrun is included within the Windows installer!
So we need to copy the following folder after installing the Windows version:
C:\Program Files (x86)\CMG\Launcher\2017.10\Linux_x64
to Myriad and Legion in:
/shared/ucl/apps/CMG/2017.101.GU/Launcher/2017.10/
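A sketch of the copy, run from the Windows box with command-line scp (using ccspapp here is an assumption, as it is the account used for installs above):
scp -r "C:\Program Files (x86)\CMG\Launcher\2017.10\Linux_x64" ccspapp@myriad.rc.ucl.ac.uk:/shared/ucl/apps/CMG/2017.101.GU/Launcher/2017.10/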
I've added a prompt to do the copy from Windows to the build script.
C:\Program Files (x86)\CMG\Launcher\2017.10\Linux_x64 installed on both Legion and Myriad.
Module file updated to add:
/shared/ucl/apps/CMG/2017.101.GU/Launcher/2017.10/Linux_x64/EXE
to the PATH. Test job submitted and run successfully to check that the module file still works.
I think the configuration on the Myriad end is done. Now to try setting up on Windows 10.
One more thing on the Myriad side! ~ucspcmg/CMG/CMG_setup.sh has been created to load the module file. This is sourced during the ssh connection. On the Windows side:
ssh myriad.rc.ucl.ac.uk source CMG/CMG_setup.sh
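For reference, CMG_setup.sh is presumably little more than the following (a sketch; the location of the module initialisation script is an assumption):
#!/bin/bash
# ~ucspcmg/CMG/CMG_setup.sh - load the CMG module for remote (non-login) sessions
source /etc/profile.d/modules.sh   # assumed path to the module init script
module load cmg/2017.101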
A first attempt at configuring a remote scheduler for Myriad is in progress.
No joy - I keep getting:
Failed to Submit
errors without any further explanation about what is wrong.
CMG is using the ssh command supplied with Windows 10 to communicate with Myriad. This may have a problem with the ssh key, which was generated using PuTTY, and hence cause the Failed to Submit messages. Investigating ...
Windows ssh needs keys generated by Windows ssh-keygen (just like on Linux). Doing this and adding the key to Myriad has allowed a job to be submitted from the CMG launcher running on the Windows box - PROGRESS!
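A sketch of the key setup using the OpenSSH tools bundled with Windows 10 (run from a command prompt; the key type and file locations are defaults, not confirmed settings):
ssh-keygen -t rsa
rem Append the new public key to the role account's authorized_keys on Myriad:
type %USERPROFILE%\.ssh\id_rsa.pub | ssh ucspcmg@myriad.rc.ucl.ac.uk "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"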
I had to create a local ucspcmg account on my Windows VM to get it to work.
I've repeated the test from my username on the Windows box and the job submission works so no need for a local ucspcmg account.
Now I need to work out how to pass the correct job resources (e.g. wall clock time, working directory) to the job.
Although jobs can now be submitted, they fail because the compute nodes do not currently have ucspcmg configured with Lustre Scratch as its home directory. Implementing this is now being worked on.
Also investigating whether sshfs mounts on the Windows box can use ssh key authentication instead of a password.
To configure ucspcmg correctly on the compute nodes an additional provisioning script needs to be written. This will be done by the RCI team after the new Legion storage is configured.
Unfortunately the RCI team have not been able to commit any time to producing the additional provisioning script yet because of the Legion and Thomas cluster outages and additional problems with the Myriad cluster.
The RCI team are testing the provisioning of the role account on the compute nodes. When this is working we can move on to the next stage of testing job submission from CMG.
I'm writing the script to add this user to Myriad.
I have created a script to add local users to the Myriad compute nodes:
#!/bin/bash -x
## See: https://github.com/UCL-RITS/rcps-buildscripts/issues/195 for the reason of this.
#
function add_local_user_passwd()
{
if [ -n "$1" ]
then
LUID=$(id -u $1)
LGID=$(id -g $1)
if [ ${LUID} -gt 0 ] && [ ${LGID} -gt 0 ]
then
echo "$1:x:${LUID}:${LGID}:Special Local user (IN03115615):/scratch/scratch/$1:/bin/bash"
fi
fi
}
function add_local_user()
{
if ! grep "^$1:" /etc/passwd.tmp > /dev/null
then
if [ -n "$1" ]
then
echo "$1:!!:17561::::::" >> ${SUB_SHADOW}
add_local_user_passwd $1 >> ${SUB_PASSWD}
fi
fi
}
# Main
SUB_PASSWD=$(mktemp)
SUB_SHADOW=$(mktemp)
# Generate new password/shadow files without special local accounts:
grep -v ":Special Local user (IN03115615):" /etc/passwd > /etc/passwd.tmp
grep -v ":Special Local user (IN03115615):" /etc/shadow > /etc/shadow.tmp
# Local User list
read -d '' Userlist <<EOL
ucspcmg
EOL
echo "$Userlist" | while read user
do
add_local_user $user
done
cat /etc/passwd.tmp > /etc/passwd.new && cat ${SUB_PASSWD} >> /etc/passwd.new
cat /etc/shadow.tmp > /etc/shadow.new && cat ${SUB_SHADOW} >> /etc/shadow.new
cp /etc/passwd /etc/passwd.bak && mv -f /etc/passwd.new /etc/passwd
cp /etc/shadow /etc/shadow.bak && mv -f /etc/shadow.new /etc/shadow
NP=$(cat /etc/passwd | wc -l)
OP=$(cat /etc/passwd.tmp | wc -l)
NS=$(cat /etc/shadow | wc -l)
OS=$(cat /etc/shadow.tmp | wc -l)
# See if /etc/passwd or /etc/shadow lengths are less than original, if so, copy original back - This means we cannot delete local accounts.
if [ "$NP" -lt "$OP" ] || [ "$NS" -lt "$OS" ]
then
cp /etc/passwd /etc/passwd.err
cp -f /etc/passwd.bak /etc/passwd
cp /etc/shadow /etc/shadow.err
cp -f /etc/shadow.bak /etc/shadow
fi
rm -f ${SUB_PASSWD}
rm -f ${SUB_SHADOW}
The script does some limited checking, mainly to see if truncation has taken place on /etc/passwd or /etc/shadow, so at the moment it cannot remove local special accounts.
I have run it on all compute nodes on Myriad, checked random nodes, and it looks like it has worked.
I have also added it to the postscripts on all compute nodes, so it gets run when we re-install a compute node.
Progress towards getting a job to run - I've now got a job submitted from the CMG Launcher on Windows that starts running! Unfortunately it failed with the following error:
/var/opt/sge/node-h00a-028/job_scripts/891789: line 1: cmgsimrun: command not found
As mentioned above, cmgsimrun is included within the Windows installer and has to be copied to Myriad. This was done, but the cmgsimrun command also has to be made executable. I've now done this and added a prompt to the CMG build script as a reminder.
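For reference, the fix was along these lines (the EXE subdirectory is assumed from the PATH addition above):
chmod +x /shared/ucl/apps/CMG/2017.101.GU/Launcher/2017.10/Linux_x64/EXE/cmgsimrun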
Still getting the following error:
/var/opt/sge/node-h00a-001/job_scripts/1004449: line 1: cmgsimrun: command not found
Possibly the module is not being loaded in the job script.
I need to add setting PATH to the list of remote scheduler environment variables, as the module is not being loaded. Trying:
CMG_HOME=/shared/ucl/apps/CMG/2017.101.GU
CMG_LIC_HOST=lic-cmg.ucl.ac.uk:2700
PATH=$CMG_HOME:$CMG_HOME/Launcher/2017.10/Linux_x64/EXE:$PATH
This wasn't right, so I reverted to just setting:
CMG_HOME=/shared/ucl/apps/CMG/2017.101.GU
CMG_LIC_HOST=lic-cmg.ucl.ac.uk:2700
on the remote scheduler environment variables list.
I managed to examine one of the failing jobs while it was still in the queue (using qstat -j) and discovered:
shell_list: NONE:/bin/env
which should for normal jobs be:
shell_list: NONE:/bin/bash
This is why the:
/var/opt/sge/node-h00a-001/job_scripts/1004449: line 1: cmgsimrun: command not found
errors are occurring.
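For reference, the check was along these lines (the job ID is taken from the error above; a healthy job shows /bin/bash):
qstat -j 1004449 | grep shell_list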
I've managed to fix this and get a job submitted and running successfully by configuring the Remote Scheduler from the Windows CMG Launcher:
Configuration -> Configure Remote Schedulers...
as follows. Note: I'm including all my settings for reference:
\\sshfs\ucspcmg@myriad.rc.ucl.ac.uk
CMG_HOME=/shared/ucl/apps/CMG/2017.101.GU
CMG_LIC_HOST=lic-cmg.ucl.ac.uk:2700
-S /bin/bash
To submit the job I did the following after starting the CMG Launcher:
\\sshfs\ucspcmg@myriad.rc.ucl.ac.uk
Show Advanced Options and set:
-l mem=2G,h_rt=0:10:0
for 2GB RAM per core and 10 minutes of wallclock time. Other resource requests could be added here. OK to submit the job to the Myriad queue.
The Earth Sciences department has requested that we install CMG201 on the research systems. In addition, @balston is setting up a license server for this package.