UCL-RITS / rcps-buildscripts

Scripts to automate package builds on RC Platforms
MIT License

CMG201 (IN:03115615) #195

Closed owainkenwayucl closed 5 years ago

owainkenwayucl commented 6 years ago

The Earth Sciences department has requested that we install CMG201 on the research systems. In addition, @balston is setting up a license server for this package.

balston commented 6 years ago

OK, the license manager is now installed on the UCL-wide license server (lm-server3.ucl.ac.uk). My first test from a Linux VM attached via Eduroam was successful. I now need to test serving licenses to Legion, which means I'm doing this install there.

balston commented 6 years ago

I've done a test installation in my home directory to work out what the build script needs to do. It looks like it has installed correctly, so I'm now producing a build script.

balston commented 6 years ago

Build script produced and pulled to Legion. Installation is as follows, using ccspapp:

cd /shared/ucl/apps/build_scripts
./cmg-2017.101_install

installs into /shared/ucl/apps/CMG/2017.101.GU

balston commented 6 years ago

Module file produced and pulled to Legion. Now running some simple tests.

balston commented 6 years ago

Ran the following test using the supplied example data:

cd Scratch/CMG_examples
module load cmg/2017.101
cp $CMG_HOME/imex/2017.10/tpl/spe/*.dat .
RunSim.sh imex 2017.10 mxspe001.dat -doms -parasol 16

This runs successfully on a compute node via a qrsh session using:

-pe smp 16
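
For reference, a sketch of how such a session can be started (the wallclock request is illustrative, not the exact one used):

qrsh -pe smp 16 -l h_rt=1:00:00

with the test commands above then run inside it.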

The log on the license server shows:

10/10 12:23 (cmgl) OUT: solve_university v2017.10 by ccaabaa@node-u07a-020 MX201710 12C 10 (20 licenses)
10/10 12:23 (cmgl) OUT: solve_parallel v2017.10 by ccaabaa@node-u07a-020 MX201710 12C 10 (60 licenses)
10/10 12:23 (cmgl) IN: solve_university v2017.10 by ccaabaa@node-u07a-020 MX201710 12C 10 (20 licenses)
10/10 12:23 (cmgl) IN: solve_parallel v2017.10 by ccaabaa@node-u07a-020 MX201710 12C 10 (60 licenses)
balston commented 6 years ago

Output from the above example:

  LD_LIBRARY_PATH="/shared/ucl/apps/CMG/2017.101.GU/imex/2017.10/linux_x64/lib/:/shared/ucl/apps/CMG/2017.101.GU/stars/2017.10/linux_x64/lib:/shared/ucl/apps/CMG/2017.101.GU/gem/2017.10/linux_x64/lib:/shared/ucl/apps/CMG/2017.101.GU/imex/2017.10/linux_x64/lib:/shared/ucl/apps/intel/2018.Update3/impi/2018.3.222/intel64/lib:/shared/ucl/apps/intel/2018.Update3/debugger_2018/libipt/intel64/lib:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/daal/../compiler/lib/intel64_lin:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/daal/../tbb/lib/intel64_lin/gcc4.4:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/daal/lib/intel64_lin:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/tbb/lib/intel64/gcc4.4:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/ipp/lib/intel64:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/ipp/../compiler/lib/intel64:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/mpi/intel64/lib:/shared/ucl/apps/intel/2018.Update3/compilers_and_libraries_2018.3.222/linux/compiler/lib/intel64:/shared/ucl/apps/giflib/5.1.1/gnu-4.9.2/lib:/shared/ucl/apps/subversion/1.8.13/lib:/shared/ucl/apps/apr-util/1.5.4/lib:/shared/ucl/apps/apr/1.5.2/lib:/shared/ucl/apps/git/2.10.2/gnu-4.9.2/lib64:/shared/ucl/apps/flex/2.5.39/gnu-4.9.2/lib:/shared/ucl/apps/gcc/4.9.2/lib:/shared/ucl/apps/gcc/4.9.2/lib64"
  /shared/ucl/apps/CMG/2017.101.GU/imex/2017.10/linux_x64/exe/mx201710.exe -f mxspe001.dat  -doms -parasol 16
 ********************************************************************************
 *                                                                              *
 *                                IMEX  2017.10                                 *
 *                   Adaptive - Implicit Black Oil Simulator                    *
 *                        General Release for Linux x64                         *
 *                            2017-Oct-17   12:06:12                            *
 *                                                                              *
 *                          (c) Copyright 1977 - 2017                           *
 *                Computer Modelling Group Ltd., Calgary, Canada                *
 *                             All Rights Reserved                              *
 *                                                                              *
 ********************************************************************************

 Command-line Arguments:  -f mxspe001.dat
                          -doms
                          -parasol 16

 Date and Time of Start of Run:  Oct 10, 2018  12:32:10

*** Input/Output files specification :
    Opened data file        on unit 10, filename is 'mxspe001.dat'
    Opened Scratch file     on unit 11

 ===================== WARNING (from subroutine: INDYNM) ======================
  Parasol specified using -parasol on command line 
  aimsol was in effect - overriding using -parasol.
 ==============================================================================

 ===================== WARNING (from subroutine: INDYNM) ======================
  INAC-WELL-SO not specifed with PARASOL in effect.
  Defaulting to INAC-WELL-SO ON and continuing.
 ==============================================================================

    Opened output file      on unit 12, filename is 'mxspe001.out'
    Opened SR3-OUT          on unit 13, filename is 'mxspe001.sr3'
    Opened INDEX-OUT        on unit 14, filename is 'mxspe001.irf'
    Opened MAIN-RESULTS-OUT on unit 15, filename is 'mxspe001.mrf'
    Opened Restart-SR3      on unit 16, filename is 'mxspe001.rstr.sr3'
    Opened Restart-NDX      on unit 17, filename is 'mxspe001.rstr.irf'
    Opened Restart-BIN      on unit 18, filename is 'mxspe001.rstr.mrf'
    Opened GRID scratchfile on unit 19

 ===================== WARNING (from subroutine: INCOMP) ======================
  P/Z Output is disabled as Temperatures were not defined for all PVT Regions
  Please review the *TRES Keyword
  Line: 113   File Name: mxspe001.dat                                           
 ==============================================================================

 ===================== WARNING (from subroutine: PVTCHK) ======================
  PVT Table 1 Entry (i=) 9
  fails compressibility check Bg(i+1)*dRs/Dp(i) must be > dBo/Dp(i)
 ==============================================================================

 ===================== WARNING (from subroutine: PVTCHK) ======================
  IMEX has detected negative total compressibility
  Please Correct Above Entries to avoid poor performance
 ==============================================================================

 ===================== WARNING (from subroutine: PSSLDS) ======================
  Chosen direction has too few slabs containing active blocks (10) to support *AUTOPSLAB 16 ;
  Treating the partition as *AUTOPSLAB 5 .
 ==============================================================================

 ===================== SUMMARY (from subroutine: INDATA) ======================
  Reading of initial data is complete.
  Simulation will stop if there were error messages.
     4 Warning messages.    0 Error messages.
 ==============================================================================

 ===================== WARNING (from subroutine: WMCHCK) ======================
  WELL Injector is an unweighted injector with XFLOW-MODEL FULLY-MIXED.
  The current formulation does not allow this. Resetting to XFLOW-MODEL ZERO-FLOW.
 ==============================================================================

                                    I M E X   T I M E   S T E P   S U M M A R Y
                                       First SPE Comparative Solution Project
                                   (See Odeh, A.S., J.P.T., 33, pp.13-25, 1981.)
 ==================================================================================================================
    Time Step           Time                  Total Production             Total Injection  Total Max. Max. Change
 ---------------- ---------------- -------------------------------------- -----------------  PV   Mat. ------------
                C                    Oil      Gas     Water   GOR.  Wat.    Gas     Water   Avg.  bal. Satur. Pres.
        Size    U                                             SCF/   Cut                    Pres. err. DSMAX  DPMAX
  No.   Days IT T Days   yy:mm:dd   STB/D    MCF/D    STB/D    STB    %    MCF/D    STB/D   psia   %          psia
 ------ ---- -- - ----- ---------- -------- -------- -------- ----- ----- -------- -------- ----- ---- ------ -----
     1w 1.00 1  0 1.000 1986:04:23 20000.00 25400.00           1270       100000.0           4795 e-4g  2e-4o   505
     2  1.98 1  0 2.979 1986:04:25 20000.00 25400.00           1270       100000.0           4798 e-3g  1e-4o   957
     3  2.07 1  0 5.047 1986:04:27 20000.00 25400.00           1270       100000.0           4803 e-3g  .087g   281
     4  4.77 1  0  9.82 1986:05:02 20000.00 25400.00           1270       100000.0           4813 e-3g  .115g   376
     5  8.31 1  0 18.12 1986:05:10 20000.00 24894.84           1245       100000.0           4824 .04g -.044o  1154
     6  7.20 1  0 25.32 1986:05:17 20000.00 24665.08           1233       100000.0           4842 .05g  .102g   748
     7   9.6 1  0 34.94 1986:05:27 20000.00 24403.16           1220       100000.0           4866 .05g  .093g   600
     8  16.0 2  0 50.96 1986:06:12 20000.00 24194.71           1210       100000.0           4907 .05g  .120g   217
     9  26.6 2  0 77.60 1986:07:09 20000.00 24318.05           1216       100000.0           4963 .05g  .126g  1149
    10  23.2 2  0 100.8 1986:08:01 20000.00 24421.49           1221       100000.0           5019 .05g  .142g   884
    11  26.2 2  0 127.0 1986:08:27 20000.00 24576.09           1229       100000.0           5081 .05g  .132g  1021
    12  25.7 2  0 152.7 1986:09:22 20000.00 24872.83           1244       100000.0           5138 .06g  .113g  1028
    13  25.0 2  0 177.7 1986:10:17 20000.00 25191.66           1260       100000.0           5195 .06g  .119g   579
    14  42.1 2  0 219.8 1986:11:28 20000.00 25471.18           1274       100000.0           5278 .09g  .114g  1653
    15  25.5 1  0 245.2 1986:12:23 20000.00 25429.72           1271       100000.0           5336 .09g  .123g   657
    16  38.7 2  0 284.0 1987:01:31 20000.00 25420.66           1271       100000.0           5413 .10g  .132g  1258
    17  30.8 1  0 314.7 1987:03:03 20000.00 25469.43           1273       100000.0           5480 .16g  .116g   952
    18  32.3 2  0 347.1 1987:04:04 20000.00 25550.60           1278       100000.0           5548 .16g  .092g   880
    19  36.7 2  0 383.8 1987:05:11 20000.00 25604.53           1280       100000.0           5623 .16g  .101g   474
    20  72.9 3  0 456.7 1987:07:23 20000.00 25616.16           1281       100000.0           5756 .17g  .127g  1080

                                    I M E X   T I M E   S T E P   S U M M A R Y
                                       First SPE Comparative Solution Project
                                   (See Odeh, A.S., J.P.T., 33, pp.13-25, 1981.)
 ==================================================================================================================
    Time Step           Time                  Total Production             Total Injection  Total Max. Max. Change
 ---------------- ---------------- -------------------------------------- -----------------  PV   Mat. ------------
                C                    Oil      Gas     Water   GOR.  Wat.    Gas     Water   Avg.  bal. Satur. Pres.
        Size    U                                             SCF/   Cut                    Pres. err. DSMAX  DPMAX
  No.   Days IT T Days   yy:mm:dd   STB/D    MCF/D    STB/D    STB    %    MCF/D    STB/D   psia   %          psia
 ------ ---- -- - ----- ---------- -------- -------- -------- ----- ----- -------- -------- ----- ---- ------ -----
    21  67.5 2  0 524.2 1987:09:28 20000.00 25544.49           1277       100000.0           5880 .17g  .126g  1500
    22  45.0 1  0 569.3 1987:11:12 20000.00 25489.50           1274       100000.0           5963 .19g  .136g  1168
    23  38.5 1  0 607.8 1987:12:21 20000.00 25458.63           1273       100000.0           6032 .20g  .119g   736
    24  52.4 2  0 660.2 1988:02:11 20000.00 25434.24           1272       100000.0           6119 .19g  .087g   661
    25  79.2 2  0 739.4 1988:04:30 20000.00 25416.45           1271       100000.0           6243 .19g  .108g  1156
    26  68.5 3  0 807.9 1988:07:08 20000.00 25408.40           1270       100000.0           6347 .18g  .108g  1422
    27  48.2 3  0 856.1 1988:08:25 20000.00 25405.02           1270       100000.0           6418 .18g  .109g   627
    28  76.8 4  0 932.9 1988:11:10 20000.00 25402.42           1270       100000.0           6527 .18g  .119g  1269
    29  60.5 2  0 993.4 1989:01:09 20000.00 25401.37           1270       100000.0           6614 .18g  .081g   576
    30   105 6  0  1098 1989:04:24 20000.00 26718.62           1336       100000.0           6750 .16g  .147g  1422
    31  73.8 5  0  1172 1989:07:07 20000.00 28982.48           1449       100000.0           6838 .16g  .097g   939
    32  39.3 3  1  1212 1989:08:16 20000.00 31590.09           1580       100000.0           6880 .15g  .072g   461
    33  78.6 4  0  1290 1989:11:02 20000.00 52177.10           2609       100000.0           6909 .14o -.100o   680
    34   116 3  0  1406 1990:02:26 20000.00 108404.5           5420       100000.0           6715 .14o -.077o   990
    35   117 2  0  1522 1990:06:22 20000.00 141366.1           7068       100000.0           6369 .15o -.107o   770
    36   152 5  0  1674 1990:11:21 20000.00 177732.3           8887       100000.0           5757 .15o -.108o  -742
    37   204 5  0  1878 1991:06:13 13946.74 132876.3           9527       100000.0           5370 .15o -.136o  -469
    38   300 6  0  2178 1992:04:08 11462.75 129304.9          11280       100000.0           4924 .16o -.142o  -495
    39   424 5  0  2602 1993:06:06 9623.877 123096.3          12791       100000.0           4509 .16o -.193o  -457
    40   438 5  0  3041 1994:08:19 8298.566 119126.4          14355       100000.0           4225 .15o -.146o  -309

                                    I M E X   T I M E   S T E P   S U M M A R Y
                                       First SPE Comparative Solution Project
                                   (See Odeh, A.S., J.P.T., 33, pp.13-25, 1981.)
 ==================================================================================================================
    Time Step           Time                  Total Production             Total Injection  Total Max. Max. Change
 ---------------- ---------------- -------------------------------------- -----------------  PV   Mat. ------------
                C                    Oil      Gas     Water   GOR.  Wat.    Gas     Water   Avg.  bal. Satur. Pres.
        Size    U                                             SCF/   Cut                    Pres. err. DSMAX  DPMAX
  No.   Days IT T Days   yy:mm:dd   STB/D    MCF/D    STB/D    STB    %    MCF/D    STB/D   psia   %          psia
 ------ ---- -- - ----- ---------- -------- -------- -------- ----- ----- -------- -------- ----- ---- ------ -----
    41   609 6  0  3650 1996:04:19 6810.896 119149.3          17494       100000.0           3960 .14o -.174o  -284

 ===================== SUMMARY (from subroutine: TSIO)   ======================
  Simulation run terminated. Stopping time reached.

     5 Warning messages.    0 Error messages.
 ==============================================================================

       Field Total                             Fluid
                             Oil       Gas       Water    Solvent   Polymer 
                            -------   -------   -------   -------   -------
                            (MSTB)    (MMSCF)   (MSTB)    (MMSCF)   (MLB) 
  Cumulative Production       51635    334363      0        NA        NA   
  Cumulative Injection        NA       365000      0        NA        NA   
  Cumulative Gas Lift         NA         0        NA        NA        NA   
  Cumulative Water Influx     NA        NA         0        NA        NA   
  Current Fluids In Place    233109    392060     63382     NA        NA   
  Production Rates           6.8109    119.15      0        NA        NA   
  Injection Rates             NA       100.00      0        NA        NA   

  Timesteps:     41  Newton Cycles:      110  Cuts:      1  Solver Iterations:         1302
  Average Implicitness    : 0.305
  Material Balances (owg ): 0.999 1.000 0.999
  Average Active Blocks:         300 Average Non-BHP Active Wells:     2
  Total Blocks :         300 Total Wells :     2
  Active Blocks:         300 Non-BHP Active Wells:     1
  Total lstpro/lstpar calls:     2
  Time at end of simulation:   3650.00     (days)   
  Average reservoir pressure excluding water zone: 3960.361 (psi)    
  Total Number of Solver Failures:       0  Stalls:       0  ITERMAX Reached:       0
  Jacobian Domains      3
  Linear Solver: Parasol
  Number of level 1 solver classes:    5
  AUTOPSLAB Cutting Direction:    J
  Preconditioner Ordering     REDBLACK
  Preconditioner Degree aa, ab, as, bb, ba:    1    2    1    1    1
  KMP_AFFINITY: Default
  OMP_SCHEDULE: Default
  Max Impl Blocks:      205  %Impl: 68.3%  (TS,CUT,NCYC): (   41, 0,  3 )
  Max Solver Iterations (TS,CUT,NCYC):  26 (   40, 0,  1 )
  Number of threads set:   16
  Total number of cpus:   16
  Number of threads for grid module:   16
  Memory Usage Peak:       39 MB on TS:   26  TS 1 Peak:       36 MB  Average:       38 MB  VM Size:     1251 MB
  Memory Usage Final Size:         40 MB
  Host computer: node-u07a-020

  End of Simulation: Normal Termination

  CPU Time:            16.18 seconds
  Elapsed Time:         1.19 seconds

  Date and Time of End of Run:  Oct 10, 2018  12:32:11

Done RunSim.
balston commented 6 years ago

Module file has been updated to use the DNS alias lic-cmg.ucl.ac.uk for accessing the license manager. I've rerun the test simulations to make sure everything is still working.
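
A quick way to confirm the alias is picked up in a fresh shell (a sketch; it assumes the module sets CMG_LIC_HOST, the variable that appears in the remote scheduler settings later in this issue):

module load cmg/2017.101
echo $CMG_LIC_HOST

which should print lic-cmg.ucl.ac.uk:2700.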

balston commented 6 years ago

Putting together an example job script. Have a test job in the queue at the moment.
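
For illustration, a minimal sketch of the kind of job script being tested (the slot count matches the queued test job shown later in this issue; the job name, wallclock request and data file are assumptions):

#!/bin/bash -l
#$ -N CMG-SMP-job
#$ -pe smp 12
#$ -l h_rt=2:00:00
#$ -cwd

module load cmg/2017.101
RunSim.sh imex 2017.10 mxspe001.dat -doms -parasol $NSLOTS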

heatherkellyucl commented 5 years ago

IN:03115615 requesting remote job submission from the Windows Launcher.

To run CMG on Legion, I need to establish a connection from my PC to Legion and CMG will submit the job itself. I can submit single jobs, but I need to run an optimisation job, which requires many jobs and then communicating with the optimiser, which works via the Launcher.

A PDF about it is attached to the ticket.

balston commented 5 years ago

The CMG-supplied documentation for remote job submission, "Accessing an Oracle Grid Engine Cluster", includes the following requirement: a shared directory.

I'm investigating whether this is possible in our environment. We also need to ask CMG if they support remote submission where the job directory is not shared between the local Windows and remote Linux systems (i.e. like MathWorks does for MATLAB).

balston commented 5 years ago

We are going to experiment with using WinSshFS (https://github.com/feo-cz/win-sshfs) to export user Scratch to Windows 10. We will test using a Windows 10 VM on a Mac to have an environment similar to the users'.

balston commented 5 years ago

Note: must install CMG on Myriad before testing the Windows integration.

balston commented 5 years ago

I've installed WinSSHFS on my Windows 10 VM together with the recommended version of Dokan. I've not been able to make it work yet.

balston commented 5 years ago

The error I was getting yesterday was:

(error screenshot attached)

using this configuration:

(config screenshot attached)

balston commented 5 years ago

Same error when using a real Windows 10 PC.

balston commented 5 years ago

Made some progress this afternoon, using a different set of software at the Windows end:

1) Get this:

https://github.com/mhogomchungu/sirikali/releases/tag/1.3.6

sirikali-setup.exe

2) Get this:

https://github.com/billziss-gh/sshfs-win/releases/tag/v2.7.17334

I got the 64-bit MSI.

3) Get this:

https://github.com/billziss-gh/winfsp/releases/tag/v1.4

Just the MSI.

Installed in that order. Went to Explorer, mapped a network drive, and used:

\\sshfs\yourRemoteLogin@remoteComputer

as per:

https://codeyarns.com/2018/05/03/how-to-mount-remote-directory-on-windows-using-sshfs-win/

We can now mount users' Myriad home directories on the Windows box but still cannot access Scratch. Thanks @uclWerner for sorting it out.
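
The same mapping can also be scripted from a Command Prompt instead of Explorer (a sketch; the drive letter and login are illustrative):

net use Z: \\sshfs\yourRemoteLogin@myriad.rc.ucl.ac.uk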

balston commented 5 years ago

I've repeated the test on my Windows 10 VM running on my iMac and I can also mount my Myriad home directory on it.

balston commented 5 years ago

We still need to sort out how to access Scratch, as the symbolic link ~/Scratch doesn't work at the Windows end. We will need to have more discussions about this issue next week.
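
One option to try (an untested sketch; it assumes this version of sshfs-win supports its root-relative \\sshfs.r prefix) is to bypass the ~/Scratch symlink and map the underlying Lustre path, which a later comment shows to be /scratch/scratch/<username>:

net use S: \\sshfs.r\yourRemoteLogin@myriad.rc.ucl.ac.uk\scratch\scratch\yourRemoteLogin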

balston commented 5 years ago

We are going to try the following to get access to Lustre Scratch:

balston commented 5 years ago

I'm getting an error when trying to log in to the ucspcmg account on Myriad:

 Authentication service cannot retrieve authentication info

It works with the genpw password on Aristotle.

balston commented 5 years ago

ucspcmg is now working for normal login - it needed to be added to the shadow files.

balston commented 5 years ago

Progress - mounting works, and files can be read and written from the Windows 10 VM and seen on the Myriad login nodes.

Next thing:

balston commented 5 years ago

A basic test job has been submitted:

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 566647 0.00000 CMG-SMP-jo ucspcmg      qw    02/04/2019 15:18:34                                   12
balston commented 5 years ago

I forgot to change the working directory to:

/scratch/scratch/ucspcmg/CMG_examples

Updated and resubmitted.

balston commented 5 years ago

This time the job started but appears to have had no modules loaded. Adding loading of the default modules and resubmitting.

balston commented 5 years ago

Test CMG job worked. Now following the attached CMG run.pdf to try to set up remote job submission from my VM.

balston commented 5 years ago

The CMG-provided instructions (above) tell you to install the cmgsimrun command on the cluster. This is not available in the Linux install archive; in fact, cmgsimrun is included within the Windows installer!

So we need to copy the following folder after installing the Windows version:

C:\Program Files (x86)\CMG\Launcher\2017.10\Linux_x64

to Myriad and Legion in:

/shared/ucl/apps/CMG/2017.101.GU/Launcher/2017.10/
balston commented 5 years ago

I've added a prompt to the build script as a reminder to do the copy from Windows.

C:\Program Files (x86)\CMG\Launcher\2017.10\Linux_x64 is now installed on both Legion and Myriad.

balston commented 5 years ago

Module file updated to add:

/shared/ucl/apps/CMG/2017.101.GU/Launcher/2017.10/Linux_x64/EXE

to the PATH. Test job submitted and run successfully to check that the module file still works.
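
A quick sanity check that the module now exposes the launcher tools (a sketch; it assumes cmgsimrun is among the executables in that EXE directory):

module load cmg/2017.101
which cmgsimrun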

balston commented 5 years ago

I think the configuration on the Myriad end is done. Now to try setting up on Windows 10.

balston commented 5 years ago

One more thing on the Myriad side! ~ucspcmg/CMG/CMG_setup.sh has been created to load the module file. This is sourced during the ssh connection. On the Windows side:

ssh myriad.rc.ucl.ac.uk source CMG/CMG_setup.sh
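
For reference, a minimal sketch of what CMG_setup.sh contains (assumed from its stated purpose; only the module name is taken from this issue):

#!/bin/bash
module load cmg/2017.101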
balston commented 5 years ago

Making a first attempt at configuring a remote scheduler for Myriad.

balston commented 5 years ago

No joy - I keep getting:

Failed to Submit

errors without any further explanation about what is wrong.

balston commented 5 years ago

CMG is using the ssh command supplied with Windows 10 to communicate with Myriad. This may have a problem with the ssh key, which was generated using PuTTY, and hence cause the Failed to Submit messages. Investigating ...

balston commented 5 years ago

Windows ssh needs keys generated by Windows ssh-keygen (just like on Linux). Doing this and adding the key to Myriad has allowed a job to be submitted from the CMG launcher running on the Windows box - PROGRESS!

I had to create a local ucspcmg account on my Windows VM to get it to work.
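
The key generation itself is the standard OpenSSH procedure (a sketch, run from cmd on the Windows box; the remote account and key type are illustrative):

ssh-keygen -t rsa
type %USERPROFILE%\.ssh\id_rsa.pub | ssh myriad.rc.ucl.ac.uk "cat >> ~/.ssh/authorized_keys"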

balston commented 5 years ago

I've repeated the test from my own username on the Windows box and job submission works, so there is no need for a local ucspcmg account.

Now I need to work out how to pass the correct job resources, e.g. wall clock time, working directory etc., to the job.

balston commented 5 years ago

Although jobs can now be submitted, they fail because the compute nodes do not currently have ucspcmg configured with Lustre Scratch as its home directory. Implementing this is now being worked on.

Also investigating whether sshfs mounts on the Windows box can use ssh key authentication instead of the password.

balston commented 5 years ago

To configure ucspcmg correctly on the compute nodes an additional provisioning script needs to be written. This will be done by the RCI team after the new Legion storage is configured.

balston commented 5 years ago

Unfortunately the RCI team have not been able to commit any time to producing the additional provisioning script yet because of the Legion and Thomas cluster outages and additional problems with the Myriad cluster.

balston commented 5 years ago

The RCI team are testing the provisioning of the role account on the compute nodes. When this is working we can move on to the next stage of testing job submission from CMG.

ccaathj commented 5 years ago

I'm writing the script to add this user to Myriad.

ccaathj commented 5 years ago

I have created a script to add local users to Myriad compute nodes:

#!/bin/bash -x

## See: https://github.com/UCL-RITS/rcps-buildscripts/issues/195 for the reason of this.
#

function add_local_user_passwd()
{
    if [ -n "$1" ]
    then
        LUID=$(id -u $1)
        LGID=$(id -g $1)
        if [  ${LUID} -gt 0 ] && [ ${LGID} -gt 0 ]
        then
            echo "$1:x:${LUID}:${LGID}:Special Local user (IN03115615):/scratch/scratch/$1:/bin/bash"
        fi
    fi
}

function add_local_user()
{

    if ! grep "^$1:" /etc/passwd.tmp > /dev/null
    then
        if [ -n "$1" ]
        then
            echo "$1:!!:17561::::::" >> ${SUB_SHADOW}
            add_local_user_passwd $1 >> ${SUB_PASSWD}
        fi
    fi
}

# Main
SUB_PASSWD=$(mktemp)
SUB_SHADOW=$(mktemp)

# Generate new passwd/shadow files without the special local accounts:
grep -v ":Special Local user (IN03115615):" /etc/passwd > /etc/passwd.tmp
grep -v ":Special Local user (IN03115615):" /etc/shadow > /etc/shadow.tmp

# Local User list
read -d '' Userlist <<EOL
ucspcmg
EOL

# Quote the expansion so the list is processed one user per line:
echo "$Userlist" | while read user
do
    add_local_user $user
done

cat /etc/passwd.tmp > /etc/passwd.new && cat ${SUB_PASSWD}  >> /etc/passwd.new
cat /etc/shadow.tmp > /etc/shadow.new && cat ${SUB_SHADOW}  >> /etc/shadow.new

cp /etc/passwd /etc/passwd.bak && mv -f /etc/passwd.new /etc/passwd
cp /etc/shadow /etc/shadow.bak && mv -f /etc/shadow.new /etc/shadow

NP=$(cat /etc/passwd | wc -l)
OP=$(cat /etc/passwd.tmp | wc -l)
NS=$(cat /etc/shadow | wc -l)
OS=$(cat /etc/shadow.tmp | wc -l)

# See if /etc/passwd or /etc/shadow lengths are less than original, if so, copy original back - This means we cannot delete local accounts.
if [ "$NP" -lt "$OP" ] || [ "$NS" -lt "$OS" ]
then
    cp /etc/passwd /etc/passwd.err
    cp -f /etc/passwd.bak /etc/passwd
    cp /etc/shadow /etc/shadow.err
    cp -f /etc/shadow.bak /etc/shadow
fi

rm -f ${SUB_PASSWD}
rm -f ${SUB_SHADOW}

It does some limited checking, mainly to see whether truncation has taken place on /etc/passwd or /etc/shadow, so at the moment it cannot remove local special accounts.

I have run it on all the compute nodes on Myriad and checked random nodes; it looks like it has worked.

ccaathj commented 5 years ago

Have also added it to the postscripts on all compute nodes, so it gets run when we re-install a compute node.

balston commented 5 years ago

Progress towards getting a job to run - I've now got a job submitted from the CMG Launcher on Windows that starts running! Unfortunately it failed with the following error:

/var/opt/sge/node-h00a-028/job_scripts/891789: line 1: cmgsimrun: command not found

As mentioned above, cmgsimrun is included within the Windows installer and has to be copied to Myriad. This was done, but the cmgsimrun command also has to be made executable. I've now done this and added a prompt to the CMG build script as a reminder.
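
The missing step was roughly (a sketch; it assumes cmgsimrun sits in the Launcher EXE directory added to PATH above):

chmod +x /shared/ucl/apps/CMG/2017.101.GU/Launcher/2017.10/Linux_x64/EXE/cmgsimrun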

balston commented 5 years ago

Still getting the following error:

/var/opt/sge/node-h00a-001/job_scripts/1004449: line 1: cmgsimrun: command not found

Possibly the module is not being loaded in the job script.

balston commented 5 years ago

Need to add setting PATH to the list of remote scheduler environment variables, as the module is not being loaded. Trying:

CMG_HOME=/shared/ucl/apps/CMG/2017.101.GU
CMG_LIC_HOST=lic-cmg.ucl.ac.uk:2700
PATH=$CMG_HOME:$CMG_HOME/Launcher/2017.10/Linux_x64/EXE:$PATH
balston commented 5 years ago

This wasn't right! So I reverted to just setting:

CMG_HOME=/shared/ucl/apps/CMG/2017.101.GU
CMG_LIC_HOST=lic-cmg.ucl.ac.uk:2700

on the remote scheduler environment variables list.

balston commented 5 years ago

I managed to examine one of the failing jobs while it was still in the queue (using qstat -j) and discovered:

shell_list:                 NONE:/bin/env

which should for normal jobs be:

shell_list:                 NONE:/bin/bash

This is why the:

/var/opt/sge/node-h00a-001/job_scripts/1004449: line 1: cmgsimrun: command not found

errors are occurring.
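
The check that exposed this can be reproduced along these lines (a sketch; the job ID is the one from the error above):

qstat -j 1004449 | grep shell_list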

I've managed to fix this and get a job submitted and running successfully by configuring the Remote Scheduler from the Windows CMG Launcher:

Configuration -> Configure Remote Schedulers...

as follows. Note: I'm including all my settings for reference:

balston commented 5 years ago

To submit the job I did the following after starting the CMG Launcher:

Show Advanced Options and set:

Then OK to submit the job to the Myriad queue.