GsDevKit / GsDevKit_home

master GsDevKit project
http://gsdevkit.github.io/GsDevKit_home
MIT License

MessageNotUnderstood: receiver of "," is nil when startNetldi (likely $USER not defined) #203

Open victornoel opened 6 years ago

victornoel commented 6 years ago

Hi,

I'm trying to install GsDevKit_home, and during the execution of createStone devKit_33 3.3.0 (the same happens with 3.4) I get the following error:

=================
   GsDevKit script: startStone -b -N devKit_33
              path: /GsDevKit_home/bin/startStone
=================
 _____________________________________________________________________________
|             GemStone/S64 Object-Oriented Data Management System             |
|                   Copyright (C) GemTalk Systems 1986-2016                   |
|                            All rights reserved.                             |
+-----------------------------------------------------------------------------+
|    PROGRAM: WAITSTONE, GemStone Remote Process Utility                      |
|    VERSION: 3.3.0, Thu Jan 28 12:05:34 2016                                 |
|      BUILD: gss64_3_3_x_branch-38643                                        |
|  BUILT FOR: x86-64 (Linux)                                                  |
|       MODE: 64 bit                                                          |
| RUNNING ON: 4-CPU afff5849a211 x86_64 (Linux 4.14.5-1-ARCH #1 SMP PREEMPT Sun
| Dec 10 14:50:30 UTC 2017) 7885MB                                            |
| PROCESS ID: 274       DATE: 12/17/17 12:31:37 UTC                           |
|   USER IDS: REAL=root (0) EFFECTIVE=root (0)                                |
+-----------------------------------------------------------------------------+
|   GEMSTONE_NRS_ALL = #dir:$GEMSTONE_LOGDIR#log:%N%P.log                     |
|_____________________________________________________________________________|
waitstone[Error]: Network service !#dir:/GsDevKit_home/server/stones/devKit_33/logs#log:%N%P.log#server!devKit_33 was not found.
Network lookup failure; could not find server 'devKit_33' on host 'afff5849a211' because file not found: /opt/gemstone/locks/devKit_33..LCK;  service devKit_33 not found ; NetLDI service 'gs64ldi' not found on node 'localhost6' port 50377 :

Starting stone: devKit_33
--- 12/17/17 12:31:38.248 UTC ---
stopstone[Info]: GemStone version '3.3.0'
stopstone[Info]: Server 'devKit_33' is not running.
startstone[Info]: GemStone version '3.3.0'
startstone[Info]: Starting Stone repository monitor devKit_33.
startstone[Info]: GEMSTONE is: '/GsDevKit_home/server/stones/devKit_33/product'.
startstone[Info]: GEMSTONE_NRS_ALL is: '#dir:$GEMSTONE_LOGDIR#log:%N%P.log'.
startstone[Info]:  Ignoring  GEMSTONE_NRS_ALL for stone
startstone[Info]:
    GEMSTONE_SYS_CONF=/GsDevKit_home/server/stones/devKit_33/extents/system.conf
    GEMSTONE_EXE_CONF=/GsDevKit_home/server/stones/devKit_33/devKit_33.conf
startstone[Info]: Log file is '/GsDevKit_home/server/stones/devKit_33/logs/devKit_33.log'.
startstone[info]: GemStone server devKit_33 has been started, process 295

=================
   GsDevKit script: startNetldi devKit_33
              path: /GsDevKit_home/bin/startNetldi
=================
starting netldi devKit_33 

MessageNotUnderstood: receiver of "," is nil
UndefinedObject(Object)>>doesNotUnderstand: #,
UnixProcess>>processProxy:forkAndExec:arguments:environment:descriptors: in Block: [ :e | e , nullString ]
Array(SequenceableCollection)>>collect:
UnixProcess>>processProxy:forkAndExec:arguments:environment:descriptors:
ExternalUnixOSProcess>>forkChild
ExternalUnixOSProcess class>>forkAndExec:arguments:environment:descriptors:
UnixProcess>>forkAndExec:arguments:environment:descriptors:
UnixProcess>>forkJob:arguments:environment:descriptors:
GsDevKitStartnetldiCommandLineHandler class(GsDevKitAbstractCommandLineHandler class)>>runShellCommand:args:noError:
GsDevKitStartnetldiCommandLineHandler(GsDevKitAbstractCommandLineHandler)>>runShellCommand:args:
GsDevKitStartnetldiCommandLineHandler>>activate
GsDevKitStartnetldiCommandLineHandler class(CommandLineHandler class)>>activateWith:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand: in Block: [ aCommandLinehandler activateWith: commandLine ]
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block: [ self handleArgument: (self arguments ifEmpty: [ ...etc...
BlockClosure>>on:do:
PharoCommandLineHandler(BasicCommandLineHandler)>>activate
PharoCommandLineHandler>>activate
PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
PharoCommandLineHandler class>>activateWith: in Block: [ super activateWith: aCommandLine ]
WorldState>>runStepMethodsIn:
WorldMorph>>runStepMethods
WorldState>>doOneCycleNowFor:
WorldState>>doOneCycleFor:
WorldMorph>>doOneCycle
MorphicUIManager>>spawnNewProcess in Block: [ ...
BlockClosure>>newProcess in Block: [ ...
Error on or near line 99 :: devKitCommandLine startnetldi devKit_33 :: devKitCommandLine startnetldi devKit_33
Error on or near line 72 :: startNetldi devKit_33 :: startNetldi devKit_33
Error on or near line 139 :: startStone -b -N devKit_33 :: startStone -b -N devKit_33
Error on or near line 169 :: newExtent -s /GsDevKit_home/server/stones/devKit_33/product/bin/extent0.seaside.dbf devKit_33 :: newExtent -s /GsDevKit_home/server/stones/devKit_33/product/bin/extent0.seaside.dbf devKit_33
Error on or near line 209 :: createStone devKit_33 3.3.0 :: createStone devKit_33 3.3.0

I'm not sure what is causing this. If you have an idea of what I could look for, I may be able to help find the source of the problem.

victornoel commented 6 years ago

btw I know I'm running as root and I shouldn't, but I also tried as a normal user and it doesn't seem to be related.

Note that I set up the prerequisites by hand since I'm not on one of the supported distributions (I'm actually trying to build a docker image that can be accessed via X11 forwarding; I will contribute it to the project when it works).

victornoel commented 6 years ago

~ok, one reason that could explain this is that GsDevKit_home/server/stones/devKit_34/bin/ is empty and I suppose it should contain the binary startnetldi.~ edit: scratch that, I misunderstood how things were working.

victornoel commented 6 years ago

Surprisingly, while trying to debug it, I ran the installation in an environment with an X session (which wasn't the case before) and the error didn't appear.

I suppose there is some hidden dependency on having an X session available. Is there any way to avoid it? I'm asking because I'm trying to build a docker image... thanks!

dalehenrich commented 6 years ago

Well, devKitCommandLine is a headless pharo image, so perhaps pharo requires the X libraries to be installed whether or not it is running headless ... or it's possible that even in headless mode it hits an X server. It might be worth asking on the pharo list what prerequisites are needed to run a headless pharo image. I'll see if I can figure out where the MNU is coming from; perhaps all it takes is some bullet-proofing of the code.

dalehenrich commented 6 years ago

Okay, the fact that the error is occurring during Array>>collect: implies that the following chunk of code is being executed:

        ifFalse:
            [args := (OrderedCollection new: arrayOfStrings size + 2)
                        add: progName;
                        addAll: (arrayOfStrings collect: [:e | e, nullString ]);    "Null terminate each string"
                        yourself;
                        asArray].

and the arrayOfStrings should be the netldi arguments. The most likely culprit is TDSessionDescription>>netldiArgsOn: :

netldiArgsOn: netldiArgs
    netldiArgs
        add: '-g';
        add: '-a';
        add: self osUserId.
    self netLDIPort isEmpty
        ifFalse: [ 
            netldiArgs
                add: '-P';
                add: self netLDIPort ].
    self netLDIPortRange
        ifNotNil: [ :range | 
            | vrsnAr |
            vrsnAr := self gemstoneVersion findTokens: '.'.
            ((vrsnAr at: 1) asNumber < 3 or: [ (vrsnAr at: 1) asNumber = 3 and: [ (vrsnAr at: 2) asNumber < 2 ] ])
                ifTrue: [ 
                    "after GemStone 3.2, port range no longer accepted by netldi"
                    range isEmpty
                        ifFalse: [ 
                            netldiArgs
                                add: '-p';
                                add: range ] ]
                ifFalse: [ 
                    Transcript
                        cr;
                        show: 'port range: ' , range printString , ' no longer needed for netldi in GemStone versions 3.2 and later' ] ].
    netldiArgs add: self netLDI

and apart from the result of the osUserId message, all other arguments are guaranteed to be non-nil. osUserId could return nil if the USER env var is not set ... I think:

osUserId
    osUserId == nil
        ifTrue: [ ^ OSPlatform current environment getEnv: 'USER' ].
    ^ osUserId

BTW, the -D flag can be set in scripts for a call to devKitCommandLine. The flag causes the pharo vm to come up headful, and if an error occurs during processing a debugger will come up. If there are no errors, you can manually exit the image for processing to continue. If you run devKitCommandLine -D on the bash command line, you get an interactive session of the image in which to read code (which is what I did here).

victornoel commented 6 years ago

> BTW, the -D flag can be set in scripts for a call to devKitCommandLine

That's actually how I realised that it was working when using an X environment: I wanted to call it with -D and debug it.

When I get some time, I will try to see if the problem is from the $USER variable or if OSPlatform is requiring X to be present. Thanks!
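A quick way to test the $USER hypothesis is to check whether the variable is defined in the exact context that launches createStone (the container, the monitoring tool, and so on). The following is only a sketch; the function name check_user is made up for illustration and is not part of GsDevKit_home.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not part of GsDevKit_home):
# report whether USER is defined and non-empty in the current environment.
check_user() {
    if [ -n "${USER:-}" ]; then
        echo "USER is set to: $USER"
    else
        echo "USER is not set"
        return 1
    fi
}

# Print a hint when the variable is missing.
check_user || echo "export USER (for example USER=\$(id -un)) before running createStone"
```

If this prints "USER is not set" in the failing environment and a user name in the X session where the install works, the difference is in the environment variables and not in X itself.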

dassi commented 4 years ago

Just ran into the same issue, when starting netldi (via startNetldi command) from monit (Linux monitoring tool, see: https://mmonit.com/monit/documentation/monit.html). monit does not set the USER environment variable, nor does "bash --login". I tracked it down for hours...

Workaround for monit and other low-level command executing tools

Set the USER environment variable manually in the start command (e.g. USER=ubuntu /home/ubuntu/script_that_starts_netldi.sh)
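The same workaround can be put into the wrapper script itself, so the user name does not have to be hard-coded in the monit configuration. This is a minimal sketch under a few assumptions: the helper name ensure_user is hypothetical, id is available (it is part of POSIX), and the startNetldi call with the stone name devKit_33 is only an example and is left commented out.

```shell
#!/bin/sh
# Hypothetical helper (not part of GsDevKit_home):
# define USER if the calling tool did not provide it.
ensure_user() {
    if [ -z "${USER:-}" ]; then
        # id -un prints the effective user name; fall back to the
        # numeric uid if the account has no name entry.
        USER="$(id -un 2>/dev/null || id -u)"
        export USER
    fi
}

ensure_user
# startNetldi devKit_33   # example call; enable it in a real wrapper script
```

An existing USER value is left untouched, so the wrapper behaves the same as before when it is run from a normal login shell.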

Suggestions for GsDevKit

Don't rely on USER being set in the environment in the osUserId method.

dalehenrich commented 4 years ago

@dassi ... that makes a lot of sense ... thanks for the detective work ...