GsDevKit / gsApplicationTools

This repository includes scripts and code that allow a more convenient setup of a Gemstone server application
MIT License
2 stars 2 forks source link

Inifinite loop in ZnNewGemServer >> basicServerOn: #29

Closed dalehenrich closed 9 years ago

dalehenrich commented 9 years ago

@JupiterJones reported the following in his log file ... scads and scads of this:

----------- FlowAppServer MessageNotUnderstood exception encountered: 2015-02-07T19:42:20.1227540969848-08:00
a MessageNotUnderstood occurred (error 2010), a UndefinedObject does not understand  #'isValid'
1 ZnNewGemServer (GemServer) >> writeGemLogEntryFor:titled: @3 line 3  [GsNMethod 189036801]
2 ZnNewGemServer (GemServer) >> logStack:titled:inTransactionDo: @2 line 2  [GsNMethod 189021953]
3 ZnNewGemServer (GemServer) >> logStack:titled: @2 line 2  [GsNMethod 189030401]
4 ZnNewGemServer (GemServer) >> gemServerHandleErrorException: @9 line 6  [GsNMethod 189024257]
5 MessageNotUnderstood (Error) >> exceptionHandlingForGemServer: @2 line 2  [GsNMethod 184258561]
6 ZnNewGemServer (GemServer) >> handleGemServerException: @2 line 5  [GsNMethod 189028353]
7 [] in  ExecBlock0 (GemServer) >> gemServer:exceptionSet:beforeUnwind:ensure: @3 line 9  [GsNMethod 210501377]
8 ExecBlock0 (ExecBlock) >> on:do: @3 line 42  [GsNMethod 3187969]
9 [] in  ExecBlock1 (GemServer) >> gemServer:exceptionSet:beforeUnwind:ensure: @2 line 11  [GsNMethod 208762369]
10 MessageNotUnderstood (AbstractException) >> _executeHandler: @3 line 8  [GsNMethod 3998209]
11 MessageNotUnderstood (AbstractException) >> _signalWith: @1 line 1  [GsNMethod 4011777]
12 MessageNotUnderstood (AbstractException) >> signal @2 line 47  [GsNMethod 4015873]
13 UndefinedObject (Object) >> doesNotUnderstand: @9 line 10  [GsNMethod 4084225]
14 UndefinedObject (Object) >> _doesNotUnderstand:args:envId:reason: @7 line 12  [GsNMethod 4078849]
15 [] in  ExecBlock0 (ZnNewGemServer) >> basicServerOn: @3 line 12  [GsNMethod 267351809]
16 ExecBlock0 (ExecBlock) >> on:do: @3 line 42  [GsNMethod 3187969]
17 [] in  ExecBlock0 (GemServer) >> gemServer:exceptionSet:beforeUnwind:ensure: @2 line 4  [GsNMethod 202392577]
18 ExecBlock0 (ExecBlock) >> ensure: @2 line 12  [GsNMethod 4227585]
19 ZnNewGemServer (GemServer) >> gemServer:exceptionSet:beforeUnwind:ensure: @2 line 22  [GsNMethod 189047041]
20 ZnNewGemServer (GemServer) >> gemServer:beforeUnwind:ensure: @3 line 3  [GsNMethod 189046273]
21 ZnNewGemServer (GemServer) >> gemServer: @2 line 3  [GsNMethod 189046017]
22 [] in  ExecBlock0 (ZnNewGemServer) >> basicServerOn: @2 line 11  [GsNMethod 258032641]
23 [] in  ExecBlock0 (ExecBlock) >> ifCurtailed: @2 line 6  [GsNMethod 56400897]
24 ExecBlock0 (ExecBlock) >> ensure: @2 line 12  [GsNMethod 4227585]
25 ExecBlock0 (ExecBlock) >> ifCurtailed: @3 line 8  [GsNMethod 52826369]
26 ZnNewGemServer >> basicServerOn: @6 line 15  [GsNMethod 241580801]
27 [] in  ExecBlock (ZnNewGemServer) >> startBasicServerOn: @2 line 13  [GsNMethod 258031617]
28 GsProcess >> _start @7 line 16  [GsNMethod 4420353]
29 <Reenter marker>
-----------
----------- FlowAppServer MessageNotUnderstood exception encountered: 2015-02-07T19:42:20.1294119358062-08:00
a MessageNotUnderstood occurred (error 2010), a UndefinedObject does not understand  #'isValid'
1 ZnNewGemServer (GemServer) >> writeGemLogEntryFor:titled: @3 line 3  [GsNMethod 189036801]
2 ZnNewGemServer (GemServer) >> logStack:titled:inTransactionDo: @2 line 2  [GsNMethod 189021953]
3 ZnNewGemServer (GemServer) >> logStack:titled: @2 line 2  [GsNMethod 189030401]
4 ZnNewGemServer (GemServer) >> gemServerHandleErrorException: @9 line 6  [GsNMethod 189024257]
5 MessageNotUnderstood (Error) >> exceptionHandlingForGemServer: @2 line 2  [GsNMethod 184258561]
6 ZnNewGemServer (GemServer) >> handleGemServerException: @2 line 5  [GsNMethod 189028353]
7 [] in  ExecBlock0 (GemServer) >> gemServer:exceptionSet:beforeUnwind:ensure: @3 line 9  [GsNMethod 210501377]
8 ExecBlock0 (ExecBlock) >> on:do: @3 line 42  [GsNMethod 3187969]
9 [] in  ExecBlock1 (GemServer) >> gemServer:exceptionSet:beforeUnwind:ensure: @2 line 11  [GsNMethod 208762369]
10 MessageNotUnderstood (AbstractException) >> _executeHandler: @3 line 8  [GsNMethod 3998209]
11 MessageNotUnderstood (AbstractException) >> _signalWith: @1 line 1  [GsNMethod 4011777]
12 MessageNotUnderstood (AbstractException) >> signal @2 line 47  [GsNMethod 4015873]
13 UndefinedObject (Object) >> doesNotUnderstand: @9 line 10  [GsNMethod 4084225]
14 UndefinedObject (Object) >> _doesNotUnderstand:args:envId:reason: @7 line 12  [GsNMethod 4078849]
15 [] in  ExecBlock0 (ZnNewGemServer) >> basicServerOn: @3 line 12  [GsNMethod 267351809]
16 ExecBlock0 (ExecBlock) >> on:do: @3 line 42  [GsNMethod 3187969]
17 [] in  ExecBlock0 (GemServer) >> gemServer:exceptionSet:beforeUnwind:ensure: @2 line 4  [GsNMethod 202392577]
18 ExecBlock0 (ExecBlock) >> ensure: @2 line 12  [GsNMethod 4227585]
19 ZnNewGemServer (GemServer) >> gemServer:exceptionSet:beforeUnwind:ensure: @2 line 22  [GsNMethod 189047041]
20 ZnNewGemServer (GemServer) >> gemServer:beforeUnwind:ensure: @3 line 3  [GsNMethod 189046273]
21 ZnNewGemServer (GemServer) >> gemServer: @2 line 3  [GsNMethod 189046017]
22 [] in  ExecBlock0 (ZnNewGemServer) >> basicServerOn: @2 line 11  [GsNMethod 258032641]
23 [] in  ExecBlock0 (ExecBlock) >> ifCurtailed: @2 line 6  [GsNMethod 56400897]
24 ExecBlock0 (ExecBlock) >> ensure: @2 line 12  [GsNMethod 4227585]
25 ExecBlock0 (ExecBlock) >> ifCurtailed: @3 line 8  [GsNMethod 52826369]
26 ZnNewGemServer >> basicServerOn: @6 line 15  [GsNMethod 241580801]
27 [] in  ExecBlock (ZnNewGemServer) >> startBasicServerOn: @2 line 13  [GsNMethod 258031617]
28 GsProcess >> _start @7 line 16  [GsNMethod 4420353]
29 <Reenter marker>
-----------

The problem appears to be in ZnNewGemServer>>basicServerOn:. The #gemServer: call in ZnNewGemServer>>basicServerOn: records the error to the log and returns, causing the system to spin the in the #repeat loop. Instead I think that the code should pass the exception after logging so that the gem will die instead of looping:

basicServerOn: port
  "forked by caller"

  | znServer |
  znServer := self serverInstance.
  znServer port: port.
  znServer initializeServerSocket.
  [
  [
  self
    gemServer: [
      znServer serverSocket isValid
        ifTrue: [ znServer serveConnectionsOn: znServer serverSocket ] ]
    beforeUnwind: [ :ex | ex pass ] ]
    repeat ]
    ifCurtailed: [ znServer releaseServerSocket ] 
dalehenrich commented 9 years ago

This patch is not quite correct ... apparently there are additional errors ... and on further review ... ex pass is not the right thing ... without a beforeUnwind: the exception is being passed, No stacks available yet for the additional errors...

dalehenrich commented 9 years ago

After reading the code some more I take back my original assertion that the ex pass doesn't do anything ... the ex pass should indeed cause the gem to exit ... I am however, not convinced that we get the whole story this way ... and I think that at the end of the day, a different fix is called for ... need to think a bit ...

dalehenrich commented 9 years ago

No ... now I do believe that the ex pass is the right solution ... the ex pass will be done after the original error is logged and the error will cause the gem to exit ... which is supposed to happen ... looking at the error message will tell you what you need to do to fix the problem ...

In fact what I expect to see as the additional error is actually the original error that caused the connection to fail in the first place .... so the root cause is the error that gets masked by the missing ex pass ... I think ...

So at this point, we need to apply the patch and then pay attention to the next error to figure out what is going on ...

dalehenrich commented 9 years ago

Here's a snippet of the error encountered after applying the suggested fix:

a MessageNotUnderstood occurred (error 2010), a UndefinedObject does not understand  #'add:'
1 GemServer >> writeGemLogEntryFor:titled: (envId 0) @3 line 3  [GsNMethod 176062209]
2 GemServer >> logStack:titled:inTransactionDo: (envId 0) @2 line 2  [GsNMethod 176066561]
3 GemServer >> logStack:titled: (envId 0) @2 line 2 [GsNMethod 176042241]
4 GemServer >> gemServerHandleErrorException: (envId 0) @9 line 6  [GsNMethod 176062465]
5 Error >> exceptionHandlingForGemServer: (envId 0) @2 line 2 [GsNMethod 176884993]
6 GemServer >> handleGemServerException: (envId 0) @2 line 5 [GsNMethod 176050945]
7 [] in  GemServer >> gemServer:exceptionSet:beforeUnwind:ensure: (envId 0) @3 line 9 [GsNMethod 176569601]
8 ExecBlock >> on:do: (envId 0) @3 line 42  [GsNMethod 3730689]
9 [] in  GemServer >> gemServer:exceptionSet:beforeUnwind:ensure: (envId 0) @2 line 11 [GsNMethod 176681985]
10 AbstractException >> _executeHandler: (envId 0) @3 line 8 [GsNMethod 4617217]
11 AbstractException >> _signalWith: (envId 0) @1 line 1 [GsNMethod 4594689]
12 AbstractException >> signal (envId 0) @2 line 47 [GsNMethod 4587009]
13 Object >> doesNotUnderstand: (envId 0) @9 line 10 [GsNMethod 3595265]
14 Object >> _doesNotUnderstand:args:envId:reason: (envId 0) @7 line 12  [GsNMethod 3584513]
15 [] in  ZnGemServerManagingMultiThreadedServer >> socketStreamOn: (envId 0) @3 line 5  [GsNMethod 185304321] 

and looking at ZnGemServerManagingMultiThreadedServer>>connections:

connections
  connections
    ifNil: [ connections := TransientStackValue value: OrderedCollection new ].
  ^ connections value

I see that the TransientStackValue class is intended to be used ONLY for objects that are alive on the stack (temps and args) ... TransientValue is the class to use for objects that are not guaranteed to be referenced from the stack for their entire lifetime ...

dalehenrich commented 9 years ago

Let's follow the logic of using TransientStackValue in the first place:

  1. the connections OrderedCollection is stashed in a TransientStackValue so that the collection of sockets will not be persisted in the case that a continuation causes the ZnGemServerManagingMultiThreadedServer instance to be persisted ...
  2. a TransientValue adds a reference to the object from session temps, but session temps is not weak ... so TransientStackValues provide weak references for objects referenced from the stack ... as soon as an object is no longer referenced from the stack (persistent or not) the objects referenced by TransientValues are open to gc
  3. objects referenced by the stack are not eligible for stubbing (a stubbed persistent object can be dropped from memory triggering collection of TransientStackValue references). _setNoStubbing is used for objects directly referenced from stack that may themselves be using TransientStackValues like instances of ZnGemServerManagingMultiThreadedServer ... and _setNoStubbing may be needed at the next level if TransientStackValues are being used further down the chain ...
dalehenrich commented 9 years ago

green ... it looks like the real error, beyond the potential error in basicTransactionReentry method are covered by the Zinc bug: https://github.com/GsDevKit/zinc/issues/75