dedis / prifi

PriFi, a low-latency, local-area anonymous communication network.
https://prifi.net
GNU Affero General Public License v3.0
48 stars 11 forks source link

PriFi non-restart bug located #192

Open lbarman opened 6 years ago

lbarman commented 6 years ago

After a client goes into Airplane mode, everything is stuck. Reason: Relay/trustees believe the protocol is still running

E : (relay.(*PriFiLibRelayInstance).checkIfRoundHasEndedAfterTimeOut_Phase1:  41) - MAX_NUMBER_OF_CONSECUTIVE_FAILED_ROUNDS ( 3 ) reached, killing protocol.
3 : (relay.(*PriFiLibRelayInstance).checkIfRoundHasEndedAfterTimeOut_Phase1:  44) - Stopping experiment, if any.
E : (                         services.(*ServiceState).NetworkErrorHappened: 109) - NetworkErrorHappened, dumping trace
goroutine 677 [running]:
runtime/debug.Stack(0x1, 0xc42020cc50, 0x5422c8)
    /usr/local/go/src/runtime/debug/stack.go:24 +0xa7
runtime/debug.PrintStack()
    /usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/dedis/prifi/sda/services.(*ServiceState).NetworkErrorHappened(0xc4200a5540, 0x0)
    /home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:110 +0x74
github.com/dedis/prifi/sda/services.(*ServiceState).handleTimeout(0xc4200a5540, 0xc4202c98a0, 0x1, 0x1, 0xc3c628, 0x0, 0x0)
    /home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:101 +0x34
github.com/dedis/prifi/sda/services.(*ServiceState).(github.com/dedis/prifi/sda/services.handleTimeout)-fm(0xc4202c98a0, 0x1, 0x1, 0xc3c628, 0x0, 0x0)
    /home/prifi/go/src/github.com/dedis/prifi/sda/services/config.go:90 +0x66
github.com/dedis/prifi/sda/protocols.(*PriFiSDAProtocol).handleTimeout(0xc420276000, 0xc4204a60c0, 0x1, 0x1, 0xc3c628, 0x0, 0x0)
    /home/prifi/go/src/github.com/dedis/prifi/sda/protocols/protocol.go:158 +0x250
github.com/dedis/prifi/sda/protocols.(*PriFiSDAProtocol).(github.com/dedis/prifi/sda/protocols.handleTimeout)-fm(0xc4204a60c0, 0x1, 0x1, 0xc3c628, 0x0, 0x0)
    /home/prifi/go/src/github.com/dedis/prifi/sda/protocols/prifi.go:118 +0x66
github.com/dedis/prifi/prifi-lib/relay.(*PriFiLibRelayInstance).checkIfRoundHasEndedAfterTimeOut_Phase1(0xc42017c880, 0xc400000222)
    /home/prifi/go/src/github.com/dedis/prifi/prifi-lib/relay/timeouts.go:46 +0x589
created by github.com/dedis/prifi/prifi-lib/relay.(*PriFiLibRelayInstance).downstreamPhase1_openRoundAndSendData
    /home/prifi/go/src/github.com/dedis/prifi/prifi-lib/relay/relay.go:602 +0x760
E : (                         services.(*ServiceState).NetworkErrorHappened: 121) - A network error occurred with node <nil> , warning other clients.
1 : (                 services.(*ServiceState).StopPriFiCommunicateProtocol: 175) - Stopping PriFi protocol
I : (                            services.(*ServiceState).connectToTrustees: 199) - connectToTrustees still alive, Protocol running true
I : (                            services.(*ServiceState).connectToTrustees: 199) - connectToTrustees still alive, Protocol running true
lbarman commented 6 years ago

Update: it's a concurrency issue. Thread 1 is still processing a message, timeout happens on Thread 2, which tries to call "Stop()" on thread 1

1 @ 0x545ee8 0x545cf0 0x542834 0x819c3a 0x81cd5a 0x817f0d 0x81976f 0x819634 0x81caf6 0x8026a0 0x805e26 0x7f9c59 0x459871
#   0x545ee7    runtime/pprof.writeRuntimeProfile+0x97                                          /usr/local/go/src/runtime/pprof/pprof.go:679
#   0x545cef    runtime/pprof.writeGoroutine+0x9f                                           /usr/local/go/src/runtime/pprof/pprof.go:641
#   0x542833    runtime/pprof.(*Profile).WriteTo+0x3e3                                          /usr/local/go/src/runtime/pprof/pprof.go:310
#   0x819c39    github.com/dedis/prifi/sda/services.(*ServiceState).StopPriFiCommunicateProtocol+0xb9                   /home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:181
#   0x81cd59    github.com/dedis/prifi/sda/services.(*ServiceState).StopPriFiCommunicateProtocol-fm+0x29                /home/prifi/go/src/github.com/dedis/prifi/sda/services/service.go:149
#   0x817f0c    github.com/dedis/prifi/sda/services.(*churnHandler).handleUnknownDisconnection+0xbc                 /home/prifi/go/src/github.com/dedis/prifi/sda/services/churn.go:257
#   0x81976e    github.com/dedis/prifi/sda/services.(*ServiceState).NetworkErrorHappened+0x11e                      /home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:127
#   0x819633    github.com/dedis/prifi/sda/services.(*ServiceState).handleTimeout+0x33                          /home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:106
#   0x81caf5    github.com/dedis/prifi/sda/services.(*ServiceState).(github.com/dedis/prifi/sda/services.handleTimeout)-fm+0x65     /home/prifi/go/src/github.com/dedis/prifi/sda/services/config.go:90
#   0x80269f    github.com/dedis/prifi/sda/protocols.(*PriFiSDAProtocol).handleTimeout+0x24f                        /home/prifi/go/src/github.com/dedis/prifi/sda/protocols/protocol.go:158
#   0x805e25    github.com/dedis/prifi/sda/protocols.(*PriFiSDAProtocol).(github.com/dedis/prifi/sda/protocols.handleTimeout)-fm+0x65   /home/prifi/go/src/github.com/dedis/prifi/sda/protocols/prifi.go:118
#   0x7f9c58    github.com/dedis/prifi/prifi-lib/relay.(*PriFiLibRelayInstance).checkIfRoundHasEndedAfterTimeOut_Phase1+0x588       /home/prifi/go/src/github.com/dedis/prifi/prifi-lib/relay/timeouts.go:46

I : (                       services.(*ServiceState).IsPriFiProtocolRunning:  43) - IsPriFiProtocolRunning not nil, has stopped false
1 : (                 services.(*ServiceState).StopPriFiCommunicateProtocol: 187) - PriFi Running, continuing stop attempt...
1 : (                 services.(*ServiceState).StopPriFiCommunicateProtocol: 191) - PriFi not nil, calling stop...

but gets put on hold in

p.relayState.processingLock.Lock()
defer p.relayState.processingLock.Unlock()

We need to find another way to stop the protocol without having it handle a message

lbarman commented 6 years ago

Please test 822243822d9f66c0bfb23e815ffd1b4a992951e3 and let me know what is the status ! @italod

lbarman commented 6 years ago

Bug fixed. I had to rewrite the history (my mistake - this should not happen). If git pull fails and tells you to merge, don't do it. Instead, do:

git fetch --all git reset --hard origin/prifi-android

(careful: this does not touches your untracked file, but resets all modified files)

junchen108 commented 6 years ago

I am working on the client side. The client doesn't reconnect after recovering the connectivity.

junchen108 commented 6 years ago

Please test f03dde8, it should fix the bug on the client side.