devgianlu / go-librespot

Yet another open source Spotify client, written in Go.
GNU General Public License v3.0
70 stars 14 forks source link

Shannon connection is not re-established on failure #88

Open aykevl opened 1 week ago

aykevl commented 1 week ago

Sometimes the Shannon connection (whatever that is) breaks, and go-librespot becomes unusable:

ERRO[0961] did not receive last pong ack from accesspoint, 120s passed                                                                       
ERRO[0961] failed receiving packet                       error="failed reading packet header: read tcp 10.65.42.112:51674->34.158.1.133:4070: use of closed network connection"                                                                                                           
ERRO[2461] did not receive last pong from dealer, 60s passed                                                                                                                                                                                                                              
ERRO[2491] did not receive last pong from dealer, 90s passed                                                                                                                                                                                                                              
ERRO[2521] did not receive last pong from dealer, 120s passed                                                                                
ERRO[2551] did not receive last pong from dealer, 150s passed                                                                                
ERRO[2581] did not receive last pong from dealer, 180s passed                                                                                
ERRO[2611] did not receive last pong from dealer, 210s passed                                                                                
ERRO[2641] did not receive last pong from dealer, 240s passed                                                                                
ERRO[2671] did not receive last pong from dealer, 270s passed                                                                                
ERRO[2701] did not receive last pong from dealer, 300s passed                                                                                
[...etc]

Here is another error:

ERRO[12016] failed receiving packet                       error="failed reading packet header: read tcp 10.65.42.112:55682->34.158.1.133:4070: read: connection reset by peer"

(this time without the "did not receive last pong from dealer" messages, though it did start to spew those messages when I tried, and failed, to connect to the go-librespot instance).

I inserted some debugging code, and the stack trace where this error is generated in both cases looks like this:

goroutine 87 [running]:
runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:26 +0x64
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:18 +0x1c
github.com/devgianlu/go-librespot/ap.(*shannonConn).receivePacket(0x400040c0c0)
        /home/ayke/src/go-librespot/ap/shannon.go:83 +0x1d8
github.com/devgianlu/go-librespot/ap.(*Accesspoint).recvLoop(0x40004e8410)
        /home/ayke/src/go-librespot/ap/ap.go:251 +0x50
created by github.com/devgianlu/go-librespot/ap.(*Accesspoint).startReceiving.func1 in goroutine 1
        /home/ayke/src/go-librespot/ap/ap.go:235 +0x8c

Looking at the code (AccessPoint.recvLoop) it looks like it is intended to reestablish the connection when an error occurs, but clearly it doesn't for me.

devgianlu commented 1 week ago

I have observed this myself too, it was working fine before, I think it's a regression introduced by https://github.com/devgianlu/go-librespot/pull/69. I did not debug this yet because I am not able to reproduce it consistenly, just happens randomly on my device.

aykevl commented 1 week ago

Can you describe how this is supposed to work? Then I might be able to develop a fix (I normally have a Spotifyd/librespot instance running 24/7 so I hit this frequently).

devgianlu commented 1 week ago

There are two components called accesspoint (AP) and dealer that are persistent connections (TCP and Websocket). They both rely on a ping/pong (120 seconds for AP, 30 for dealer) mechanism to keep the server/client connected.

The idea behind that code is that if no ping/pong is received then the connection is closed causing the recv loop to fail and start reconnecting.

I think this is somewhat related to #48.