devgianlu / go-librespot

Yet another open source Spotify client, written in Go.
GNU General Public License v3.0
136 stars 19 forks source link

Crash when network is disconnected #134

Open aykevl opened 6 days ago

aykevl commented 6 days ago

I got this crash, I think when the network connection got disconnected:

ERRO[755131] did not receive last pong from dealer, 60s passed 
ERRO[755136] failed receiving dealer message               error="failed to get reader: WebSocket closed: sent close frame: status = StatusServiceRestart and reason = \"\""
ERRO[755161] did not receive last pong from dealer, 90s passed 
ERRO[755161] did not receive last pong ack from accesspoint, 173s passed 
ERRO[755161] failed receiving packet                       error="failed reading packet header: read tcp 10.65.42.112:51278->34.158.1.133:443: use of closed network connection"
WARN[755181] failed to connect to AP ap-gew4.spotify.com:80, retrying with a different AP  error="dial tcp: lookup ap-gew4.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:60895->10.65.42.1:53: i/o timeout"
ERRO[755191] did not receive last pong from dealer, 120s passed 
WARN[755201] failed to connect to AP ap-guc3.spotify.com:4070, retrying with a different AP  error="dial tcp: lookup ap-guc3.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:54788->10.65.42.1:53: i/o timeout"
ERRO[755221] did not receive last pong from dealer, 150s passed 
WARN[755221] failed to connect to AP ap-gew1.spotify.com:443, retrying with a different AP  error="dial tcp: lookup ap-gew1.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:56707->10.65.42.1:53: i/o timeout"
WARN[755241] failed to connect to AP ap-gae2.spotify.com:80, retrying with a different AP  error="dial tcp: lookup ap-gae2.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:46551->10.65.42.1:53: i/o timeout"
ERRO[755251] did not receive last pong from dealer, 180s passed 
WARN[755256] failed fetching new endpoint for accesspoint  error="failed fetching apresolve URL: Get \"https://apresolve.spotify.com/?type=accesspoint\": dial tcp: lookup apresolve.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:46939->10.65.42.1:53: i/o timeout"
WARN[755271] failed to connect to AP ap-gew4.spotify.com:4070, retrying with a different AP  error="dial tcp: lookup ap-gew4.spotify.com: i/o timeout"
ERRO[755281] did not receive last pong ack from accesspoint, 293s passed 
ERRO[755281] did not receive last pong from dealer, 210s passed 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xc87ed8]

goroutine 15 [running]:
github.com/devgianlu/go-librespot/ap.(*Accesspoint).pongAckTicker(0x4000471880)
        /home/ayke/src/spotify/go-librespot/ap/ap.go:347 +0x1e8
created by github.com/devgianlu/go-librespot/ap.(*Accesspoint).startReceiving.func1 in goroutine 1
        /home/ayke/src/spotify/go-librespot/ap/ap.go:250 +0x10c

And this crash, probably the same bug, when the network definitely got disconnected (I think there was some maintenance on the router).

ERRO[937664] failed receiving packet                       error="failed reading packet header: read tcp 10.65.42.112:34414->104.199.65.9:443: read: connection reset by peer"                                                                                                            
INFO[937665] authenticated AP as 0bzabjufeuoh2wmg0n912ftgw                                                                                                                                                                                                                                
ERRO[938626] failed receiving packet                       error="failed reading packet header: read tcp 10.65.42.112:43240->104.199.241.202:80: read: connection reset by peer"                                                                                                          
INFO[938626] authenticated AP as 0bzabjufeuoh2wmg0n912ftgw                                                                                                                                                                                                                                
ERRO[952286] failed receiving packet                       error="failed reading packet header: EOF"                                                                                                                                                                                      
INFO[952287] authenticated AP as 0bzabjufeuoh2wmg0n912ftgw                                                                                                                                                                                                                                
ERRO[958771] did not receive last pong from dealer, 60s passed                                                                                                                                                                                                                            
ERRO[958776] failed receiving dealer message               error="failed to get reader: WebSocket closed: sent close frame: status = StatusServiceRestart and reason = \"\""                                                                                                              
ERRO[958801] did not receive last pong ack from accesspoint, 154s passed                                                                                                                                                                                                                  
ERRO[958801] did not receive last pong from dealer, 90s passed                                                                                                                                                                                                                            
ERRO[958801] failed receiving packet                       error="failed reading packet header: read tcp 10.65.42.112:39610->34.158.1.133:443: use of closed network connection"                                                                                                          
WARN[958821] failed to connect to AP ap-gew4.spotify.com:80, retrying with a different AP  error="dial tcp: lookup ap-gew4.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:53441->10.65.42.1:53: i/o timeout"                                                                         
ERRO[958831] did not receive last pong from dealer, 120s passed                                                                                                                                                                                                                           
WARN[958841] failed to connect to AP ap-guc3.spotify.com:4070, retrying with a different AP  error="dial tcp: lookup ap-guc3.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:50588->10.65.42.1:53: i/o timeout"                                                                       
ERRO[958861] did not receive last pong from dealer, 150s passed                                                                                                                                                                                                                           
WARN[958861] failed to connect to AP ap-gue1.spotify.com:443, retrying with a different AP  error="dial tcp: lookup ap-gue1.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:47574->10.65.42.1:53: i/o timeout"                                                                        
WARN[958881] failed to connect to AP ap-gae2.spotify.com:80, retrying with a different AP  error="dial tcp: lookup ap-gae2.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:40195->10.65.42.1:53: i/o timeout"                                                                         
ERRO[958891] did not receive last pong from dealer, 180s passed                                                                                                                                                                                                                           
WARN[958901] failed fetching new endpoint for accesspoint  error="failed fetching apresolve URL: Get \"https://apresolve.spotify.com/?type=accesspoint\": dial tcp: lookup apresolve.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:40473->10.65.42.1:53: i/o timeout"               
WARN[958911] failed to connect to AP ap-gew4.spotify.com:4070, retrying with a different AP  error="dial tcp: lookup ap-gew4.spotify.com: i/o timeout"                                                                                                                                    
ERRO[958921] did not receive last pong ack from accesspoint, 274s passed                                                                                                                                                                                                                  
ERRO[958921] did not receive last pong from dealer, 210s passed                                                                                                                                                                                                                           
panic: runtime error: invalid memory address or nil pointer dereference                                                                                                                                                                                                                   
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xc87ed8]

goroutine 78 [running]:
github.com/devgianlu/go-librespot/ap.(*Accesspoint).pongAckTicker(0x4000454000)
        /home/ayke/src/spotify/go-librespot/ap/ap.go:347 +0x1e8
created by github.com/devgianlu/go-librespot/ap.(*Accesspoint).startReceiving.func1 in goroutine 1
        /home/ayke/src/spotify/go-librespot/ap/ap.go:250 +0x10c

I suspect the issue is here:

https://github.com/devgianlu/go-librespot/blob/f2de517abcf49bb58eb0ccfed6319bd58df94d81/ap/ap.go#L82

This is ap.init, which is called from ap.connect, which is called from ap.reconnect. Looks like it sets ap.conn to nil when it can't connect resulting in the nil pointer dereference.

(Also, I think ap.connMu should have been used inside pongAckTimer?)


I should also say it's quite impressive that the daemon had been running crash-free for over 10 days!

devgianlu commented 4 days ago

Also, I think ap.connMu should have been used inside pongAckTimer?

I have mixed feelings about this: yes it should be held for reading, but at the same time if we end up waiting because it is held for writing and then call close on it we are probably closing a connection that has just been created. Citing my own comment:

https://github.com/devgianlu/go-librespot/blob/095020537ece27f8967a14819edfa4ed10350145/ap/ap.go#L53-L56


Main problem should be fixed with 095020537ece27f8967a14819edfa4ed10350145, same issue as 0b32df773c9cbe12a2bc44e92580d0c657a170a2.

aykevl commented 3 days ago

I have mixed feelings about this: yes it should be held for reading, but at the same time if we end up waiting because it is held for writing and then call close on it we are probably closing a connection that has just been created.

It's still a data race, which is basically undefined behavior (though it might work well enough in practice since it's a pointer read, which will normally not be torn).

Maybe I'll take a stab at fixing this at some point.