devgianlu / go-librespot

Yet another open source Spotify client, written in Go.
GNU General Public License v3.0
52 stars, 7 forks

Player not resuming correctly after network error #38

Closed: tylkie closed this issue 1 month ago

tylkie commented 2 months ago

I have attached a log from last night in which the player did not recover correctly from a network error. I split the log into three parts with blank lines for readability; no lines were removed. I interpret those blocks as follows:

1.) Because of a network error (a 502 Bad Gateway in this case), fetching data or creating streams fails, as expected. The player stops playback without reauthenticating and just sits idle.

2.) Two hours later the player realizes the connection has been lost and reauthenticates, but it does not resume playback.

3.) The official Spotify client shows the player as still playing. This makes sense, because the player's "stop" command never reached the Spotify API. Pressing "Pause" in the official client then has no effect; only skipping to the next track resolves all the state desyncs.

This should be easy to reproduce by simulating a network error. I had already observed this behavior before the prefetching commits.

spotifyd.connection.handler.log

Update

I got the same behavior last night and looked at what happened on the network. Points 1 and 2 are independent events. In the first block, the Spotify API itself is rejecting the request. Too many requests, maybe? I cannot tell without the complete response headers. This happens randomly every 12 to 48 hours, there is no way to reproduce it on demand, and I cannot understand why I get the 503s with librespot but not with the official client. The only plausible explanation is that the official client caches data and therefore produces less traffic and fewer requests. Physically, the connection is still active.

In the second block, my provider physically drops the connection. The player then handles everything as it should, with one exception: there is no fallback sync with the API after reconnecting. Such a sync would ensure that playback resumes automatically once the connection has been reestablished and requests are no longer being blocked.

I understand this problem is very specific to my use case, where I rely on self-managed headless playback without any user interaction. Most users either won't run into it or won't mind resyncing manually. Before investigating further, I'd suggest waiting to see whether anyone else reports the same symptoms.

tylkie commented 2 months ago

Here is another example, this time without a 503; see the attached log. The player fails to request data and stops. The core problem is the same: the player does not recover from a failed request. Again, only manually selecting a new context or skipping to the next track helps.

Unfortunately, API errors are difficult to reproduce. In my humble opinion, preventing every possible API error is infeasible, but recovering from them might be possible.

librespot-go-414.log

devgianlu commented 2 months ago

I have added a simple retry to the downloading of audio chunks to work around the first error (failed initializing chunked reader: invalid first chunk response status: 502 Bad Gateway).

The latter issue is quite strange: the URL used to fetch the next page of tracks is extremely long and the server rejects it, so retrying wouldn't have any effect. Perhaps it is an upstream issue, since I don't modify the URL before using it.

In general, I agree there is still room for improvement in handling API failures (or really, failures everywhere).

tylkie commented 2 months ago

Thank you! From my own reverse-engineering experience, I understand that full compliance can never be guaranteed at every point in time. For now, I'll just avoid playlists with more than 200 songs. Now that this is on the table, I think I remember having this problem with the official client at some point, too, with the client complaining that it could not play a certain song. The only thing that helped back then was selecting a new context.

I'll keep testing and posting :-)

tylkie commented 1 month ago

Unfortunately, I failed to copy the binary from the latest commit to my release directory, so I tested for four days without the workaround you provided. I had no further real API errors during that time. However, I found out that your workaround also fixed another issue with repeating contexts. To be absolutely sure it works for API errors, I routed dealer requests to a local testing endpoint for a few seconds, which always replied with a 500 Internal Server Error. The player handles it well, and the workaround does what needs to be done. I am marking this as closed.

tylkie commented 1 month ago

> The latter issue is quite strange as the URL to fetch the next page of tracks is utterly long and the server complains about it so retrying wouldn't have any effect. Perhaps that could be an upstream issue as I don't modify it before using.

For completeness: I bypassed the over-200-songs issue with one small change to the GoNext function in tracks/tracks.go, in case anyone has the same issue and needs to keep the player going at all costs. The error message is still logged, so I can tell once upstream fixes this. For now, the player restarts at the first track.

```go
if err := iter.error(); err != nil {
	log.WithError(err).Error("failed going to next track")
	return tl.GoStart()
}
```

A more elegant solution, to my taste, would be to create a new context using the previous one as a seed. If I remember correctly, the Java version had a config option that did this automatically once a playlist finished, and I liked it a lot. It should be fairly easy, as the API actually has an endpoint for seeding new playlists. This would allow endless playback of the same type of music with enough randomness to avoid a monotonous listening experience.
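The decision logic for such an option might be sketched like this, assuming some API call that returns a new context seeded from an existing one. `contextSeeder`, `SeedFrom`, `nextContext`, and `fakeSeeder` are all hypothetical names; neither go-librespot nor the Spotify API is known to expose them in this form, and the "seeded-from:" string is a placeholder, not a real Spotify URI.

```go
package main

import (
	"errors"
	"fmt"
)

// contextSeeder is a hypothetical abstraction over an API call that returns a
// new context seeded from an existing one. Neither the name nor the method is
// taken from go-librespot or the Spotify API.
type contextSeeder interface {
	SeedFrom(contextURI string) (string, error)
}

// nextContext decides what to play once the current context is exhausted or
// cannot continue. With autoplay enabled it asks the seeder for a related
// context; without autoplay, or when seeding fails, it restarts the current one.
func nextContext(current string, autoplay bool, seeder contextSeeder) (string, error) {
	if !autoplay {
		return current, nil
	}
	seeded, err := seeder.SeedFrom(current)
	if err != nil {
		// Keep playing the old context, but report why seeding failed.
		return current, fmt.Errorf("seeding from %s: %w", current, err)
	}
	return seeded, nil
}

// fakeSeeder stands in for the real API call in this sketch.
type fakeSeeder struct{ fail bool }

func (f fakeSeeder) SeedFrom(uri string) (string, error) {
	if f.fail {
		return "", errors.New("service unavailable")
	}
	return "seeded-from:" + uri, nil // placeholder, not a real Spotify URI
}

func main() {
	next, err := nextContext("spotify:playlist:example", true, fakeSeeder{})
	fmt.Println(next, err) // seeded-from:spotify:playlist:example <nil>

	next, err = nextContext("spotify:playlist:example", true, fakeSeeder{fail: true})
	fmt.Println(next, err)
}
```

Falling back to restarting the same context when seeding fails keeps the "keep the player going at all costs" behavior of the GoStart change while still surfacing the error for logging.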