AzuraCast / AzuraCast

A self-hosted web radio management suite, including turnkey installer tools for the full radio software stack and a modern, easy-to-use web app to manage your stations.
https://www.azuracast.com/
GNU Affero General Public License v3.0

LS API calls at the beginning of a minute time out, causing scheduled tracks to be dropped, among other issues. #7431

Open caturria opened 1 month ago

caturria commented 1 month ago

Installation Method

Docker Installation

AzuraCast Release Channel

Rolling Release Channel

Current AzuraCast Version

a19103a

What happened?

Hi,

As usual, thank you for this fantastic software.

If Liquidsoap's "feedback" or "nextsong" API calls happen to occur within the first or last few seconds of a minute, they often exceed their two-second time limit because they are competing with several once-per-minute sync tasks for database locks. When this happens to nextsong, a track in the upcoming song queue is lost. When it happens to feedback, a now-playing update is missed and the song playback timeline becomes inaccurate.

As an interim measure, would you be okay with increasing the azuracast_api_call time limit?

Thank you.

Relevant log output

2024/09/09 06:21:02 [lang:3] API feedback - Response (522):
2024/09/09 14:31:03 [lang:3] API feedback - Response (522):
2024/09/09 14:33:03 [lang:3] API nextsong - Response (522):
2024/09/08 04:50:04 [lang:3] API nextsong - Response (522):
2024/09/08 04:50:06 [lang:3] API nextsong - Response (522):
2024/09/08 04:50:07 [lang:3] API feedback - Response (522):
2024/09/08 06:31:03 [lang:3] API feedback - Response (522):
2024/09/08 10:55:02 [lang:3] API feedback - Response (522):
2024/09/07 18:07:05 [lang:3] API feedback - Response (522):
2024/09/07 21:45:02 [lang:3] API feedback - Response (522):
2024/09/08 00:17:02 [lang:3] API nextsong - Response (522):
2024/09/08 00:17:55 [lang:3] API nextsong - Response (522):
2024/09/08 00:17:58 [lang:3] API nextsong - Response (522):
2024/09/08 00:18:01 [lang:3] API nextsong - Response (522):
2024/09/08 00:18:03 [lang:3] API nextsong - Response (522):
2024/09/08 00:18:05 [lang:3] API nextsong - Response (522):
2024/09/08 00:18:07 [lang:3] API nextsong - Response (522):
2024/09/08 00:18:09 [lang:3] API nextsong - Response (522):
2024/09/08 00:18:11 [lang:3] API nextsong - Response (522):
2024/09/08 00:18:13 [lang:3] API nextsong - Response (522):
2024/09/08 00:18:15 [lang:3] API nextsong - Response (522):
2024/09/08 00:18:19 [lang:3] API nextsong - Response (522):
2024/09/08 00:25:03 [lang:3] API nextsong - Response (522):
caturria commented 1 month ago

I'm giving it a spin with timeout set to 30.0 sec. Will report back in a day.

BusterNeece commented 1 month ago

@caturria 30 seconds is really high...what is it currently set to? I'd prefer something shorter honestly, because that timeout affects how often it retries during the initial startup, and thus how quickly it gets up and running initially.

caturria commented 1 month ago

@BusterNeece it's set to two seconds:

def azuracast_api_call(~timeout=2.0, url, payload) =

I'm not sure I fully understand you with respect to station startup. If a call with a 2.0 s timeout has to try ten times vs. a call with a 30.0 s timeout trying once (and waiting twenty seconds), what's the difference?
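Sketching the two strategies with a hypothetical helper (the retry delay and the moment the server becomes ready are assumed values, purely for illustration):

```python
def worst_case_wait(timeout: float, retry_delay: float, server_ready_at: float) -> float:
    """Time until the first successful response, assuming every attempt
    made before server_ready_at burns its full timeout, then fails."""
    t = 0.0
    while t < server_ready_at:   # failed attempt: full timeout plus retry pause
        t += timeout + retry_delay
    return t                     # the first attempt after readiness succeeds

# Suppose the backend becomes responsive 20 s after startup:
print(worst_case_wait(2.0, 0.5, 20.0))   # short attempts recover right at ~20 s
print(worst_case_wait(30.0, 0.5, 20.0))  # one long attempt spins until ~30.5 s
```

Under these assumptions the two look similar when failures are frequent, but with a long timeout a single unlucky first attempt keeps the whole pipeline waiting well past the point where the server was actually ready, which seems to be the startup concern.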

BusterNeece commented 1 month ago

@caturria It usually doesn't take ten tries; it usually gets a successful response well before 30 seconds, maybe on the third or fourth attempt. With a 30-second timeout, if the first request is going to time out, it just sits and spins for the whole 30 seconds, forcing everything else to wait.

Long story short, please don't go from 2 seconds to 30. I'd be ok with 5 or 10 but not higher.

caturria commented 1 month ago

I'll try with 10 and see if I can go 24 hours without a (522).

I wouldn't have time to dive too deep into this for the next few weeks, but I think a more robust long term solution would be to have the app report the ID of the queue row it's returning. Then Liquidsoap could send the last ID it received along with the nextsong request, so that AutoDJ could re-send a track if LS reports not having received it. That and having feedback try multiple times to submit. Thoughts?
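A rough sketch of the acknowledgement idea in Python pseudocode (the class and field names are hypothetical, not AzuraCast's actual API; the real implementation would live in the AutoDJ queue endpoint):

```python
class AutoDJQueue:
    """Server side: keep the last dispatched queue row until the client
    acknowledges receipt by echoing its ID on the next request."""

    def __init__(self, rows):
        self.rows = list(rows)   # pending (row_id, track) pairs
        self.last_sent = None    # most recently dispatched row

    def next_song(self, last_received_id):
        # If the client never saw the last row we sent, re-send it instead
        # of popping a new one: the response was lost, not consumed.
        if self.last_sent and self.last_sent[0] != last_received_id:
            return self.last_sent
        self.last_sent = self.rows.pop(0)
        return self.last_sent

queue = AutoDJQueue([(1, "a.mp3"), (2, "b.mp3")])
first = queue.next_song(last_received_id=None)   # (1, "a.mp3")
retry = queue.next_song(last_received_id=None)   # timed out: (1, "a.mp3") again
second = queue.next_song(last_received_id=1)     # acknowledged: (2, "b.mp3")
```

The point is that a timed-out nextsong response no longer destroys the row: Liquidsoap simply asks again without the acknowledgement, and the server re-sends rather than advancing the queue.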

If ten seconds proves insufficient, how about having the timeout set to two initially, but then increasing it after the first successful response?
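The escalating-timeout idea could look something like this sketch (the `INITIAL_TIMEOUT` and `STEADY_TIMEOUT` values are assumptions for illustration, not current AzuraCast settings):

```python
INITIAL_TIMEOUT = 2.0   # keep startup retries fast while the backend boots
STEADY_TIMEOUT = 10.0   # tolerate minute-boundary lock contention once running

class ApiClient:
    """Widen the per-call timeout after the first successful response,
    when startup is known to be over."""

    def __init__(self):
        self.timeout = INITIAL_TIMEOUT

    def on_success(self):
        self.timeout = STEADY_TIMEOUT

client = ApiClient()
print(client.timeout)  # 2.0 during startup
client.on_success()
print(client.timeout)  # 10.0 for steady-state operation
```

This keeps the fast-retry behavior during startup while still absorbing the multi-second lock waits that show up at minute boundaries in steady state.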

It's also possible that I'm more likely to experience this given I've been on a dev instance for so long (on a modest cloud VM for dev purposes). Even on my real server (far better than this dev instance) I recall seeing the odd missed now playing update and song playback timeline anomaly, so I know it can happen, just maybe not quite as frequently as it does here.

BusterNeece commented 1 month ago

@caturria Those are fine solutions, but as you pointed out they represent a fair amount more work. Wouldn't mind digging deeper into this issue, though. This also affects people on low-cost VPSes, so it's definitely not just your dev setup causing trouble.

caturria commented 1 month ago

@BusterNeece Yeah.

I can't say I know much about Slim (most of the PHP I've done in the past few years has been on Laravel), but do you know if it's perhaps possible to narrow the scope of the locks acquired by those maintenance tasks? Perhaps they could acquire only table-level locks for the tables they need to work on?

I'm also seeing the odd very slow page load, which I'm almost certain is related. I'll try a few times to load the upcoming song queue right on the minute and see if I can reliably reproduce a sluggish result.

BusterNeece commented 1 month ago

@caturria The library in use there is Symfony's lock library. The locks are managed by Redis (unless it's missing, then filesystem-level) and are per-task, not per-table or per-DB.

caturria commented 1 month ago

@BusterNeece interesting. The Liquidsoap calls are definitely being blocked by something for a few seconds.

caturria commented 1 month ago

With the ten second timeout, I have successfully run for longer than 24 hours without an error. I've submitted a patch (one-line adjustment to LS config writer). Should at least provide mitigation until there's a fix for the contention.

caturria commented 1 month ago

Hi @BusterNeece,

I am requesting to reopen this as there is definitely more to it.

I installed it back on my main box the other day and started seeing 522s like crazy, even with the ten second timeout.

Sometimes they come in batches of five or more, meaning the endpoints are unresponsive for a minute or longer (or Liquidsoap thinks they are).

2024/09/13 03:16:11 [lang:3] API feedback - Response (522):
2024/09/13 03:16:48 [lang:3] API nextsong - Response (522):
2024/09/13 03:16:48 [lang:3] API feedback - Response (522):
2024/09/13 03:17:06 [lang:3] API feedback - Response (522):

Currently I have annotateNextSong instrumented with logging to see where it gets hung up the next time it happens. It does seem like it's being locked out by a sync task or something every couple of hours.