FoxxMD / multi-scrobbler

Scrobble plays from multiple sources to multiple clients
https://foxxmd.github.io/multi-scrobbler
MIT License

Last.fm api calls hang forever after some time #134

Closed Ardakilic closed 4 months ago

Ardakilic commented 5 months ago

Hello,

I'm running multi-scrobbler in a dockerized environment. I set the source to last.fm and target to maloja.

When I boot up the app, it syncs the backlog successfully, but last.fm gets stuck at "polling" after a while.

image

In this screenshot, I restarted the app while I was playing a song, and my last.fm account was shown as "scrobbling".

After a while, it got like this:

image

Then it gets stuck at polling.

As you can see from the first screenshot, when I try to restart the source, it retries and says [Sources] [Lastfm - myLastFm] Could not stop polling! Or polling signal was lost :(

image

In this state, when I restart the container, it picks up the backlog and syncs accordingly, then gets stuck at polling again.

I'm accessing the instance at http://ip:port, if that matters.

So I believe my issue is related to polling, and the debug log could not help me figure it out.

Here's my debug log, if that would help: https://0x0.st/HDuS.log

What am I missing, could you help please?

Thanks in advance!

Edit: After a while, these log lines get written, but nothing is caught or scrobbled, even though I've been listening to music constantly during this period:

2024-01-31T00:53:53+03:00 verbose : [Heartbeat] [Clients] Checked Dead letter queue for 1 clients.
2024-01-31T00:53:53+03:00 verbose : [Heartbeat] [Clients] Starting check...
2024-01-31T00:53:53+03:00 verbose : [Heartbeat] [Sources] Checked 1 sources for restart signals.
2024-01-31T00:53:53+03:00 verbose : [Heartbeat] [Sources] Starting check...
FoxxMD commented 5 months ago

Thanks for the detailed logs. I'll look into this soon.

Ardakilic commented 5 months ago

Thank you, please let me know if I can be of any help.

If it helps at all, my base OS is Debian, and I'm having the same issue with the current develop Docker tag, which was updated just a couple of hours ago as well.

FoxxMD commented 5 months ago

Can you clarify the event timeline for me?

If this is correct, can you try waiting until the song you are listening to is actually finished and shows up in your last.fm history? MS should pick it up shortly after it appears in last.fm history, regardless of what the player on MS is showing.

Ardakilic commented 5 months ago

Hello, it's almost like that.

Let me try to describe the journey, without clicking anything like restart etc.:

This is the current state:

image

Now, I am currently listening to the second song, which is also shown on last.fm; however, multi-scrobbler did not catch it, and the first song is not even marked as listened:

image

I wait a bit more, and now the first song is orphaned.

image

To clarify, I'm already at the 3rd song on last.fm right now:

image

From now on, it won't catch any updates and will stay stuck like this.

After a while, the now playing screen disappears, while I'm still listening to the 3rd song:

image

After 3 more minutes, 3 tracks have been scrobbled on my last.fm and I'm playing the 4th song, but it looks like this:

image

image

Here are my logs since yesterday: https://0x0.st/HDep.log

@FoxxMD I'm on the develop-tagged container by the way, which was updated yesterday. I hope this helps to pinpoint the issue.

Ardakilic commented 5 months ago

20 mins have passed and the dead letter queue got triggered a second time; now it'll be in this loop forever:

image

NOW I click Restart (the link below "Polling 🟢"), and it looks like this:

image

Ardakilic commented 5 months ago

Currently trying the same flow with the latest :develop build, which was built after your last 2 commits. I'll let you know if the outcome is any different.

FoxxMD commented 5 months ago

Interesting...

So from your logs:

2024-01-31T19:22:19+03:00 verbose : [Sources] [Lastfm - myLastFm] Sleeping for 15.00s
2024-01-31T19:22:33+03:00 debug   : [Sources] [Lastfm - myLastFm] Refreshing recently played
2024-01-31T19:23:12+03:00 debug   : [Sources] [Lastfm - myLastFm] [Player NoDevice-SingleUser] Stale after no Play updates for 52 seconds

We can see that it tries to get recently played tracks from LFM (an API request) at 19:22:33, but then there is no more logging until 19:23:12, almost 40 seconds later. There should have been another two "sleeping for 15s" logs in that time. That, along with the logs about waiting for the polling stop signal, leads me to believe that the API request to LFM is hanging forever.

The library I use for interacting with the LFM API does not seem to implement any request timeouts, so this could be the cause. I have patched the library to make it time out after waiting 3 seconds for a response. Please update your docker container to use foxxmd/multi-scrobbler:experimental and let me know if you start seeing timeout errors in the logs.
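
For reference, this is not the actual patch, just a rough sketch of the technique: abort the request if no response arrives within a few seconds instead of letting it hang (the fetchWithTimeout name, the URL, and the 3-second value are only illustrative):

// Sketch only (assumes Node 18+ for the global fetch/AbortController).
// Gives up on a hung request instead of waiting forever.
async function fetchWithTimeout(url, timeoutMs = 3000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Rejects with an AbortError if LFM never answers within timeoutMs
    const res = await fetch(url, { signal: controller.signal });
    return await res.json();
  } finally {
    clearTimeout(timer);
  }
}

// e.g. fetchWithTimeout('https://ws.audioscrobbler.com/2.0/?method=user.getrecenttracks&...')
// would throw instead of stalling the polling loop indefinitely.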

Ardakilic commented 5 months ago

@FoxxMD Hello, I was just about to post this; I was waiting for about an hour for it to go idle:

With the develop-tagged image, I had to re-authenticate, then kept on listening.

4 songs were identified in the now playing section successfully, the times were correct, and they got forwarded to Maloja without issues 🎉

Currently, I'm idling a bit to see what the code will do when I start playing after a long break 🤞 I'll post an update after waiting about an hour and then start scrobbling again.

Either what you did solved it, or it may be due to timeouts as you mentioned. I'm updating the tag and checking again. I'll let you know.

Ardakilic commented 5 months ago

:experimental is having the same issues. Rolled back to :develop and so far it's working as intended, as if my issue were fixed.

I'll idle a bit and try scrobbling again with develop build.

FoxxMD commented 5 months ago

Apologies...I missed a change in the experimental image when I last pushed it. Try pulling the latest experimental tag now.

Ardakilic commented 5 months ago

No worries. Pulling the latest experimental and trying again 🤞

Ardakilic commented 5 months ago

@FoxxMD Suddenly I started seeing a lot of timeout errors 😆

image

(and this is a high-quality VPS on which I'm serving my own Mastodon instance, so possibly the peering from my server to last.fm is just bad)

It also switched to the new song, and so far it's working as intended.

image

Also, as I mentioned, whatever you did in the develop tag earlier today was also working flawlessly, and worked both times I switched to that tag. I believe it consumes the history as the true source instead?

I believe this is good news 🎉

ps: The only problem is that every time I switched my tag to another one, the script asked for re-authentication. Is this intended?

Edit:

Ahh, just after I posted this I got an error 😢 (I'll definitely increase my retry and timeout limits lol). However, since it's just the status check, I believe it would not cause any missed scrobbles, right?

image

FoxxMD commented 5 months ago

Glad to hear this is helping resolve the issue.

this is a high quality VPS

If you have the ability to run MS from source you could try that and see if there is a difference in frequency of timeouts. Other users in the past have had DNS issues that may or may not have been caused by the OS used in the docker image, alpine.

whatever you did in develop tag earlier...I believe it consumes the history as true source instead?

The changes in develop actually shouldn't have done anything to help with this issue...the history vs. player stuff is purely presentational -- MS was already only using LFM history and the player was superficial, but now it's indicated in the UI and logs instead of being behind the scenes.

every time I switched my tag to another one, the script asked for re-authentication.

This would occur when using docker if you are only using ENV based configuration and have not mounted a volume into the docker container to persist the created credentials.

Ahh, just after I posted this I got an error 😢 (I'll definitely increase my retry limit lol). However, since it's just the status check, I believe it would not cause any missed scrobbles, right?

Yes, it should not miss any scrobbles since polling for the source was restarted. You can increase the number of request retries and/or poll retries using file-based configs, with either config.json or lastfm.json, via maxPollRetries and maxRequestRetries.
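
A rough sketch of where those might go in lastfm.json, assuming they sit in the data block alongside your existing options (the numbers here are only placeholders; check the configuration docs for the exact shape):

[
  {
    "name": "myLastFm",
    "configureAs": "source",
    "data": {
      "apiKey": "...",
      "secret": "...",
      "redirectUri": "http://publicip:port/lastfm/callback",
      "maxRequestRetries": 3,
      "maxPollRetries": 2
    }
  }
]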

Ardakilic commented 5 months ago

If you have the ability to run MS from source you could try that and see if there is a difference in frequency of timeouts. Other users in the past have had DNS issues that may or may not have been caused by the OS used in the docker image, alpine.

I'll try and let you know in my next comment, thanks. (By the way, I always prefer Distroless Debian or Debian stable-slim over Alpine; it's subjective, but I've personally never liked it 😓)

This would occur when using docker if you are only using ENV based configuration and have not mounted a volume into the docker container to persist the created credentials.

My configuration is sort of hybrid, and currentCreds-lastfm-myLastFm.json is persistent on my storage.

multi-scrobbler:
  image: foxxmd/multi-scrobbler:experimental
  container_name: multi_scrobbler
  environment:
    - TZ=Europe/Istanbul
    - PUID=1000
    - PGID=1000
    - LOG_LEVEL=error
  volumes:
    - /home/arda/docker/multiscrobbler:/config
  ports:
    - hostPort:9078
  restart: unless-stopped
  networks:
    - ardanet

Additionally, in the same config folder, I have persistent lastfm.json and maloja.json files, which I can also access from my host Debian machine. A docker compose down/up cycle still asks for authentication.

In this state, would it cause any issues?

I'll install Node, try running it locally, and post my findings.

FoxxMD commented 5 months ago

I'm considering changing the base distro but my choices are limited to what linuxserver.io offers because their base images handle file permissions via PUID/PGID, their s6-init setup is so useful for container startup scripting, and they keep things updated so extremely well since they have a large ecosystem. Unfortunately their debian/ubuntu images tack on at least 35MB to the image size but it might be worth it given how iffy alpine has been the more I've used it.

Your current compose setup should not be causing re-authentication. So your credentials are at /home/arda/docker/multiscrobbler/config/currentCreds-lastfm-myLastFm.json and that is the true name of the file? Can you post your lastfm.json contents (with apikey/secret redacted)?

Ardakilic commented 5 months ago

I've played for about 20 mins on the develop branch from source, on node 18 (whatever nvm use provided me), and it's running perfectly. It checked every 15 secs with zero issues. I also didn't even need to authenticate; it used the credentials already in the json file.

image

(I was too lazy to configure maloja for this, since it's irrelevant for this test)

By the way, at my old job, I had issues connecting to random external services on various virtual machines, because the host and/or container would suddenly decide to connect to the ipv6 address of such a host, which was unreachable from there due to upstream issues 🤷‍♂️ As a final test, I'll try the current experimental docker tag with ipv6 disabled at my host machine's kernel level. Maybe it'll make a difference.

Edit: Here's my lastfm.json, with credentials redacted:

[
  {
    "name": "myLastFm",
    "enable": true,
    "configureAs": "source",
    "data": {
      "apiKey": "",
      "secret": "",
      "redirectUri": "http://publicip:port/lastfm/callback"
    }
  }
]

Your current compose setup should not be causing re-authentication. So your credentials are at /home/arda/docker/multiscrobbler/config/currentCreds-lastfm-myLastFm.json and that is the true name of the file?

Actually, that's not correct; it's stored at:

/home/arda/docker/multiscrobbler/currentCreds-lastfm-myLastFm.json

I've mapped the /config folder of the container to /home/arda/docker/multiscrobbler folder.

Ardakilic commented 5 months ago

@FoxxMD

I'm okay with keeping ipv6 disabled on this host machine by the way, but maybe Alpine likes to prefer ipv6?

This may be out of scope, but could this be made configurable via env or config?

https://nodejs.org/api/cli.html#--dns-result-orderorder

https://nodejs.org/api/dns.html#dnssetdefaultresultorderorder

I'm not sure, but I believe having this somewhere during bootstrapping would solve this case, assuming that's what's causing it:

const dns = require('dns');
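// Prefer IPv4 addresses from dns.lookup(); needs to run early at startup, before any requests go out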
dns.setDefaultResultOrder('ipv4first');

FoxxMD commented 5 months ago

Wild! All the other issues similar to this have danced around this but I've never been able to reproduce it. Thanks for all the testing.

dns.setDefaultResultOrder('ipv4first');

This may fix things but I'm pretty hesitant to dictate something as low level as dns resolution behavior at the application level just to fix an issue in alpine. It will affect behavior for anyone running local/flatpak and doesn't leave an escape hatch for users who may prefer ipv6.

I'm also not sure it would even work. musl ignores basic stuff like this in low-level calls, requiring ipv6 to be completely disabled in the container in order to get around it. There are countless other similar symptoms that have roots in other parts of musl 1 2 3 #88 to the point of being acknowledged as a known caveat.

I'm going to call this particular issue -- lastfm not having request timeouts -- solved, and address the dns problem by changing distro bases eventually. If you don't mind, I'll ping you when I get around to building those images, as you have a reproducible environment for this and can test reliably.

Ardakilic commented 5 months ago

On second thought, I agree. In a product I would not implement this either. Actually, I thought about injecting arguments into the npm script (something like node build/index.js $extraParams, where the extra params would be --dns-result-order=ipv4first), but nah, I agree, it would bring unnecessary complexity.

I'm also not sure it would even work. musl ignores basic stuff like this in low-level calls, requiring ipv6 to be completely disabled in the container in order to get around it. There are countless other similar symptoms that have roots in other parts of musl 1 https://github.com/nodejs/docker-node/issues/602 https://github.com/gliderlabs/docker-alpine/issues/539 https://github.com/FoxxMD/multi-scrobbler/issues/88 to the point of being acknowledged as a known caveat.

If this is the case, indeed, it'll be futile.

I'm going to call this particular issue -- lastfm not having request timeouts -- solved, and address the dns problem by changing distro bases eventually. If you don't mind, I'll ping you when I get around to building those images, as you have a reproducible environment for this and can test reliably.

Yup, since I believe we found the caveat, my original issue can be marked as solved.

Also, sure! Feel free to ping me anytime, I'll be more than happy to test and help when needed.

Also, thank you for sticking with me, actually digging through the log files, and keeping up with this 🙏

Until a new release comes, I'll stick with experimental, with ipv6 disabled at the host kernel level.

Please feel free to close the issue as you see fit for your planning.

FoxxMD commented 4 months ago

Lastfm timeout fix released in 0.6.5