cgravolet / scroblr

A lightweight browser extension that scrobbles the music you listen to on the web.
http://scroblr.fm

Wrong track info scrobbled from Soundcloud sets #40

Closed Chudesnov closed 6 years ago

Chudesnov commented 10 years ago

Wrong track info is scrobbled when a soundcloud set (e.g. https://soundcloud.com/royksopp/sets/junior) is played — the set title is used as a track title.

sonakpatel commented 10 years ago

The forked commit is only a partial fix, because it works only when listening to the set directly. However, listening from http://soundcloud.com/stream causes sets to:

This results in the wrong track info sent as per the original bug report. Potentially some more specific scraping when the page is in Stream mode to pick up the correct info?

cgravolet commented 10 years ago

I didn't write the original Soundcloud plugin, so I'm not sure on this, but it could be helpful to look into seeing if the Soundcloud player has some sort of API that could be leveraged to determine the current playing track. There is some info about an API and embedded player on https://developers.soundcloud.com

Ideally, it would work similar to how the Plug.dj plugin works. Not sure if this is an option or not, but worth looking into.

sonakpatel commented 10 years ago

Certainly the cleaner approach. I had a brief look at the Soundcloud APIs: the embedded API seems to be meant for players that are embedded externally in iframes. The HTTP API can return the track info, but it requires you to know the track ID.

This ID isn't exposed client-side on http://soundcloud.com the way it is on http://plug.dj, so I'm looking into whether it's possible to extract the track ID from somewhere when listening in stream mode.

grantbacon commented 9 years ago

When a new song plays a POST request is made to:

https://api.soundcloud.com/tracks/137154665/plays?policy=ALLOW&client_id=xxx&app_version=2579ef5

This URI contains the track ID and a client ID. This is enough to get all the track data from the API using a GET request.
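As a rough sketch of that idea, the two IDs can be pulled out of the sniffed URL and reused for the lookup. The helper name `trackLookupUrl` is made up for illustration, and the `/tracks/<id>?client_id=<id>` lookup form is an assumption about the API rather than something confirmed in this thread:

```javascript
// Hypothetical helper: turn a sniffed "plays" URL into a track lookup URL.
// The lookup URL shape is an assumption, not confirmed Soundcloud API behaviour.
function trackLookupUrl(playsUrl) {
    var trackMatch = playsUrl.match(/\/tracks\/(\d+)\/plays/);
    var clientMatch = playsUrl.match(/[?&]client_id=([^&]+)/);

    // Not a "plays" request, or no client ID in the query string.
    if (!trackMatch || !clientMatch) {
        return null;
    }

    return "https://api.soundcloud.com/tracks/" + trackMatch[1] +
        "?client_id=" + clientMatch[1];
}
```

Any URL that doesn't carry both IDs yields `null`, so the caller can simply ignore unrelated requests.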

I'm new to chrome extensions so I'm not sure if these network requests are something that can be "seen" by the extension, but I'm reading into it. Any tips appreciated :v:

grantbacon commented 9 years ago

I've done a bit more research. chrome.webRequest can be used to sniff web traffic on a page. This requires an additional permission for the plugin.

I'm trying to get a working version going, but it doesn't seem like it will be difficult.

Are there any reservations about adding a "webRequest" permission to the extension?

cgravolet commented 9 years ago

If it means being able to have more stable/reliable soundcloud support, I wouldn't be against it. Nice find.

grantbacon commented 9 years ago

I have a proof of concept here: https://github.com/grantbacon/scroblr/commit/c07baf39e96b79b2dd1f31e6dc7649e5693d454a

The chrome.webRequest API is only accessible from background.html, so some message passing will have to be done. Unfortunately, functions (in this case a web request listener) cannot be passed through messages in Safari as mentioned in the comments for sendMessage.
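To illustrate the constraint, here is a small sketch of what happens to a message when it is serialized JSON-style, which is assumed here to approximate what extension messaging does. The helper name `toPlainMessage` and the `name`/`message` field names are hypothetical:

```javascript
// Hypothetical helper: build a message the way a JSON-style channel would
// deliver it. Plain data survives the round trip; function properties do not.
function toPlainMessage(name, payload) {
    return JSON.parse(JSON.stringify({ name: name, message: payload }));
}

var delivered = toPlainMessage("trackInfo", {
    artist: "Example artist",
    title: "Example title",
    onDone: function () {} // dropped during serialization
});

console.log(Object.keys(delivered.message));
```

So the background page would need to send the scraped track data itself, not a listener.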

The PoC outputs the track info to background.html's console.log for now. This implementation does scrape info for a track from any of Soundcloud's views (Track View, Stream List, Album View, Artist View), and happens as soon as a new track begins playing.

I'm unsure how to proceed with organizing this so that listeners can be defined in a plugin file and not pollute main-background.js

There are two things a potential plugin builder must create to use this method for getting track data:

  1. A filter object

        var filter = {
            urls: ["*://api*.soundcloud.com/*"],
            types: ["xmlhttprequest"]
        };

  2. A listener function

        var soundcloudListener = function (req) {
            // Only the POST to /tracks/<id>/plays signals that a new track started.
            if (req.method !== "POST") {
                return;
            }

            var trackMatch = req.url.match(/\/tracks\/(\d+)\//i);
            var clientMatch = req.url.match(/client_id=([a-z0-9]+)/i);

            // Ignore requests that don't carry both a track ID and a client ID.
            if (!trackMatch || !clientMatch) {
                return;
            }

            // Look up the track's metadata and log it (PoC output only).
            $.get("http://api.soundcloud.com/tracks/" + trackMatch[1] + "?client_id=" + clientMatch[1],
                function (response) {
                    console.log("Artist: " + response.user.username);
                    console.log("Title: " + response.title);
                });
        };

Ideally, these two things could be defined in a Plugin. I'll keep thinking about how to organize this cleanly.
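One possible shape for that, as a sketch only: a plugin object that bundles the filter with a listener factory, plus a small function that hooks it up. `soundcloudBackend`, `createListener` and `registerBackend` are hypothetical names, not part of scroblr. The filter pattern is simplified to `api.soundcloud.com`. The `webRequest` object is passed in so the wiring doesn't depend on a global; in the background page it would be `chrome.webRequest`.

```javascript
// Hypothetical background-side plugin: a filter plus a listener factory.
var soundcloudBackend = {
    name: "soundcloud",
    filter: {
        urls: ["*://api.soundcloud.com/*"],
        types: ["xmlhttprequest"]
    },
    // Returns a webRequest listener that reports track IDs to onTrack.
    createListener: function (onTrack) {
        return function (req) {
            var match = req.method === "POST" &&
                req.url.match(/\/tracks\/(\d+)\/plays/);
            if (match) {
                onTrack(match[1]);
            }
        };
    }
};

// webRequest is injected; pass chrome.webRequest in the real background page.
function registerBackend(webRequest, plugin, onTrack) {
    var listener = plugin.createListener(onTrack);
    webRequest.onBeforeRequest.addListener(listener, plugin.filter);
    return listener; // kept so it can be passed to removeListener later
}
```

With this split, main-background.js only has to call `registerBackend(chrome.webRequest, soundcloudBackend, callback)` and never sees the URL-matching logic.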

cgravolet commented 9 years ago

What if the background script had a way to load plugins as well, so that the Soundcloud plugin could have a front-end and a back-end version? The plugin loaded in the content script could send simple messages like "plugin loaded/unloaded", and the background script could then load/unload the back-end Soundcloud plugin to start/stop the URL sniffing. I'd just like to keep the logic of determining what is/isn't playing out of the main background script; its purpose is only to receive messages when a song needs to be updated. That would possibly alleviate the need to send functions in messages?
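A minimal sketch of that load/unload idea, under several assumptions: the message names `pluginLoaded`/`pluginUnloaded`, the `plugin` field, and `createBackendSwitch` are all invented for illustration, and each back-end is assumed to be an object with a `listener` and a `filter`. Only plain strings cross the message boundary; the listener function stays in the background page.

```javascript
// Hypothetical message handler for the background script.
// backends maps a plugin name to { listener: Function, filter: Object }.
// webRequest is injected; pass chrome.webRequest in the real background page.
function createBackendSwitch(webRequest, backends) {
    var running = {};

    return function onMessage(msg) {
        var known = Object.prototype.hasOwnProperty.call(backends, msg.plugin);
        if (!known) {
            return;
        }
        var backend = backends[msg.plugin];

        if (msg.name === "pluginLoaded" && !running[msg.plugin]) {
            // Start sniffing requests for this plugin.
            webRequest.onBeforeRequest.addListener(backend.listener, backend.filter);
            running[msg.plugin] = true;
        } else if (msg.name === "pluginUnloaded" && running[msg.plugin]) {
            // Stop sniffing once the front-end plugin goes away.
            webRequest.onBeforeRequest.removeListener(backend.listener);
            running[msg.plugin] = false;
        }
    };
}
```

This version tracks a single on/off flag per plugin, so it would stop sniffing as soon as any one tab unloads; counting loaded tabs would be needed if several Soundcloud tabs can be open at once.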