Open gordielachance opened 7 years ago
Hi there. That blog post is a little out of date; I scrape the stations every 30 seconds now. The only rate limiting I do is on the last.fm side of things, where I post only once each time the song changes instead of once every 30 seconds. For the stations themselves I haven't run into any issues with scraping every 30 seconds. I don't get close to the data limit on my host, and on the station's side it's usually less than what their own website does for every user who is on it. I also use JSON wherever possible so I can avoid pulling down a whole HTML file.
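A minimal Python sketch of the "only post when the song changes" idea, in case it helps. This is not the actual scrobblealong code: `ChangeDetector`, `poll_once`, `fetch_now_playing` and `post_scrobble` are hypothetical names, and the fetch and post functions are passed in so the sketch makes no assumptions about any station's endpoint or the last.fm API.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Track = Tuple[str, str]  # (artist, title)


class ChangeDetector:
    """Remembers the last track seen per station (hypothetical helper)."""

    def __init__(self) -> None:
        self._last: Dict[str, Track] = {}

    def update(self, station: str, track: Optional[Track]) -> bool:
        """Return True only when `track` differs from the last one seen."""
        if track is None:  # scrape failed or nothing is playing
            return False
        if self._last.get(station) == track:
            return False
        self._last[station] = track
        return True


def poll_once(
    stations: Iterable[str],
    fetch_now_playing: Callable[[str], Optional[Track]],
    post_scrobble: Callable[[str, Track], None],
    detector: ChangeDetector,
) -> int:
    """Scrape every station once; post only for stations whose track changed."""
    posted = 0
    for station in stations:
        track = fetch_now_playing(station)
        if detector.update(station, track):
            post_scrobble(station, track)
            posted += 1
    return posted
```

Calling `poll_once` every 30 seconds still hits each station every time, but the outgoing posts drop to one per song.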
I have had a few stations block my server but that's only been 2 or 3 out of the 100+ stations I've got on the site. I think in general the stations would be happy with the little bit of extra exposure they get since the incoming traffic to them is so low.
I was starting to run into issues when I was doing it once every 15 seconds but that was only because I was trying to scrape a whole lot of stations, and then post data to last.fm for every user who was tuned in, and that was taking close to 15 seconds to complete. Changing to 30 seconds and reducing the number of calls to last.fm helped me there and I haven't had any issues since.
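To illustrate the timing problem, here is a rough sketch of a polling loop that measures how long each cycle takes and warns when a cycle no longer fits inside the interval. `run_cycles` and its parameters are made up for this example; the clock and sleep functions are injectable only so the loop can be exercised without waiting.

```python
import time
from typing import Callable, List, Optional


def run_cycles(
    cycle: Callable[[], None],
    interval_s: float = 30.0,
    max_cycles: Optional[int] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Run `cycle` repeatedly, sleeping for whatever is left of the interval.

    Returns the measured duration of each cycle in seconds.
    """
    durations: List[float] = []
    done = 0
    while max_cycles is None or done < max_cycles:
        started = clock()
        cycle()  # e.g. scrape all stations, then post changes
        elapsed = clock() - started
        durations.append(elapsed)
        if elapsed >= interval_s:
            # The work no longer fits in the interval: start the next
            # cycle right away and flag it so the interval can be raised.
            print(f"warning: cycle took {elapsed:.1f}s, interval is {interval_s:.1f}s")
        else:
            sleep(interval_s - elapsed)
        done += 1
    return durations
```

If the durations creep toward the interval, the options are the ones described above: lengthen the interval or cut the number of outgoing calls per cycle.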
If you just want track history, you could probably get by with doing it once a minute. The only reason I do it every 30 seconds is that I like my track changes to show up on last.fm as soon as possible.
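For the history itself, something along these lines might be enough. It is only a sketch using Python's built-in `sqlite3` module: the `plays` table, its columns, and the `open_history` and `record_play` helpers are all invented for illustration. It stores a row only when the track differs from the last one recorded for that station, so polling once a minute does not fill the table with repeats.

```python
import sqlite3
import time
from typing import Optional


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the history database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS plays ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " station TEXT NOT NULL,"
        " artist TEXT NOT NULL,"
        " title TEXT NOT NULL,"
        " seen_at REAL NOT NULL)"
    )
    return conn


def record_play(
    conn: sqlite3.Connection,
    station: str,
    artist: str,
    title: str,
    seen_at: Optional[float] = None,
) -> bool:
    """Insert a row unless it repeats the station's most recent track."""
    row = conn.execute(
        "SELECT artist, title FROM plays"
        " WHERE station = ? ORDER BY id DESC LIMIT 1",
        (station,),
    ).fetchone()
    if row == (artist, title):
        return False  # same song as last poll, nothing new to store
    conn.execute(
        "INSERT INTO plays (station, artist, title, seen_at) VALUES (?, ?, ?, ?)",
        (station, artist, title, time.time() if seen_at is None else seen_at),
    )
    conn.commit()
    return True
```

One limitation of this approach: a song shorter than the polling interval can be missed entirely, and the same song played twice in a row is stored once.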
Hi esemarte! I developed a WordPress plugin that you might like. It does almost the same thing as scrobblealong: it scrapes data from radio stations to get the history of the last played songs for each. The tracklist can then be displayed on my website, and users can listen to those songs through YouTube or something similar. See it in action here: http://www.spiff-radio.org/tag/editors-pick/?post_type=wpsstm_live_playlist

I see on your blog that "Scrobbling for the stations is handled by a task that runs every 15 seconds". Here's how mine works: it scrapes a station page to fetch an array of the available (displayed) tracks, and refreshes only when someone requests the tracklist on my website. That works well enough, but I can't record the track history: for that, the only way would be to do as you do and query the last track every X seconds.

Could you tell me more about the technical side of that? How does your server handle the load (1 request every 15 s × X stations)? I'm afraid that if I do something similar, I could run into problems either with my host or with the remote websites (being banned so that scraping is no longer possible, especially if I use APIs), but I would like to find a way to store the track history. Any feedback would be appreciated. Thanks!