adblockradio / webradio-metadata

Collection of scraping recipes to get metadata about what is being streamed on webradios
Mozilla Public License 2.0

Non standard ports #1

Closed stsier closed 5 years ago

stsier commented 5 years ago

Did you test your module with non standard ports over https? Browsers do not allow access to many shoutcast and icecast radios that broadcast on ports different from 80 and 443. You can listen to them (and you will see warnings in the console) but you can't do other requests (errors in the console).

dest4 commented 5 years ago

No, I did not. Note that this module fetches metadata from websites (JSON / XML / HTTP scraping), not from audio broadcasts (ICY metadata).

stsier commented 5 years ago

ok so the radio has to have a website separate from the stream?

dest4 commented 5 years ago

Yes, it does. In my experience, most if not all radios have a website, while ICY metadata is most often broken.

stsier commented 5 years ago

Well, I can't fully agree with you, but yes, most if not all radios do have a website when you speak of official channels with a more or less big audience. In my experience, I had to organize jazz music radios, and most of them do not have a website; they broadcast on non-standard ports. So I have to scrape the song info from another website, which is not a clean solution... I guess the difference is that my site is HTTPS, which doesn't allow requests to non-HTTPS sites (though it allows /audio/ with warnings), while other radio sites are either HTTP, which allows such requests, or they are on HTTPS with some exceptions allowed.

dest4 commented 5 years ago

Could you please give the names of those jazz webradios that do not have a website, but valid ICY metadata?

You may submit a PR that adds support for stream scraping if you think it is useful.

stsier commented 5 years ago

OK, I double-checked and actually most of them do have a website. Then I looked through your code, so you write a fetch script per radio channel to scrape the metadata? I have more than 60 radios, all 30s and 40s jazz... Writing a script per channel is a lot of work. For now I just scrape another big site with thousands of radios, so I have the same script for any radio channel. I don't know how that website gets the metadata.

dest4 commented 5 years ago

1) You can use the same scraping script for many radios. Just require the same scraper in .../.../index.js.

2) The radio aggregators have deals with radios and exchange metadata over a private API. I'd like this project to only depend on info from first parties, i.e. radio websites.
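Point 1 could be sketched roughly as follows. This is a hypothetical excerpt, not the repo's actual index.js: the radio names, URLs, and the makeJsonParser helper are made up for illustration.

```javascript
// Hypothetical sketch: one parser factory shared by several radios
// whose now-playing endpoints expose the same JSON shape.
// Names and URLs below are illustrative, not real radios.
function makeJsonParser(url) {
  // Returns a scraper bound to one radio's now-playing endpoint.
  return async function fetchMeta() {
    // A real parser would fetch(url) here and map the JSON fields
    // to { artist, title, cover }; stubbed out in this sketch.
    return { url, artist: null, title: null };
  };
}

// Registering each additional radio then only costs one line:
const scrapers = {
  "Jazz Radio A": makeJsonParser("https://radio-a.example/nowplaying.json"),
  "Jazz Radio B": makeJsonParser("https://radio-b.example/nowplaying.json"),
};
```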

stsier commented 5 years ago
  1. There's a "parsers" directory in your repo with a script per radio. If I need a radio that is not in your list, I will have to write my own parser, right?
  2. Thanks for the explanation.

dest4 commented 5 years ago

If you find that the radios you want to add are compatible with an existing parser, just require it by adding a line in .../index.js. If the radios are not compatible, copy-paste a current parser and adapt it. There are already parsers for JSON, XML, and HTTP scraping… Don't reinvent the wheel. Then add the corresponding line in .../index.js.

stsier commented 5 years ago

You have a list of radios which must be popular, but unfortunately there's none that I need. Take https://cladriteradio.com/ for example: would it take a long time to adapt it with your existing parsers? Is there some similar metadata pattern?

dest4 commented 5 years ago

A quick analysis shows that the artist and title are available at this URL:

$ curl http://bluford.torontocast.com:8454/7.html
<html><body>23,1,199,500,21,128,I Used To Be Above Love - Artie Shaw and His Orchestra; Wes Vaughan, vocal</body></html>

A parser compatible with this would be https://github.com/adblockradio/webradio-metadata/blob/master/parsers/France/France%20Info.js, but your version would be simpler, as the result is very short.
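A parser for this response could be sketched as follows. Two assumptions here: the 7.html body is six comma-separated numeric counters followed by the stream title, and splitting the title from the artist on " - " may not hold for every track.

```javascript
// Hedged sketch of parsing the Shoutcast 7.html status page shown above.
// Assumes: six numeric counters, then the stream title; "Title - Artist"
// order as observed in the sample, which may vary per station.
function parse7html(body) {
  // Strip the surrounding <html><body> tags.
  const text = body.replace(/<[^>]+>/g, "");
  // Skip the first six comma-separated counter fields; the rest
  // (rejoined, since the title itself may contain commas) is the title.
  const streamTitle = text.split(",").slice(6).join(",").trim();
  const sep = streamTitle.indexOf(" - ");
  if (sep < 0) return { title: streamTitle, artist: "" };
  return {
    title: streamTitle.slice(0, sep).trim(),
    artist: streamTitle.slice(sep + 3).trim(),
  };
}

const sample =
  "<html><body>23,1,199,500,21,128,I Used To Be Above Love - Artie Shaw and His Orchestra; Wes Vaughan, vocal</body></html>";
console.log(parse7html(sample));
// → { title: 'I Used To Be Above Love',
//     artist: 'Artie Shaw and His Orchestra; Wes Vaughan, vocal' }
```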

Then the website makes a call to ws.audioscrobbler.com (Last.FM) to get the track image.

stsier commented 5 years ago

OK, thanks a lot, I will try your scripts, because scraping a third-party website in the middle is not very clean.