emkor / audiopyle

Audio feature extraction engine based on VAMP plugins
GNU General Public License v3.0

Scraper: create specification, research potential sources #59

Closed emkor closed 6 years ago

emkor commented 8 years ago

Scraper is the next app for the audiopyle system. The basic idea is to download song information (album, artist, etc., maybe tags?) from internet sources such as MusicBrainz, Last.fm and Spotify, and store it in a specified format. It would be useful for ML and for feature analysis.
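
As a rough illustration of the idea above, a lookup against one of the named sources could start from a search URL plus a small record type for the stored result. This is only a sketch: `TrackInfo`, `musicbrainz_search_url`, `to_json` and the chosen fields are hypothetical names, not part of audiopyle. The endpoint is MusicBrainz's public recording search, and no request is actually sent here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional
from urllib.parse import urlencode

MUSICBRAINZ_RECORDING_SEARCH = "https://musicbrainz.org/ws/2/recording"


@dataclass
class TrackInfo:
    """Hypothetical storage format for scraped song information."""
    artist: str
    title: str
    album: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    source: Optional[str] = None  # e.g. "musicbrainz"


def musicbrainz_search_url(artist: str, title: str, limit: int = 5) -> str:
    """Build a recording search URL; fetching it is left to the caller.

    Assumption: artist and title contain no double quotes or other
    characters that would need escaping in the search query syntax.
    """
    query = 'artist:"{}" AND recording:"{}"'.format(artist, title)
    params = {"query": query, "fmt": "json", "limit": limit}
    return MUSICBRAINZ_RECORDING_SEARCH + "?" + urlencode(params)


def to_json(track: TrackInfo) -> str:
    """Serialize a record to JSON, one possible 'specified format'."""
    return json.dumps(asdict(track), sort_keys=True)
```

Keeping URL construction separate from the HTTP call makes it easy to add rate limiting or swap in another source later.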

Ziemnior commented 8 years ago

It would be a lot easier if the songs had metadata; without it there will be a lot more work to obtain the information we need. Some of the websites provide an official API (Discogs, Last.fm, Spotify, MusicBrainz), and there is also an (unofficial?) Wikipedia API that may come in handy. In the worst case, I think it won't be a problem to scrape them manually. So far I have only used Beautiful Soup for this kind of task, but I'll research later whether it's the most efficient tool. When it comes to genres, please remember that these often vary, so it would be a good idea to exclude tags that are joke phrases or slogans rather than genres. Maybe create a list of accepted genres? (Storing it in an external file seems a little odd to me.)
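
The accepted-genres idea could look roughly like the sketch below. It assumes tags arrive as free-form strings. The whitelist contents and the names `ACCEPTED_GENRES`, `normalize_tag` and `filter_genres` are made up for illustration.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical whitelist kept in code; a real list would be much longer.
ACCEPTED_GENRES = frozenset({"rock", "jazz", "hip hop", "electronic", "classical"})


def normalize_tag(tag: str) -> str:
    """Lower-case, turn hyphens/underscores into spaces, collapse whitespace."""
    return " ".join(tag.lower().replace("-", " ").replace("_", " ").split())


def filter_genres(tags: Iterable[str],
                  accepted: FrozenSet[str] = ACCEPTED_GENRES) -> List[str]:
    """Keep only whitelisted tags: normalized, de-duplicated, in input order."""
    seen = set()
    kept = []
    for tag in tags:
        norm = normalize_tag(tag)
        if norm in accepted and norm not in seen:
            seen.add(norm)
            kept.append(norm)
    return kept
```

For example, `filter_genres(["Rock", "hip-hop", "seen live", "ROCK ", "Jazz"])` keeps `rock`, `hip hop` and `jazz` and drops the rest.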

emkor commented 8 years ago

I think we should take care of filtering genres and so on later, once we have them in the db. And yeah, audiopyle will extract features from mp3 files, so I hope most of those tracks will contain the artist and track title in their metadata; otherwise scraping data would be impossible. This issue is more of a research task: find out which public APIs exist, et cetera.
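
To make the metadata dependency concrete, here is a minimal standard-library sketch that reads artist and title from an ID3v1 tag, the fixed 128-byte block at the end of some mp3 files. It is an illustration under assumptions, not audiopyle code: `parse_id3v1` and `read_id3v1` are hypothetical names, and many files carry only ID3v2 tags, which this does not handle (a tagging library such as mutagen would be the more realistic choice).

```python
import os
from typing import Dict, Optional


def parse_id3v1(data: bytes) -> Optional[Dict[str, str]]:
    """Return fields from a trailing ID3v1 tag, or None if there is no tag."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are fixed-width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
    }


def read_id3v1(path: str) -> Optional[Dict[str, str]]:
    """Read only the last 128 bytes of a file and parse them."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < 128:
            return None
        f.seek(-128, os.SEEK_END)
        return parse_id3v1(f.read(128))
```

A `None` result marks a track with no ID3v1 tag, which a scraper would have to skip or identify some other way.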

emkor commented 6 years ago

Abandoned