MTG / metadb

A simple database containing metadata linked to musicbrainz ids

RateYourMusic GenreTree #6

Closed chaicoffee08 closed 8 years ago

chaicoffee08 commented 8 years ago

Hi,

please find the updated repo with the genre list in metadb/scrapers/rym-genres.yaml. i have run the code and made a pdf documentation which i will upload to aulaglobal. hope this satisfies my requirements to clear this assignment. thanks a lot for all the help!

Best, Chaithanya

alastair commented 8 years ago

Can you explain what part of the scraper you worked on? I only see here the yaml output of an existing script in the repository. The requirement for this project is to write a new scraper which can search for a recording or an album on RYM and return the genre of it - I think we talked about this when you came to see me, and I showed you the last.fm example, right? Let me know if this isn't clear and we can go over the requirements again.

chaicoffee08 commented 8 years ago

well, i checked the slides and it said rate your music (genre) so i left it at that, did a study of your code, ran it and made the report!

html parsing is not my forte, and i learnt a lot from studying rymgenre.py, but i am really up to my neck right now and i (wrongly) thought that this would be sufficient to clear the assignment.

i'm not really clear on how to search for anything on rateyourmusic.com apart from the genre. the genre list at http://rateyourmusic.com/rgenre/ is already there and parseable, but i couldn't find a clear index of tracks and there is no API. it's all html, which makes things difficult :(

alastair commented 8 years ago

Right, the description in the slides wasn't explicit about the actual task, sorry about that. However, we talked about it in class, and I'm sure that we also discussed it when you came to see me.

The final goal is to be able to take any artist name and track name and get the genre for this track. The reason we want to get this is to be able to build a dataset containing the genre for all of the recordings that we have in acousticbrainz.

To do scraping on rateyourmusic you would have to perform a search, and then look at the results of the search and load the album data.

Perhaps if you're not comfortable with web scraping you could look at doing the same task for spotify. There is a library which you could use here: https://github.com/plamere/spotipy

See the example of the lastfm file here: https://github.com/MTG/metadb/blob/master/metadb/scrapers/lastfm.py

You will need to write a file with a scrape method which takes the artist and recording name and searches for it on Spotify.

There is a description of how you can test your scraper here: https://github.com/MTG/metadb/issues/1#issuecomment-191348170
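
As a rough illustration of the kind of scrape method described above, here is a minimal sketch. It assumes that `client` is a `spotipy.Spotify` instance (current versions of the Spotify Web API need credentials to create one), and the function signature and returned keys are only guesses; the real interface should follow whatever lastfm.py does. Spotify track objects carry no genre field, so the sketch falls back to the genres of the first credited artist.

```python
def scrape(client, artist_name, recording_name):
    """Return Spotify genre data for a recording, or None if nothing matches.

    `client` is assumed to behave like spotipy.Spotify: it needs a
    search(q=..., type=..., limit=...) method and an artist(artist_id) method.
    """
    # Spotify's search syntax supports artist: and track: field filters.
    query = 'artist:"{}" track:"{}"'.format(artist_name, recording_name)
    results = client.search(q=query, type="track", limit=1)
    items = results.get("tracks", {}).get("items", [])
    if not items:
        return None

    track = items[0]
    # Tracks have no genres of their own, so look up the first credited artist.
    artist = client.artist(track["artists"][0]["id"])
    return {
        "spotify_track_id": track["id"],
        "spotify_artist_id": artist["id"],
        "spotify_artist_name": artist["name"],
        "spotify_artist_genres": artist.get("genres", []),
    }
```

Passing the client in as an argument keeps the function easy to test with a stub object, without making network requests.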

chaicoffee08 commented 8 years ago

hey alastair,

sorry for taking so long, i've added another scraper called spotgen.py that works as expected, it takes an artist name and returns the genre for that artist, basically. i don't think it's possible to query genre by track in the spotipy library or maybe i missed something. hope this is sufficient!

thanks again for all the help.

best, chai

alastair commented 8 years ago

This is a really good start! It's exactly what we want for this task. One additional piece of information that you could include in the returned data is information about the Spotify IDs. This would be useful for us in the future if we wanted to get more information from spotify. We could look it up directly with their ID instead of doing another search. So you could return a data block that looks like this:

  {"spotify_artist_name": name,
   "spotify_artist_id": id,
   "spotify_artist_genres": genres,
   "releases": [{"id": id, "title": title, "genres": genres}, ...]
  }
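
One possible way to assemble that block is sketched below. The helper name `build_data_block` is made up for illustration, and it assumes the inputs are dictionaries shaped like Spotify artist and album objects (with `id`, `name` and an optional `genres` list).

```python
def build_data_block(artist, albums):
    """Combine a Spotify artist object and its album objects into one dict."""
    return {
        "spotify_artist_name": artist["name"],
        "spotify_artist_id": artist["id"],
        "spotify_artist_genres": artist.get("genres", []),
        "releases": [
            {
                "id": album["id"],
                # Spotify calls the album title "name".
                "title": album["name"],
                # Genres may be missing or empty on album objects.
                "genres": album.get("genres", []),
            }
            for album in albums
        ],
    }
```

With spotipy, the artist object would come from `sp.artist(artist_id)` and the albums from `sp.artist_albums(artist_id)["items"]`. As far as I know the album objects in that list are simplified and have no genres, so you would need `sp.album(album_id)` for each one to get the full object, and even then the genre list is often empty.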

alastair commented 8 years ago

@chaicoffee08 Will you continue with these changes? It would be good to see you also make the recommended changes to the scraper.

chaicoffee08 commented 8 years ago

hey alastair! i am so sorry this completely slipped my attention. i would like to continue working on this but right now i am really busy with my thesis and the subject i am working on (algorithmic composition) is not related to scraping.. i would appreciate it if you could consider the previous scraper (spotgen.py) as my submission for the assignment and grade me accordingly! when i have more time i would like to continue working on this but right now it would be difficult. hope you understand.