MTG / metadb

A simple database containing metadata linked to musicbrainz ids

AMP Lab - Task: Data Collection #2

Closed hellska closed 8 years ago

hellska commented 8 years ago

For the AMP Lab I would like to work with the Discogs API. The title of the lab is "Data Collection", so I interpret the task in two or three different ways: 1) collect all the genre and style information available on the website we select. I took a look at the Discogs API documentation and it does not seem difficult to get the data from the website. 2) retrieve the data, parse the JSON or XML, and format the information so it can be reused in the AcousticBrainz project. 3) build a Python client that uses the API and can be adapted to the scope of future projects.

This is how I interpreted the task; I will wait for your suggestions on how to start working on it. Sincerely, Dan

alastair commented 8 years ago

Hi, This is a good proposal. You're right, the Discogs API should let you search tracks to get this information. For AcousticBrainz a useful format is JSON. See the comment I left on issue 1: https://github.com/MTG/metadb/issues/1#issuecomment-191348170 There is some information there on how to structure the client code so that it can be used within the metadb framework.

hellska commented 8 years ago

Hi Alastair, I just published a fairly stable version of a Python class that extracts information from the Discogs website. This is the link to the git repo: https://github.com/hellska/DiscogScraper I used curl to retrieve the data in JSON format from the website. Many improvements can still be made, and I will continue working on the class. Basic functionality: match an artist/song in Discogs releases and count how many times a specific genre and style appear in the data. The basic weight can be computed by dividing the number of times a genre occurs by the total number of times the song is found.
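To make the weight computation concrete, here is a minimal sketch. It is not the code from the DiscogScraper repo: the function name `genre_weights` and the sample data are hypothetical, and it assumes each matched release is a dict with `genre` and `style` lists.

```python
from collections import Counter


def genre_weights(releases, field="genre"):
    """Return {label: share of matched releases that carry that label}.

    `releases` is assumed to be a list of dicts, one per release in which
    the song was found, each with a list of labels under `field`.
    """
    total = len(releases)
    if total == 0:
        # The song was not found anywhere, so there is nothing to weight.
        return {}
    counts = Counter()
    for release in releases:
        # set() so a label repeated inside one release is counted once.
        counts.update(set(release.get(field, [])))
    # weight = times the label occurs / times the song is found
    return {label: count / total for label, count in counts.items()}


# Hypothetical matches for one artist/song, for illustration only.
matches = [
    {"genre": ["Rock"], "style": ["Indie Rock"]},
    {"genre": ["Rock", "Pop"], "style": ["Indie Rock", "Britpop"]},
    {"genre": ["Electronic"], "style": []},
    {"genre": ["Rock"], "style": ["Alternative Rock"]},
]

if __name__ == "__main__":
    print(genre_weights(matches))
    print(genre_weights(matches, field="style"))
```

With this sample data "Rock" appears in 3 of the 4 matched releases, so its weight is 0.75. The same function can be used for styles by passing `field="style"`.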

alastair commented 8 years ago

@hellska Can you merge your scraper into the metadb project? You can copy in your scraper code, perhaps to a lib or utils package, and then write a scrape method, like I describe here: https://github.com/MTG/metadb/issues/1#issuecomment-191348170

hellska commented 8 years ago

Hi Alastair, I sent a pull request to metadb. I created a new script to keep the class and its use separate, and I tested it with lookup.py. Check my pull request and tell me if everything is OK!

alastair commented 8 years ago

merged!