beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

Get genre from Google Music #1569

Closed oblomovx closed 2 years ago

oblomovx commented 9 years ago

I like how the Google Play Music store has set up its genres: a few broad main genres. Is it possible to scrape the genre from there?

Kraymer commented 9 years ago

Google Play is not a supported source for the lastgenre plugin. However, you can probably achieve a similar result by using canonicalization, with the Google Play genres listed as the top-level "seeds" in the YAML tree file.
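
To illustrate the canonicalization idea, here is a minimal PHP sketch: a genre tree whose top-level keys are the broad "seed" genres, plus a function that maps any tag found in the tree to its top-level ancestor. The tree contents and the function names (`buildParentMap`, `canonicalGenre`) are hypothetical, and this is a simplified model of the concept, not lastgenre's actual implementation.

```php
<?php

// Hypothetical genre tree: top-level keys are the broad "seed" genres.
// A string key is a node with children; an integer key marks a leaf genre.
$tree = [
    'Pop' => ['Teen Pop', 'Dance Pop'],
    'Rock' => [
        'Alternative Rock' => ['Grunge', 'Britpop'],
        'Hard Rock',
    ],
];

// Flatten the tree into a child => parent map; top-level genres map to null.
function buildParentMap(array $tree, ?string $parent = null, array $map = []): array
{
    foreach ($tree as $key => $value) {
        if (is_string($key)) {
            // Node with children: record it, then recurse into its subtree.
            $map[$key] = $parent;
            $map = buildParentMap($value, $key, $map);
        } else {
            // Leaf genre.
            $map[$value] = $parent;
        }
    }
    return $map;
}

// Walk up the tree until a top-level (seed) genre is reached.
// Returns null for tags that do not appear in the tree at all.
function canonicalGenre(string $tag, array $parents): ?string
{
    if (!array_key_exists($tag, $parents)) {
        return null;
    }
    while ($parents[$tag] !== null) {
        $tag = $parents[$tag];
    }
    return $tag;
}

$parents = buildParentMap($tree);
echo canonicalGenre('Grunge', $parents), "\n";   // Rock
echo canonicalGenre('Teen Pop', $parents), "\n"; // Pop
var_dump(canonicalGenre('Polka', $parents));     // NULL
```

With only the broad genres at the top level, every fine-grained tag that appears somewhere in the tree collapses to one of a handful of seeds, which is the effect the Google Play genre list would give.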

oblomovx commented 9 years ago

Yes, I did fiddle with the canonicalization. It works all right, but the genre tagging is not very accurate:

lastgenre: genre for album New Kids on the Block - Face the Music (album): Pop
lastgenre: genre for album New Kids on the Block - Hangin' Tough (album): Pop
lastgenre: genre for album New Kids on the Block - Step by Step (album): Alternative Rock
lastgenre: genre for album New Kids on the Block - Tour Souvenir Collection (artist): Pop

But I guess that is down to the Last.fm users.

sampsyo commented 9 years ago

Interesting idea. Do you know anything about how we might go about that (an API, etc.)?

Also, FWIW, you might get more consistent results out of lastgenre by using artist mode instead of album mode.

oblomovx commented 9 years ago

There is no official API, only an unofficial one at https://github.com/simon-weber/gmusicapi (it seems quite well maintained).

I don't know whether it is feasible to scrape Google Music itself.

oblomovx commented 9 years ago

I have no experience with Python, only PHP. I created a very simple form that takes an artist and an album as input and returns the genres listed for that album. It would be great if someone could turn it into a beets plugin!

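```php
<?php

// Scrape the genres Google Play Music lists for an album.
// This depends entirely on Google Play's HTML layout at the time of writing.
$baseUrl = "https://play.google.com";
$albumSearch = "/store/search?c=music&docType=2&hl=en&q=";

// Simple input form: one field for the artist, one for the album.
echo "<form method='POST' action='./'>";
echo "<p><input type='text' name='artist'/></p>";
echo "<p><input type='text' name='album'/></p>";
echo "<p><input type='submit' /></p>";
echo "</form>";

if (!isset($_POST['artist'], $_POST['album'])) {
    exit;
}

// Search the store for "<artist> <album>".
$url = $baseUrl . $albumSearch . urlencode($_POST['artist']) . "+" . urlencode($_POST['album']);
$result = file_get_contents($url);
if ($result === false) {
    exit("<p>Could not fetch the search results.</p>");
}

// Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($result);

// Take the first album card in the search results.
$xpath = new DOMXPath($dom);
$albums = $xpath->query('//div[@class="search-page"]//a[@class="card-click-target"]');
if ($albums === false || $albums->length === 0) {
    exit("<p>No album found.</p>");
}
$albumUrl = $baseUrl . $albums->item(0)->getAttribute('href') . "&hl=en";

// Fetch the album page and read the genre links in the first meta-info block.
$result = file_get_contents($albumUrl);
if ($result === false) {
    exit("<p>Could not fetch the album page.</p>");
}
$dom->loadHTML($result);
$xpath = new DOMXPath($dom);
$genres = $xpath->query('//div[@class="meta-info"][1]//a');

// Escape user input and scraped text before echoing them into the page.
echo "<p>" . htmlspecialchars($_POST['artist']) . " - " . htmlspecialchars($_POST['album']) . "</p>";
foreach ($genres as $genre) {
    echo htmlspecialchars($genre->nodeValue) . "<br/>";
}
```
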
LordSputnik commented 9 years ago

The problem with this, and any form of HTML scraping, is that if Google changes the page layout slightly or the way they represent genre information, the code will break.

That is why an API is helpful: its developers usually guarantee it to be stable, or at least warn people in advance if it is likely to change.
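
Scraping can't be made stable, but breakage can at least be made obvious. The PHP sketch below separates parsing from fetching and treats "no matching nodes" as an explicit layout-change error instead of a fatal null dereference. It assumes the same search-page XPath as the script in this thread; the function name `extractFirstAlbumHref` and the sample markup are hypothetical.

```php
<?php

// Hypothetical helper: parse a search-results page and return the first
// album link, or throw if the expected markup is missing (for example,
// because the page layout changed).
function extractFirstAlbumHref(string $html): string
{
    // Real-world HTML is rarely well-formed; collect libxml errors quietly.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = $xpath->query('//div[@class="search-page"]//a[@class="card-click-target"]');

    if ($links === false || $links->length === 0) {
        throw new RuntimeException('No album card found; the page layout may have changed.');
    }
    return $links->item(0)->getAttribute('href');
}

// Sample markup shaped like the layout the scraper expects.
$expected = '<div class="search-page"><a class="card-click-target" href="/album?id=example">x</a></div>';
echo extractFirstAlbumHref($expected), "\n"; // /album?id=example

// Sample markup after a hypothetical redesign: the breakage is reported explicitly.
$redesigned = '<div class="results"><a class="tile" href="/album?id=example">x</a></div>';
try {
    extractFirstAlbumHref($redesigned);
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

Keeping the parser a pure function of the HTML also means it can be checked against a saved copy of the page, so a layout change shows up as a failing check rather than as silently missing genres.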

jake-g commented 9 years ago

The unofficial gmusicapi is great; I use it to sync and create playlists. I'm not sure whether you can pull the genre from any query as a free user. The question is where Google gets its genres, and whether we can scrape that. I've been using Last.fm genres, but they offer a lot of possibilities, sometimes too many (I know there are whitelists and such). I was thinking of moving the Last.fm tags to the comments field or some other tag and using a more conservative subset for the genre field, taken from Google, Wikipedia, or maybe Discogs.
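
The split described above (keep the full Last.fm tag list in another field, but write only a conservative value to genre) could look roughly like this PHP sketch. The whitelist, the sample tags, and the function name `splitTags` are hypothetical, and the input is assumed to be a tag list already ordered by Last.fm weight.

```php
<?php

// Hypothetical conservative whitelist of broad genres (compared case-insensitively).
const GENRE_WHITELIST = ['pop', 'rock', 'hip hop', 'electronic', 'jazz', 'classical'];

// Split a weight-ordered tag list into a single genre and a comment string.
// The genre is the first whitelisted tag (or null); all tags go to the comment.
function splitTags(array $tags): array
{
    $genre = null;
    foreach ($tags as $tag) {
        if (in_array(strtolower($tag), GENRE_WHITELIST, true)) {
            $genre = $tag;
            break;
        }
    }
    return ['genre' => $genre, 'comments' => implode(', ', $tags)];
}

$result = splitTags(['boy band', 'Pop', '80s', 'new jack swing']);
echo $result['genre'], "\n";    // Pop
echo $result['comments'], "\n"; // boy band, Pop, 80s, new jack swing
```

Taking only the first whitelisted tag keeps the genre field coarse, while nothing from Last.fm is thrown away.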

wisp3rwind commented 2 years ago

Closing, since there's no more Google Play Music: https://github.com/beetbox/beets/issues/4089