ewenme / geniusr

work with data & lyrics from Genius
https://ewenme.github.io/geniusr/

get_lyrics_id issue #17

Closed stgblack closed 2 years ago

stgblack commented 3 years ago

Whenever I request the lyrics for a song, I only get a blank tibble instead of a tibble with the lyrics. This is also a problem with get_lyrics_search. Other functions like search_song and get_song work just fine though. The output is shown below. Thank you so much for your help.

get_lyrics_id(song_id=50158)
# A tibble: 0 x 6
# ... with 6 variables: line, section_name, section_artist, song_name, artist_name, song_id

MalcolmMashig commented 2 years ago

same issue

morosophist commented 2 years ago

I created this patch to fix this issue on my local box.

4c4
<   lyrics <- html_nodes(session, ".lyrics p")
---
> #  lyrics <- html_nodes(session, ".lyrics p")
5a6,8
> # edit 11/21/2021
>   lyrics <-  session %>% html_nodes(xpath = '//div[contains(@class, "Lyrics__Container")]')#
>
7,8c10,11
<   song <- html_nodes(session, ".header_with_cover_art-primary_info-title") %>%
<     html_text()
---
> #  song <- html_nodes(session, ".header_with_cover_art-primary_info-title") %>%
> #    html_text()
10,11c13,22
<   artist <- html_nodes(session, ".header_with_cover_art-primary_info-primary_artist") %>%
<     html_text()
---
> # edit 11/21/2021
>   song <-  session %>% html_nodes(xpath = '//span[contains(@class, "SongHeaderVariantdesktop__")]') %>%
>     html_text(trim = TRUE)
>
> #  artist <- html_nodes(session, ".header_with_cover_art-primary_info-primary_artist") %>%
> #    html_text()
>
> # edit 11/21/2021
>     artist <-  session %>% html_nodes(xpath = '//a[contains(@class, "SongHeaderVariantdesktop__Artist")]') %>%
>       html_text(trim = TRUE)

redapemusic35 commented 2 years ago

I am not much of a coder. How do you apply this patch?

MalcolmMashig commented 2 years ago

# Packages the patched function relies on when run from the console
library(rvest)   # html_nodes(), html_text(), %>%
library(xml2)    # xml_find_all(), xml_add_sibling(), xml_remove()
library(tibble)  # tibble()
library(purrr)   # is_empty()

# Replacement for geniusr's internal get_lyrics(), using the updated selectors
get_lyrics <- function (session) {
  lyrics <-  session %>% html_nodes(xpath = '//div[contains(@class, "Lyrics__Container")]')
  song <-  session %>% html_nodes(xpath = '//span[contains(@class, "SongHeaderVariantdesktop__")]') %>% html_text(trim = TRUE)
  artist <-  session %>% html_nodes(xpath = '//a[contains(@class, "SongHeaderVariantdesktop__Artist")]') %>% html_text(trim = TRUE)
  # Turn each <br> into a newline so line breaks survive html_text()
  xml_find_all(lyrics, ".//br") %>% xml_add_sibling("p", "\n")
  xml_find_all(lyrics, ".//br") %>% xml_remove()
  lyrics <- html_text(lyrics, trim = TRUE)
  lyrics <- unlist(strsplit(lyrics, split = "\n"))
  # Keep only lines that contain at least one letter or digit
  lyrics <- grep(pattern = "[[:alnum:]]", lyrics, value = TRUE)
  if (is_empty(lyrics)) {
    return(tibble(line = NA, section_name = NA, section_artist = NA, 
                  song_name = song, artist_name = artist))
  }
  # Lines consisting solely of a [Section: Artist] tag
  section_tags <- nchar(gsub(pattern = "\\[.*\\]", "", lyrics)) == 0
  sections <- geniusr:::repeat_before(lyrics, section_tags)
  sections <- gsub("\\[|\\]", "", sections)
  sections <- strsplit(sections, split = ": ", fixed = TRUE)
  section_name <- sapply(sections, "[", 1)
  section_artist <- sapply(sections, "[", 2)
  # Sections without a named artist default to the song's artist
  section_artist[is.na(section_artist)] <- artist
  tibble(line = lyrics[!section_tags], section_name = section_name[!section_tags], 
         section_artist = section_artist[!section_tags], song_name = song, 
         artist_name = artist)
}
# Overwrite the internal function inside the geniusr namespace
assignInNamespace("get_lyrics", get_lyrics, "geniusr")

mattroumaya commented 2 years ago

@MalcolmMashig - thanks so much for the fix above! You should submit as a PR, I was also having the same issue.

elinevisser23 commented 2 years ago

Implemented the fix following @MalcolmMashig. As far as I understand, this creates a new function get_lyrics, and I don't understand how to use it. I tried these, but they give errors about unused arguments.

a <- get_lyrics(song_lyrics_url = "https://genius.com/Adje-base-lyrics")
b <- get_lyrics(song_id = 3039923)
c <- get_lyrics(artist_name = "Anderson .Paak", song_title = "Come Home")

The old functions get_lyrics_id, get_lyrics_url and get_lyrics_search still return empty dataframes.

aa <- get_lyrics_id(song_id = 3039923)
ab <- get_lyrics_url(song_lyrics_url = "https://genius.com/Adje-base-lyrics")
ac <- get_lyrics_search(artist_name = "Anderson .Paak", song_title = "Come Home")

@mattroumaya how did you do it?

quilvioo commented 2 years ago

Hey @elinevisser23! I'm not the one who wrote the patch, but I was able to implement it. It's actually overwriting one of the package's internal (non-exported) functions (check out ?assignInNamespace for more info). If you look at lyrics.r, you'll see the function it's replacing right there on line 1. You could probably just run @MalcolmMashig's code as is in the console. You might have to load a couple of additional packages/dependencies (e.g. rvest and xml2). After running it in the console, you should be able to use the get_lyrics_* functions as normal with their typical arguments.

Alternatively, depending on your use case, you could paste it into a separate file, say lyric_patch.R, and load it into whatever you're using it for with source("lyric_patch.R").

Hope this helps!

mulderc commented 2 years ago

The patch doesn't appear to work anymore; I'm getting this error:

get_lyrics_id(song_id = 904479)

Error in section_artist[is.na(section_artist)] <- artist : 
  replacement has length zero
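R raises this error when the right-hand side of an indexed assignment is empty. A likely cause here (an assumption, though it fits the selector changes discussed in this thread) is that the artist XPath matched no nodes, so `artist` ends up as `character(0)`. Below is a minimal reproduction with made-up values, plus one possible guard that is not part of the posted patch:

```r
# Made-up values standing in for what the patched get_lyrics() computes.
section_artist <- c("Some Artist", NA)
artist <- character(0)  # assumed result when the artist selector matches nothing

# Reproduce the failure and capture its message instead of stopping.
msg <- tryCatch(
  {
    section_artist[is.na(section_artist)] <- artist
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# One possible guard (an assumption, not part of the posted patch):
# fall back to NA when the selector finds nothing.
if (length(artist) == 0) artist <- NA_character_
section_artist[is.na(section_artist)] <- artist
```

With the guard in place the function would return rows with a missing artist rather than failing, which at least makes a stale selector visible in the output.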
morosophist commented 2 years ago

@mulderc

Change these two lines in the patch above and see what you get. Haven't had time to thoroughly check, but I was able to pull lyrics.

song <- session %>% html_nodes(xpath = '//span[contains(@class, "SongHeaderdesktop__")]') %>% html_text(trim = TRUE)
artist <- session %>% html_nodes(xpath = '//a[contains(@class, "SongHeaderdesktop__Artist")]') %>% html_text(trim = TRUE)

That is, change SongHeaderVariantdesktop__ to SongHeaderdesktop__.

ewenme commented 2 years ago

closed by #18. thanks everyone! :)

naddafli commented 1 year ago

Hey guys,

I tried the patch by @MalcolmMashig and the line changes suggested by @morosophist, but it still doesn't work. Does anyone know what's up with that? :(

thanks in advance!

maria-ascolese commented 7 months ago

This is an old issue but I'm going to post the fix for everyone still struggling with it! I was facing the same problem as @naddafli and basically you also need to replace this:

artist <- session %>% html_nodes(xpath = '//a[contains(@class, "SongHeaderdesktop__Artist")]') %>% html_text(trim = TRUE)

with this:

artist <- session %>% html_nodes(xpath = '//a[contains(@class, "HeaderArtistAndTracklistdesktop__Artist")]') %>% html_text(trim = TRUE)
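Since the class names have changed several times in this thread, one option is to try each known variant in turn instead of hard-coding a single one. The sketch below is only an illustration: `first_text_match` is a hypothetical helper, the class fragments are the ones reported above, and the HTML is a made-up snippet, not a real Genius page.

```r
library(rvest)

# Hypothetical helper: try each class-name fragment in turn and return the
# text of the first matching node, or NA if none of the fragments match.
first_text_match <- function(page, tag, class_fragments) {
  for (fragment in class_fragments) {
    xpath <- sprintf('//%s[contains(@class, "%s")]', tag, fragment)
    nodes <- html_nodes(page, xpath = xpath)
    if (length(nodes) > 0) {
      return(html_text(nodes, trim = TRUE)[1])
    }
  }
  NA_character_
}

# Class-name fragments reported in this thread, newest first.
artist_classes <- c(
  "HeaderArtistAndTracklistdesktop__Artist",
  "SongHeaderdesktop__Artist",
  "SongHeaderVariantdesktop__Artist"
)

# Offline demo on a made-up HTML snippet (not a real Genius page).
demo_page <- read_html(
  '<html><body><a class="SongHeaderdesktop__Artist-sc-1 abc">Demo Artist</a></body></html>'
)
artist <- first_text_match(demo_page, "a", artist_classes)
```

Inside the patched function, the `artist` line could call such a helper on `session` instead. Returning NA when nothing matches also avoids the zero-length replacement error reported earlier.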

siskiyoucedar commented 2 months ago

Afraid the get_lyrics commands are broken again w.r.t. the artist data. Does anyone have a fix? And a way to find out when it will need fixing again? The Genius API pages don't exactly make their changes plain...
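One rough way to notice breakage early is to count how many nodes each selector matches on a page you know should have lyrics. The sketch below is an assumption-laden illustration: `count_matches` is a hypothetical helper, the selectors are the ones quoted in this thread, and the demo page is a made-up snippet so the example runs offline.

```r
library(rvest)

# Hypothetical check: count how many nodes each XPath matches on a page.
# A count of zero suggests that selector has gone stale.
count_matches <- function(page, xpaths) {
  vapply(xpaths, function(xp) length(html_nodes(page, xpath = xp)), integer(1))
}

# Selectors quoted in this thread (they may already be out of date).
selectors <- c(
  lyrics = '//div[contains(@class, "Lyrics__Container")]',
  artist = '//a[contains(@class, "HeaderArtistAndTracklistdesktop__Artist")]'
)

# Offline demo on a made-up snippet in which only the lyrics selector matches.
demo_page <- read_html(
  '<html><body><div class="Lyrics__Container-x">la la</div></body></html>'
)
counts <- count_matches(demo_page, selectors)
stale <- names(counts)[counts == 0]
if (length(stale) > 0) {
  message("Selectors matching nothing: ", paste(stale, collapse = ", "))
}
```

In practice `demo_page` would be replaced by `read_html()` on a known song URL and the check run before a batch of requests. Whether Genius serves the same markup to a scraper as to a browser is not something this thread confirms.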