data-edu / data-science-in-education

Repository for the second edition of 'Data Science in Education Using R' by Emily A. Bovee, Ryan A. Estrellado, Joshua M. Rosenberg, and Isabella C. Velásquez
http://www.datascienceineducation.com/

NRC lexicon unavailable #576

Open RobertTalbert opened 2 years ago

RobertTalbert commented 2 years ago

Hi, in Chapter 11, it says to use the NRC lexicon for sentiment analysis. However, calling get_sentiments("nrc") returns an error after I select "1" from the little install menu that comes up:

Error: 'C:/Users/[...]/AppData/Local/textdata/textdata/Cache/nrc/NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-v0.92/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt' does not exist.

Apparently this has been an issue for a couple of years now. There are workarounds but perhaps this needs to be fixed in the book, unless I am missing something simple!

ivelasq commented 2 years ago

Hi @RobertTalbert, thanks for your message! Yes, something may have changed since we published the book in 2020. Other folks have reached out, and we were able to install the file successfully by following this StackOverflow thread. On your computer, run the code below:

library(tidyverse)
library(tidytext)
library(textdata)
library(readr)
library(utils)

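# reproduce the error
get_sentiments("nrc") # select 1: this throws an error, but the data has still been downloaded
# where is the file, then?
folder_path <- textdata::lexicon_nrc(return_path = TRUE) # the textdata cache folder for nrc
folder_path # e.g. "~/Library/Caches/textdata/nrc" on macOS; on Windows it is under AppData/Local

# the problem is that the path textdata expects is wrong: it looks for the file in an
# extra "NRC-Emotion-Lexicon-v0.92" subfolder, so we create that subfolder and copy the file into it
# (dir.create() and file.copy() replace mkdir/cp so that this also runs on Windows)
dir.create(file.path(folder_path, "NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-v0.92"), recursive = TRUE, showWarnings = FALSE)
file.copy(
  file.path(folder_path, "NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt"),
  file.path(folder_path, "NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-v0.92/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt")
)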

# now we have to process the nrc data using a slightly modified version of the subfunction detailed in the original function from the textdata-package: https://github.com/EmilHvitfeldt/textdata/blob/main/R/lexicon_nrc.R
name_path <- file.path(folder_path, "NRCWordEmotion.rds")
# slightly modified version:
process_nrc <- function(folder_path, name_path) {
  data <- read_tsv(
    file.path(
      folder_path,
      "NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-v0.92/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt"
    ),
    col_names = FALSE,
    col_types = cols(
      X1 = col_character(),
      X2 = col_character(),
      X3 = col_double()
    )
  )
  data <- data[data$X3 == 1, ]
  data <- tibble(
    word = data$X1,
    sentiment = data$X2
  )
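  write_rds(data, name_path)
}

# run the function so the processed lexicon is written to the cache,
# then load the lexicon again
process_nrc(folder_path, name_path)
get_sentiments("nrc")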
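
To see what process_nrc() does to the raw file, here is a minimal sketch on a made-up sample. It assumes, based on the column types above, that each line of the lexicon file holds a word, an emotion, and a 0/1 flag separated by tabs. The three sample rows are hypothetical, and the sketch uses base R only so that it runs without downloading anything.

```r
# Hypothetical sample in the assumed layout of the raw file:
# word <TAB> emotion <TAB> 0/1 flag
sample_lines <- c(
  "abandon\tfear\t1",
  "abandon\tjoy\t0",
  "happy\tjoy\t1"
)
sample_path <- tempfile(fileext = ".txt")
writeLines(sample_lines, sample_path)

# Same steps as process_nrc(), written with base R:
# read the file, keep the rows flagged with 1, and rename the columns
raw <- read.delim(
  sample_path,
  header = FALSE,
  stringsAsFactors = FALSE,
  col.names = c("X1", "X2", "X3")
)
kept <- raw[raw$X3 == 1, ]
nrc_sample <- data.frame(
  word = kept$X1,
  sentiment = kept$X2,
  stringsAsFactors = FALSE
)
print(nrc_sample)
```

Only the word/emotion pairs flagged with 1 survive, which is the shape get_sentiments("nrc") is expected to return: one row per word and sentiment.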

Hope this works for you as well! Let us know!
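
If the copy step complains that the file is missing, it may help to check where the download actually ended up. The sketch below is only a rough diagnostic: `find_nrc_file` is a hypothetical helper, not part of textdata, and the demo folder is a throwaway imitation of the cache layout described above.

```r
# Hypothetical helper: search a folder recursively for the word-level lexicon file
find_nrc_file <- function(folder) {
  list.files(
    folder,
    pattern = "NRC-Emotion-Lexicon-Wordlevel-v0\\.92\\.txt$",
    recursive = TRUE,
    full.names = TRUE
  )
}

# Demo on a temporary folder that imitates the cache layout
demo_cache <- file.path(tempdir(), "nrc_demo")
dir.create(
  file.path(demo_cache, "NRC-Emotion-Lexicon"),
  recursive = TRUE,
  showWarnings = FALSE
)
file.create(
  file.path(demo_cache, "NRC-Emotion-Lexicon", "NRC-Emotion-Lexicon-Wordlevel-v0.92.txt")
)
found <- find_nrc_file(demo_cache)
print(found)

# On a real machine (assumption: textdata is installed), pass the cache folder instead:
# find_nrc_file(textdata::lexicon_nrc(return_path = TRUE))
```

The path it prints is the one to copy from, in case it differs from the location used in the code above.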