StatCan / aaw

Documentation for the Advanced Analytics Workspace Platform
https://statcan.github.io/aaw/

Bug: RStudio Server crashes on ggplot() #1433

Closed rohank07 closed 2 years ago

rohank07 commented 2 years ago

A client reported that running ggplot() crashes the R session, and I was able to replicate it. Some versions of RStudio Server appear to have this issue, and the suggested fixes are to either downgrade or upgrade. We are far behind the latest release, so I am updating RStudio Server to 2022.07.2-576 to see if the issue still arises.

Resources:
https://stackoverflow.com/questions/67650911/suddenly-ggplot-crashes-r-studio-any-suggestions
https://github.com/rstudio/rstudio/issues/9373

Verify using client's script

install.packages(c('tidytuesdayR', 'tidyverse'))
library(tidyverse)
library(tidytuesdayR)
tt_data <- tt_load("2022-10-18")
tt_data$episodes
tt_data$stranger_things_all_dialogue
episodes <- tt_data$episodes
duffer_brothers <- episodes %>% filter(directed_by == 'The Duffer Brothers')
db_2 <- duffer_brothers %>% filter(season < 2)
db_2 %>% arrange(desc(episode))
db_2 %>% mutate(season_episode = paste0(season, '_', episode))
db_2 %>% select(season, episode)
db_2 %>% select(-written_by)
dialog <- tt_data$stranger_things_all_dialogue
# Left join episodes onto dialog: the first argument is the object to join, `by` names the key columns
all <- dialog %>%
  left_join(episodes, by = c('season','episode'))
# Summary: a macro-level view of the data (min, max, mean, etc.)
summary(all)
# Practice mutate: add the length of each line in characters.
# mutate(line_length = 1) would set every value to 1; nchar(raw_text) counts the characters instead.
all_test <- all %>% mutate(line_length = nchar(raw_text))
#returns vector of the length of each line
#replace raw_text vector with the length of the line
nchar(all$raw_text)
summary(all_test)
# Trends in length by season and episode:
# average line length per season and episode
sea_epi_11 <-
  all_test %>%
  group_by(season, episode) %>%
  summarize(avg_line_length = mean(line_length), .groups = 'drop')
summary(sea_epi_11)
# Visualize the data; ggplot usually takes an aes() mapping
sea_epi_11 %>%
  ggplot() +
  geom_histogram(aes(avg_line_length))

image

rohank07 commented 2 years ago

Tests are failing when upgrading the RStudio Server image. Removing Blair's fork of jupyter-rsession-proxy and referencing upstream instead. It seems the issues that were previously unresolved for RStudio 1.4 are now fixed there.

Also encountered an issue with this Dockerfile line:
RUN chown $NB_USER:users /var/lib/rstudio-server/rstudio.sqlite
The file does not exist in the new version of RStudio Server, so I'm going to check whether the line is still necessary. Reason it was added:
https://github.com/StatCan/aaw-kubeflow-containers/pull/195
https://github.com/StatCan/aaw-kubeflow-containers/blob/master/docker-bits/6_rstudio.Dockerfile#L34
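If the chown line has to stay while its necessity is being checked, one possible way to keep the build from failing on the missing file is to guard it. This is only a sketch, not what the repository does: the helper name chown_if_exists is hypothetical, and whether skipping the chown is safe for the new RStudio Server version is an assumption that still needs to be verified.

```shell
#!/bin/sh
# Hypothetical helper (not part of aaw-kubeflow-containers):
# change ownership only when the target exists, otherwise report and continue.
chown_if_exists() {
    owner="$1"
    path="$2"
    if [ -e "$path" ]; then
        chown "$owner" "$path"
    else
        # Newer RStudio Server versions may not create this file at all.
        echo "skip: $path not found" >&2
    fi
}
```

In the Dockerfile this could be called as chown_if_exists "$NB_USER:users" /var/lib/rstudio-server/rstudio.sqlite from a RUN step that sources the function. Dropping the line entirely is simpler if the file turns out to be gone for good.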

rohank07 commented 2 years ago

Removing the older version of R, installing r-base and r-base-dev, and adding the new CRAN repo does not seem to upgrade the R version.
image
It should be 4.2.X (2022-XX-XX).
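A version mismatch like this can be caught during the image build instead of by inspecting a running session. The sketch below parses the first line of the `R --version` banner and compares it against an expected major.minor prefix. The function names are hypothetical, and it assumes the banner starts with "R version X.Y.Z".

```shell
#!/bin/sh
# Hypothetical helpers for a build-time check; names are illustrative only.

# Read `R --version` output on stdin and print the "X.Y.Z" from its first line.
r_version_from_banner() {
    sed -n '1s/^R version \([0-9][0-9.]*\).*/\1/p'
}

# Succeed when version $1 starts with the expected major.minor prefix $2,
# e.g. r_version_matches 4.2.1 4.2
r_version_matches() {
    case "$1" in
        "$2".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use inside the image (assumes R is on PATH):
#   actual=$(R --version | r_version_from_banner)
#   r_version_matches "$actual" 4.2 || echo "expected R 4.2.x, got $actual" >&2
```

Failing the build on a mismatch would surface the problem earlier, but since the decision here was to stay on the existing R version, a warning may be the better fit.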

rohank07 commented 2 years ago

Fixed the failing test. Leaving the R version at 4.1.2. RStudio Server does not crash when running the client's code (using the ggplot2 package). Output: image