coopercenter / cte-trailblazers

Edit Cluster Growth dashboard so plots are pulled from saved ggplot objects instead of generated from data #15

Closed athena-small closed 2 years ago

athena-small commented 3 years ago

Depends on #14

athena-small commented 2 years ago

The plots that populate the dashboard are currently generated from data and code. The current code fails to incorporate some needed stylistic elements.

The task: Revise the dashboard code so that plots are instead loaded from saved ggplot objects.

The ggplot objects are saved in .Rds files within the ./ggplots folder.

(So far, only the Education plots, "Figure 1", have been saved, one for each of the seventeen clusters. @arthursmalliii is in the process of writing code to save the other two plots for each cluster.)

athena-small commented 2 years ago

Added a few of the Job Growth ggplots (Figure 2).

(Currently fixing a bug, to allow generating more.)

caneale320 commented 2 years ago

@arthursmalliii is there any chance the plots could be saved in a single Rdata file? The rds format means I have to individually name and assign every variable after individually loading them. Not only is it more work now, but god forbid any of the names or order of input files change -- mayhem.

I can't seem to find where the plots are currently being generated but if you point me in the right direction I can take a look.

athena-small commented 2 years ago

The plots are generated in the file ./r/make-cluster-plots.Rmd. But I'm currently working on it, so please don't touch.

> The rds format means I have to individually name and assign every variable after individually loading them.

That's what functions are for.

> ... god forbid any of the names or order of input files change -- mayhem.

The file names in ./ggplots each describe exactly what the plot is: the meaning does not depend on ordering.
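As a rough sketch of the "function" approach: a small helper can read every .Rds file in a folder into one list whose element names come from the file names, so nothing depends on load order. The helper name load_plots, the demo folder, and the file names below are hypothetical, and plain strings stand in for real ggplot objects.

```r
# Hypothetical helper: read every .Rds file in a folder into one named list.
# Element names are taken from the file names (without the extension).
load_plots <- function(dir) {
  paths <- list.files(dir, pattern = "\\.[Rr]ds$", full.names = TRUE)
  plots <- lapply(paths, readRDS)
  names(plots) <- tools::file_path_sans_ext(basename(paths))
  plots
}

# Demonstration in a fresh temporary folder.
# The strings are placeholders standing in for saved ggplot objects.
demo_dir <- tempfile("ggplots_demo")
dir.create(demo_dir)
saveRDS("placeholder for an education plot", file.path(demo_dir, "edu_Finance.Rds"))
saveRDS("placeholder for a wages plot", file.path(demo_dir, "wages_Finance.Rds"))

plots <- load_plots(demo_dir)
names(plots)
```

Each plot can then be looked up by name, for example plots[["edu_Finance"]], whatever order the files were read in.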

Nonetheless: I'll write a script that merges all the objects in all the .Rds files in this folder into a single .Rdata file. Stand by.

athena-small commented 2 years ago

Small change of plan:

I'll pack all of the ggplots inside a single structured list.

Then I'll save that list in an .Rds file.

You can use readRDS() to retrieve the list; the object inside will be a structured list with named ggplots.
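A minimal sketch of that round trip, using a temporary file and placeholder strings in place of real ggplot objects (the list contents and variable names here are illustrative only):

```r
# Placeholder strings stand in for ggplot objects.
plot_list <- list(
  edu   = list(Finance = "edu plot for Finance", Energy = "edu plot for Energy"),
  wages = list(Finance = "wages plot for Finance")
)

# Save the whole structured list as a single .Rds file.
rds_path <- tempfile(fileext = ".Rds")
saveRDS(plot_list, rds_path)

# readRDS() returns the object, so the caller picks the variable name.
restored <- readRDS(rds_path)
identical(restored, plot_list)
```

Because readRDS() returns the object rather than restoring it under its saved name, the reading code does not need to know what the list was called when it was written.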

athena-small commented 2 years ago

@caneale320 : The plots will be loaded into a list called all_ggplots_list, saved in the file ./ggplots/all_ggplots_list.Rds.

The list includes three sub-lists -- edu, job_growth, and wages -- each with slots for holding 17 plots, one for each cluster. To retrieve the education plot (Figure 1) for the Agriculture cluster, for example, you could write

all_ggplots_list$edu[[which(cluster_names_vec == "Agriculture, Food, and Natural Resources")]]

or simply all_ggplots_list$edu[[1]].
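Since the slots are named, indexing by the cluster name directly should also work. A small sketch with a two-cluster stand-in for the real list (placeholder strings instead of ggplot objects):

```r
# Two-cluster stand-in for the real 17-cluster vector.
cluster_names_vec <- c("Agriculture, Food, and Natural Resources",
                       "Architecture and Construction")

# Placeholder strings stand in for the ggplot objects.
edu <- setNames(list("plot A", "plot B"), cluster_names_vec)
all_ggplots_list <- list(edu = edu)

target <- "Agriculture, Food, and Natural Resources"

by_name  <- all_ggplots_list$edu[[target]]                              # lookup by slot name
by_which <- all_ggplots_list$edu[[which(cluster_names_vec == target)]]  # lookup via the names vector
by_index <- all_ggplots_list$edu[[1]]                                   # lookup by position

identical(by_name, by_which) && identical(by_name, by_index)
```

The name-based form does not rely on cluster_names_vec being available or on the position of a cluster in the list.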

You can (as usual) see the structure of the list using the str() command. For example, when just created, before being loaded with plots:

> str(all_ggplots_list$edu)
List of 17
 $ Agriculture, Food, and Natural Resources         : NULL
 $ Architecture and Construction                    : NULL
 $ Arts, Audio/Video Technology, and Communications : NULL
 $ Business Management and Administration           : NULL
 $ Education and Training                           : NULL
 $ Finance                                          : NULL
 $ Government and Public Adminstration              : NULL
 $ Health Science                                   : NULL
 $ Hospitality and Tourism                          : NULL
 $ Human Services                                   : NULL
 $ Information Technology                           : NULL
 $ Law, Public Safety, Corrections, and Security    : NULL
 $ Manufacturing                                    : NULL
 $ Marketing                                        : NULL
 $ Science, Technology, Engineering, and Mathematics: NULL
 $ Transportation, Distribution, and Logistics      : NULL
 $ Energy                                           : NULL
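While the list is only partly filled, it can be useful to check which slots are still NULL. A minimal sketch, assuming a three-cluster stand-in for the real sub-list and a placeholder string in place of a ggplot object:

```r
# Three-cluster stand-in for one sub-list, created empty (all slots NULL).
cluster_names_vec <- c("Finance", "Health Science", "Energy")
edu <- setNames(vector("list", length(cluster_names_vec)), cluster_names_vec)

# Fill one slot; the string is a placeholder for a ggplot object.
edu[["Finance"]] <- "placeholder plot"

# Flag the slots that have not been filled yet.
is_empty <- vapply(edu, is.null, logical(1))
missing_clusters <- names(edu)[is_empty]
missing_clusters
```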

The code for creating the empty list is below, excerpted from ./r/make-cluster-plots.Rmd.

# Load dplyr, which provides %>% and filter()
library(dplyr)

# Read in prepared data; filter to only statewide data
readRDS(here::here("data_prep","nonduplicated-all-regions-and-pathways.Rds")) %>%
  filter(Region == "Virginia") -> nonduplicated_prepped

# Extract vector of cluster names
cluster_names_vec <- unique(nonduplicated_prepped$cluster, fromLast = TRUE) 
cluster_names_vec <- cluster_names_vec[which(cluster_names_vec != "All Occupations")]

# Create empty list to hold all the ggplots to be created
empty_list <- vector("list", length(cluster_names_vec))
names(empty_list) <- cluster_names_vec
all_ggplots_list <- list(edu        = empty_list,
                         job_growth = empty_list,
                         wages      = empty_list
                         )

athena-small commented 2 years ago

Change of plan: I will save the .Rds file in the folder ./dashboard.

I'll also save there a copy of the vector of cluster names, and a copy of the data file.