dfe-analytical-services / leo-graduate-industry-dashboard

Source code for the Longitudinal Education Outcomes (LEO) graduate industry dashboard.
https://department-for-education.shinyapps.io/leo-graduate-industry-dashboard/
3 stars 2 forks source link

Cf add industry by subject dropdown #49

Closed chfoster closed 2 years ago

chfoster commented 2 years ago

Pull request overview

Add extra dropdown to the industry by subject tables, close #42.

Pull request checklist

Please check if your PR fulfils the following:

What is the new behaviour?

A new dropdown has been added to the industry by subject tables that allows users to filter the tables further - to the 3-digit SIC group level rather than just the SECTIONNAME level. The dropdown updates based on the selected SECTIONNAME to only include the appropriate group_names. image

Anything else

@rmbielby to look at allowing multiple selections at a later date.

rmbielby commented 2 years ago

Quick suggestion to speed the code up a little: Whenever you have a group_by()/summarise() combo, it's worth adding .groups="drop" to the summarise as it generally halves the time it takes to run the summarise() function (if you don't set it, summarise just defaults to it anyway, but takes a bit of extra time to decide that). As well, you're never doing any actual sums in the summarises, so the lines where you've got n=sum(count), you should be able to change to n=count.

So for example, the following: group_by(sex, subject_name) %>% summarise(n = sum(count)) %>%

Should run quicker, but produce the same result, if changed to:

group_by(sex, subject_name) %>% summarise(n = count, .groups = "drop") %>%

chfoster commented 2 years ago

Quick suggestion to speed the code up a little: Whenever you have a group_by()/summarise() combo, it's worth adding .groups="drop" to the summarise as it generally halves the time it takes to run the summarise() function (if you don't set it, summarise just defaults to it anyway, but takes a bit of extra time to decide that). As well, you're never doing any actual sums in the summarises, so the lines where you've got n=sum(count), you should be able to change to n=count.

So for example, the following: group_by(sex, subject_name) %>% summarise(n = sum(count)) %>%

Should run quicker, but produce the same result, if changed to:

group_by(sex, subject_name) %>% summarise(n = count, .groups = "drop") %>%

Do you know any lines specifically where this could be changed? I can see it in the regional analysis but I do need to sum there. In the industry_subject and subject_industry I think it is already changed?

rmbielby commented 2 years ago

Quick suggestion to speed the code up a little: Whenever you have a group_by()/summarise() combo, it's worth adding .groups="drop" to the summarise as it generally halves the time it takes to run the summarise() function (if you don't set it, summarise just defaults to it anyway, but takes a bit of extra time to decide that). As well, you're never doing any actual sums in the summarises, so the lines where you've got n=sum(count), you should be able to change to n=count. So for example, the following: group_by(sex, subject_name) %>% summarise(n = sum(count)) %>% Should run quicker, but produce the same result, if changed to: group_by(sex, subject_name) %>% summarise(n = count, .groups = "drop") %>%

Do you know any lines specifically where this could be changed? I can see it in the regional analysis but I do need to sum there. In the industry_subject and subject_industry I think it is already changed?

Yep, was just meaning in the industry by subject code. Lines 368, 388, 421, 447 and so on all either have n=sum(count) when there's no summation going on (as far as I could tell, but correct me if I'm wrong!), or have summarise(n=earnings_median) without a .groups="drop" flag.