Closed chfoster closed 2 years ago
Quick suggestion to speed the code up a little: Whenever you have a group_by()/summarise() combo, it's worth adding .groups="drop" to the summarise as it generally halves the time it takes to run the summarise() function (if you don't set it, summarise just defaults to it anyway, but takes a bit of extra time to decide that). As well, you're never doing any actual sums in the summarises, so the lines where you've got n=sum(count), you should be able to change to n=count.
So for example, the following:
group_by(sex, subject_name) %>% summarise(n = sum(count)) %>%
Should run quicker, but produce the same result, if changed to:
group_by(sex, subject_name) %>% summarise(n = count, .groups = "drop") %>%
Quick suggestion to speed the code up a little: Whenever you have a group_by()/summarise() combo, it's worth adding .groups="drop" to the summarise as it generally halves the time it takes to run the summarise() function (if you don't set it, summarise just defaults to it anyway, but takes a bit of extra time to decide that). As well, you're never doing any actual sums in the summarises, so the lines where you've got n=sum(count), you should be able to change to n=count.
So for example, the following:
group_by(sex, subject_name) %>% summarise(n = sum(count)) %>%
Should run quicker, but produce the same result, if changed to:
group_by(sex, subject_name) %>% summarise(n = count, .groups = "drop") %>%
Do you know any lines specifically where this could be changed? I can see it in the regional analysis but I do need to sum there. In the industry_subject and subject_industry I think it is already changed?
Quick suggestion to speed the code up a little: Whenever you have a group_by()/summarise() combo, it's worth adding .groups="drop" to the summarise as it generally halves the time it takes to run the summarise() function (if you don't set it, summarise just defaults to it anyway, but takes a bit of extra time to decide that). As well, you're never doing any actual sums in the summarises, so the lines where you've got n=sum(count), you should be able to change to n=count. So for example, the following:
group_by(sex, subject_name) %>% summarise(n = sum(count)) %>%
Should run quicker, but produce the same result, if changed to:group_by(sex, subject_name) %>% summarise(n = count, .groups = "drop") %>%
Do you know any lines specifically where this could be changed? I can see it in the regional analysis but I do need to sum there. In the industry_subject and subject_industry I think it is already changed?
Yep, was just meaning in the industry by subject code. Lines 368, 388, 421, 447 and so on all either have n=sum(count) when there's no summation going on (as far as I could tell, but correct me if I'm wrong!), or have summarise(n=earnings_median) without a .groups="drop" flag.
Pull request overview
Add extra dropdown to the industry by subject tables, close #42.
Pull request checklist
Please check if your PR fulfils the following:
run_tests_locally()
)tidy_code()
)What is the new behaviour?
A new dropdown has been added to the industry by subject tables that allows users to filter the tables further - to the 3-digit SIC group level rather than just the SECTIONNAME level. The dropdown updates based on the selected SECTIONNAME to only include the appropriate group_names.
Anything else
@rmbielby to look at allowing multiple selections at a later date.