JGCRI / gcamdata

The GCAM data system
https://jgcri.github.io/gcamdata/

Replace %in% with semi_join #1210

Closed russellhz closed 2 years ago

russellhz commented 2 years ago

Based on #1068

semi_join(x, y, by = "z") is faster than filter(x, z %in% y$z). The exception is when several filter conditions are applied at once: in that case a single call to filter() is often faster than filter() %>% semi_join().

There were roughly a hundred places where replacing the %in% filter with semi_join() or anti_join() was faster. I also added this to the speed tips, but I am not planning to enforce it via tests because of the exception above.
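
For illustration, here is a minimal sketch of the replacement pattern, not code taken from this PR. The toy tables and the `value` column are hypothetical; only the names x, y, and z follow the description above. It checks that the join forms select the same rows as the %in% forms and shows the multi-condition case where a single filter() may be the better choice. It makes no timing claims, since the speed difference depends on table size.

```r
library(dplyr)

# Hypothetical toy data; x, y, z mirror the names used in the description.
x <- tibble(z = c("a", "b", "c", "d"), value = 1:4)
y <- tibble(z = c("b", "d", "e"))

# Keep rows of x whose z appears in y: %in% filter vs. semi_join().
kept_in   <- filter(x, z %in% y$z)
kept_semi <- semi_join(x, y, by = "z")

# Drop rows of x whose z appears in y: negated %in% vs. anti_join().
dropped_in   <- filter(x, !(z %in% y$z))
dropped_anti <- anti_join(x, y, by = "z")

# The join forms should return the same rows as the %in% forms.
stopifnot(isTRUE(all.equal(kept_in, kept_semi)))
stopifnot(isTRUE(all.equal(dropped_in, dropped_anti)))

# The exception: with several conditions, one filter() call gives the
# same rows as filter() %>% semi_join() and is often the faster form.
multi_single <- filter(x, z %in% y$z, value > 2)
multi_piped  <- filter(x, value > 2) %>% semi_join(y, by = "z")
stopifnot(isTRUE(all.equal(multi_single, multi_piped)))
```

Comparing the two forms with a benchmarking tool on realistically sized tables is the safest way to decide which one to use at a given call site.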

bpbond commented 2 years ago

Huh, that's interesting!