This PR tweaks `drop_singletons!()` for speed, mainly by:
- Multi-threading the second `for` loop, which drops the singletons
- Capping the count of observations in each FE group at 2 (enough to tell a singleton from a non-singleton) and storing the counters as bytes, which reduces allocations and probably uses CPU caches more efficiently
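
For reviewers, here is a minimal sketch of the two ideas above. It is not the actual diff: `drop_singletons_sketch!`, `keep`, `refs` and `ngroups` are hypothetical names, and it assumes each observation carries an integer group index in `1:ngroups` for a single fixed effect.

```julia
# Hypothetical sketch, not the package's implementation.
# keep[i] says whether observation i is still in the sample;
# refs[i] is the group index (1:ngroups) of observation i.
# `keep` must be a Vector{Bool}, not a BitVector: BitVector packs bits
# into shared words, so concurrent writes to it would race.
function drop_singletons_sketch!(keep::Vector{Bool},
                                 refs::AbstractVector{<:Integer},
                                 ngroups::Integer)
    counts = zeros(UInt8, ngroups)  # one byte per group

    # First loop (serial): count observations per group, saturating at 2.
    # We only need to distinguish 0, 1 and "2 or more".
    for i in eachindex(keep, refs)
        if keep[i]
            g = refs[i]
            counts[g] = min(counts[g] + 0x01, 0x02)
        end
    end

    # Second loop (threaded): drop observations whose group has count 1.
    # `counts` is only read here and each iteration writes only keep[i],
    # so iterations are independent.
    Threads.@threads for i in eachindex(keep, refs)
        if keep[i] && counts[refs[i]] == 0x01
            keep[i] = false
        end
    end
    return keep
end

keep = fill(true, 6)
refs = [1, 2, 2, 3, 1, 4]   # groups 3 and 4 each have a single observation
drop_singletons_sketch!(keep, refs, 4)
# keep == [true, true, true, false, true, false]
```

The sketch does a single pass over one fixed effect. With several fixed effects, dropping a singleton in one can create new singletons in another, so a caller would need to repeat the pass until nothing changes.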
I'm new to PRs, so feedback is welcome.