Store first value in Dict directly in innerjoin

I saw some of the discussion in https://github.com/JuliaData/DataFrames.jl/issues/2340 and got curious about what was possible.

This avoids allocating a Vector for the case where l does not have multiple indices with the same value. For the smoke-test benchmark in https://github.com/JuliaData/DataFrames.jl/issues/2340#issuecomment-668566508, this reduces allocations by half and overall runtime by 10%.

Most of the allocations still come from this line which it's much less clear how to reduce allocations in. I'm not sure how much https://github.com/JuliaLang/julia/issues/24909 affects performance in this case. One option would be to heuristically estimate the size of out based on the size of l and r and call sizehint! on it; this didn't seem to help in my testing.

I realize I'm only optimizing a single method of innerjoin, but I'm not super familiar with this field nor with the inner workings of this package, so I leave it to you to decide if this is a worthwhile complication and if it's relevant elsewhere in the package.

JuliaData / SplitApplyCombine.jl

Store first value in Dict directly in innerjoin #29