Netflix / PigPen

Map-Reduce for Clojure
Apache License 2.0

Fix self-joins with the same key (which are rare & pointless) #28

Closed mbossenbroek closed 10 years ago

mbossenbroek commented 10 years ago

This fixes the case where a relation is joined to itself on the same join key. Such a join is pointless, because you end up with a duplicate of the data and a lot of extra work, but it can arise when joins are generated dynamically.
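For illustration, a self-join of this kind might look like the sketch below. It assumes the `pig/return`, `pig/join`, and `pig/dump` forms from the PigPen README. The relation `data`, its `:id` key, and the sample rows are hypothetical names chosen for this example and do not come from the PR.

```clojure
(require '[pigpen.core :as pig])

;; Hypothetical sample relation; the rows and the :id key are illustrative.
(def data
  (pig/return [{:id 1 :v "a"}
               {:id 2 :v "b"}]))

;; The same relation appears twice, joined on the same key selector.
;; This is the case the PR addresses.
(def self-joined
  (pig/join [(data :on :id)
             (data :on :id)]
            (fn [l r] [l r])))

;; Assumes local execution through pig/dump.
(pig/dump self-joined)
```

In hand-written code you would rarely write this, but a function that builds joins from a list of relations can produce it whenever the same relation is passed in twice.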

The fix assigns new ids to the key selectors of such joins, which makes every join input appear to be a different relation. Because the duplicated inputs are identical, the values passed to the combining function simply use one of the duplicates.
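The id-reassignment idea can be sketched in plain Clojure, independent of PigPen. This is not the code from the PR: `uniquify-inputs`, `hash-join`, and the input map shape (`:relation`, `:rows`, `:key-fn`, `:id`) are all hypothetical and exist only to show how giving each join input a fresh id lets a self-join be treated like a join of two distinct relations.

```clojure
;; Hypothetical helper, not PigPen code: gives every join input its own id,
;; so the same relation used twice looks like two different relations.
(defn uniquify-inputs
  [inputs]
  (map-indexed
    (fn [i input]
      (assoc input :id (symbol (str (name (:relation input)) "-" i))))
    inputs))

;; Naive in-memory inner join of two inputs. Each input is a map with
;; :rows (a seq of maps) and :key-fn (a function that extracts the join key).
(defn hash-join
  [[left right] combine]
  (let [right-index (group-by (:key-fn right) (:rows right))]
    (for [l (:rows left)
          r (get right-index ((:key-fn left) l))]
      (combine l r))))

;; Illustrative sample rows.
(def data [{:id 1 :v "a"} {:id 2 :v "b"}])

;; The same relation is listed twice with the same key selector.
(def inputs
  (uniquify-inputs
    [{:relation 'data :rows data :key-fn :id}
     {:relation 'data :rows data :key-fn :id}]))

(map :id inputs)
;; => (data-0 data-1)

(hash-join inputs vector)
;; => ([{:id 1, :v "a"} {:id 1, :v "a"}] [{:id 2, :v "b"} {:id 2, :v "b"}])
```

Both arguments handed to the combining function are the same row, which is why it is safe for the values to come from either duplicate.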