Currently `join` and `joinEach` behave slightly differently.
`join` uses the HashJoin algorithm under the hood, whereas `joinEach` is based on a nested loop algorithm.
The problem is that the Nested Loop implementation forces the use of join_prefix: if we try to join two dataframes on an `id` column that is called `id` on both sides, we get a DuplicatedEntriesException from the `Rows::merge()` method.
What we should do instead is remove the join columns from the right dataset to avoid duplicates.
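
To make the proposal concrete, here is a minimal, language-neutral sketch in Python. It is not the library's code: `nested_loop_join`, the row-as-dict representation, and the `ValueError` standing in for DuplicatedEntriesException are all assumptions made for illustration. It shows a nested loop join that strips the join columns from the right-hand row before merging, so an `id` column present on both sides no longer collides.

```python
def nested_loop_join(left, right, on):
    """Inner-join two lists of dict rows on the column names in `on`.

    Hypothetical helper for illustration only; not the library's API.
    """
    joined = []
    for l_row in left:
        for r_row in right:
            # Nested loop: compare every left row with every right row.
            if all(l_row[col] == r_row[col] for col in on):
                # Drop the join columns from the right row; their values
                # are equal to the left row's, so nothing is lost.
                r_rest = {k: v for k, v in r_row.items() if k not in on}
                # Any remaining name clash is a genuine duplicate and
                # would still need a prefix (stand-in for the exception).
                clash = l_row.keys() & r_rest.keys()
                if clash:
                    raise ValueError(f"duplicated entries: {sorted(clash)}")
                joined.append({**l_row, **r_rest})
    return joined


left = [{"id": 1, "country": "PL"}, {"id": 2, "country": "US"}]
right = [{"id": 1, "name": "Poland"}, {"id": 2, "name": "United States"}]

result = nested_loop_join(left, right, on=["id"])
```

With this approach a prefix would only be needed when non-join columns share a name, which matches what one would expect from a hash-based join on the same inputs.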