flow-php / flow

Flow PHP - data processing framework
https://flow-php.com
MIT License
491 stars, 28 forks

Unify join and joinEach behavior #1212

Open norberttech opened 2 months ago

norberttech commented 2 months ago

Currently, join and joinEach behave a bit differently.

join uses the HashJoin algorithm under the hood, whereas joinEach is based on a nested loop algorithm.
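To make the difference concrete, here is a minimal, library-agnostic sketch of the two strategies in Python. It is not Flow's code: the function names, the dict-based rows, and the sample data are all assumptions made for illustration. It only shows that both strategies find the same matching pairs, while the hash variant indexes the right side once instead of rescanning it for every left row.

```python
from collections import defaultdict

def nested_loop_join(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]

def hash_join(left, right, key):
    """Index the right side by key once, then probe the index for each left row."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [(l, r) for l in left for r in index.get(l[key], [])]

# Hypothetical sample data: both sides call the join column "id".
left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right = [{"id": 2, "city": "x"}, {"id": 3, "city": "y"}]

pairs_nested = nested_loop_join(left, right, "id")
pairs_hash = hash_join(left, right, "id")
```

Since the matched pairs are the same, the user-visible behavior (including how duplicated column names are handled) should not need to depend on which algorithm runs underneath.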

The problem is that the Nested Loop implementation forces the use of join_prefix: if we try to join two dataframes on the id column, and that column is called id on both sides, we get a DuplicatedEntriesException from the Rows::merge() method.

What we should do is remove the join columns from the right dataset to avoid duplicates.
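A rough sketch of the idea, again in plain Python rather than Flow's actual implementation. `merge_rows` and `DuplicatedEntriesError` are hypothetical stand-ins for the strict merge and the exception described above, and `drop_right_key` is an assumed flag used only to show both behaviors side by side.

```python
class DuplicatedEntriesError(Exception):
    """Stand-in for the DuplicatedEntriesException mentioned in this issue."""

def merge_rows(left_row, right_row):
    """Strict merge: refuse to combine rows that share an entry name."""
    duplicates = left_row.keys() & right_row.keys()
    if duplicates:
        raise DuplicatedEntriesError(f"duplicated entries: {sorted(duplicates)}")
    return {**left_row, **right_row}

def nested_loop_join(left, right, key, drop_right_key=False):
    """Inner join via nested loops; optionally drop the right-side join column."""
    joined = []
    for l in left:
        for r in right:
            if l[key] != r[key]:
                continue
            if drop_right_key:
                # Proposed fix: the right join column is redundant after a match.
                r = {name: value for name, value in r.items() if name != key}
            joined.append(merge_rows(l, r))
    return joined

# Hypothetical sample data: both sides call the join column "id".
left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right = [{"id": 2, "city": "x"}, {"id": 3, "city": "y"}]

fixed = nested_loop_join(left, right, "id", drop_right_key=True)
```

Without `drop_right_key`, the first matching pair raises because both rows contain `id`. With it, the join succeeds without any prefix. Note that this only removes the join columns: other columns that share a name on both sides would still collide and would still need a prefix.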