cardillo / joinery

Data frames for Java
https://joinery.sh
GNU General Public License v3.0

Non-strict join between two dataframes #87

Open WillCup opened 4 years ago

WillCup commented 4 years ago

Currently, the join key must be unique within each of the dataframes being joined:

```java
for (final List<V> row : left) {
    final Object name = leftIt.next();
    final Object key = on == null ? name : on.apply(row);
    if (leftMap.put(key, row) != null) {
        throw new IllegalArgumentException("generated key is not unique: " + key);
    }
}

for (final List<V> row : right) {
    final Object name = rightIt.next();
    final Object key = on == null ? name : on.apply(row);
    if (rightMap.put(key, row) != null) {
        throw new IllegalArgumentException("generated key is not unique: " + key);
    }
}
```

But we just want to join two dataframes together: if A has a column 'dt' and B has a column 'dt', then A and B should be joinable on the 'dt' column. In this scenario, A should not be forced to have distinct date values.

So I added a new API called nonStrictJoinOn to solve this problem.
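To illustrate the idea, here is a minimal sketch of a join that tolerates duplicate keys. It is not the actual nonStrictJoinOn code and does not use joinery's classes: the class `NonStrictJoinSketch`, the methods `groupByKey` and `innerJoin`, and the use of `java.util.function.Function` as a stand-in for the key function are all assumptions made for this example. Rows are plain lists, and rows sharing a key are collected into a group instead of triggering the "generated key is not unique" exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of joinery.
final class NonStrictJoinSketch {

    // Group rows by key; duplicate keys accumulate in a list instead of throwing.
    static <V> Map<Object, List<List<V>>> groupByKey(
            final List<List<V>> rows, final Function<List<V>, Object> on) {
        final Map<Object, List<List<V>>> groups = new LinkedHashMap<>();
        for (final List<V> row : rows) {
            groups.computeIfAbsent(on.apply(row), k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    // Inner join: each left row is paired with every right row that has the same key.
    static <V> List<List<V>> innerJoin(
            final List<List<V>> left,
            final List<List<V>> right,
            final Function<List<V>, Object> on) {
        final Map<Object, List<List<V>>> rightGroups = groupByKey(right, on);
        final List<List<V>> joined = new ArrayList<>();
        for (final List<V> l : left) {
            final List<List<V>> matches = rightGroups.getOrDefault(
                    on.apply(l), Collections.<List<V>>emptyList());
            for (final List<V> r : matches) {
                final List<V> row = new ArrayList<>(l);
                row.addAll(r);
                joined.add(row);
            }
        }
        return joined;
    }

    public static void main(final String[] args) {
        // A has two rows with the same 'dt' value (column 0).
        final List<List<String>> a = Arrays.asList(
                Arrays.asList("2020-01-01", "a1"),
                Arrays.asList("2020-01-01", "a2"),
                Arrays.asList("2020-01-02", "a3"));
        final List<List<String>> b = Arrays.asList(
                Arrays.asList("2020-01-01", "b1"),
                Arrays.asList("2020-01-03", "b2"));

        for (final List<String> row : innerJoin(a, b, r -> r.get(0))) {
            System.out.println(row);
        }
    }
}
```

In this sketch both rows of A dated 2020-01-01 are kept and each is paired with the matching row of B, while rows whose date appears on only one side are dropped, as in an inner join.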

FYI.