datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

Join on row number with mode full-outer #143

Open cschloer opened 4 years ago

cschloer commented 4 years ago

Not sure if this is intentional, but if you join on the row number with mode full-outer, it creates a column called '#' with None as the value.

from dataflows import Flow, join

data1 = [
    {"col1": 1},
    {"col1": 2},
    {"col1": 3},
]
data2 = [
    {"col2": 1},
    {"col2": 2},
    {"col2": 3},
]

def test_join():
    flows = [
        data2,
        data1,
        join(
            "res_1",
            "{#}",
            "res_2",
            "{#}",
            fields={"col2": {"name": "col2"}},
            source_delete=True,
            mode="full-outer",
        ),
    ]
    rows, datapackage, _ = Flow(*flows).results()
    print(rows)
    assert rows == [
        [{"col1": 1, "col2": 1}, {"col1": 2, "col2": 2}, {"col1": 3, "col2": 3}]
    ]

Actual output:

[[{'col1': 1, '#': None, 'col2': 1}, {'col1': 2, '#': None, 'col2': 2}, {'col1': 3, '#': None, 'col2': 3}]]
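Until the behavior is confirmed as intended or fixed, one possible workaround is to strip the extra key from the results after the flow has run. The sketch below is plain Python and does not touch dataflows itself; `drop_key` is a hypothetical helper, not part of the library, and the `rows` literal is simply the output reported above.

```python
def drop_key(resources, key="#"):
    """Return a copy of the results with `key` removed from every row.

    `resources` is a list of resources, each a list of row dicts, which is
    the shape of the first value returned by Flow(...).results().
    Hypothetical helper, not part of dataflows.
    """
    return [
        [{k: v for k, v in row.items() if k != key} for row in resource]
        for resource in resources
    ]


# The output reported in this issue, copied verbatim.
rows = [[
    {'col1': 1, '#': None, 'col2': 1},
    {'col1': 2, '#': None, 'col2': 2},
    {'col1': 3, '#': None, 'col2': 3},
]]

cleaned = drop_key(rows)
print(cleaned)
```

This only hides the symptom in the returned rows; the '#' field would presumably still be present in the datapackage schema, so it is not a substitute for a fix in join.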