datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

Join on row number #132

Status: closed (roll closed this 4 years ago)

roll commented 4 years ago

Tests fail because of https://github.com/frictionlessdata/tabulator-py/issues/309 (fixed in tabulator@1.38.3)

roll commented 4 years ago

Hey @akariv @cschloer

It's an initial proof of concept for the feature. It uses the special character # to address row numbers, counting from 1 at the first data row (the header row is not counted).

population.csv

id,population
1,8
2,2
4,3

cities_comment.csv

city,comment
paris,city with population in row 2
london,city with population in row 1
rome,city with population in row 3

def test_join_row_number_format_string():
    from dataflows import Flow, load, join
    flow = Flow(
        load('data/population.csv'),
        load('data/cities_comment.csv'),
        join(
            source_name='population',
            # {#} is replaced by each source row's number (1 = first data row)
            source_key='city with population in row {#}',
            target_name='cities_comment',
            # each target row is matched on its own comment text
            target_key='{comment}',
            fields={'population': {'name': 'population'}}
        ),
    )
    data = flow.results()[0]
    # paris -> row 2 (id 2), london -> row 1 (id 1), rome -> row 3 (id 4)
    assert data == [[
        {'city': 'paris', 'population': 2, 'comment': 'city with population in row 2'},
        {'city': 'london', 'population': 8, 'comment': 'city with population in row 1'},
        {'city': 'rome', 'population': 3, 'comment': 'city with population in row 3'},
    ]]
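To make the row-number semantics concrete, here is a minimal pure-Python sketch that mimics what the join in the test above is expected to produce. It is not the dataflows implementation. The helper names read_rows and row_number_join and the cast parameter are hypothetical. The sketch assumes that '#' behaves like an extra field holding the row's 1-based position while a key template is rendered with str.format, and that target rows without a match simply get None.

```python
import csv
import io

# Sample data copied from the comment above.
POPULATION_CSV = """id,population
1,8
2,2
4,3
"""

CITIES_CSV = """city,comment
paris,city with population in row 2
london,city with population in row 1
rome,city with population in row 3
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one dict per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def row_number_join(source_rows, source_key, target_rows, target_key, field, cast=int):
    """Hypothetical helper, not part of dataflows.

    Copies `field` from each source row into the target row whose rendered key
    matches. While a key template is rendered, '#' holds the row's 1-based
    position (the header row is not counted), so '{#}' can refer to it.
    """
    index = {}
    for number, row in enumerate(source_rows, start=1):
        key = source_key.format(**row, **{'#': number})
        index[key] = row[field]

    joined = []
    for number, row in enumerate(target_rows, start=1):
        key = target_key.format(**row, **{'#': number})
        value = index.get(key)
        # Assumption: unmatched target rows get None instead of raising.
        joined.append({**row, field: cast(value) if value is not None else None})
    return joined


result = row_number_join(
    read_rows(POPULATION_CSV), 'city with population in row {#}',
    read_rows(CITIES_CSV), '{comment}',
    field='population',
)
for row in result:
    print(row)
```

Note that rome picks up the population from row 3, whose id is 4: the key depends only on row position, never on the id column.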
coveralls commented 4 years ago

Pull Request Test Coverage Report for Build 461

Totals:
Change from base Build 458: 0.0%
Covered Lines: 1749
Relevant Lines: 2049

💛 - Coveralls
cschloer commented 4 years ago

Hey, I just ran this through a pipeline and it works great. I can do a horizontal concatenation just by setting both source_key and target_key to ['#'].
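The comment above reports that setting both keys to ['#'] produces a horizontal concatenation. Here is a minimal pure-Python sketch of what that appears to mean: rows from the two resources are paired purely by position. The helper concat_horizontally is hypothetical, not a dataflows API, and filling missing positions with None is an assumption rather than confirmed dataflows behaviour.

```python
def concat_horizontally(target_rows, source_rows, fields):
    """Hypothetical helper, not part of dataflows.

    Emulates joining with source_key=['#'] and target_key=['#']: rows are
    matched purely by 1-based position, so the n-th source row supplies
    `fields` to the n-th target row.
    """
    by_number = {number: row for number, row in enumerate(source_rows, start=1)}
    result = []
    for number, row in enumerate(target_rows, start=1):
        # Assumption: a target row with no source row at its position gets None values.
        match = by_number.get(number, {})
        merged = dict(row)
        for field in fields:
            merged[field] = match.get(field)
        result.append(merged)
    return result


left = [{'name': 'first'}, {'name': 'second'}, {'name': 'third'}]
right = [{'value': 10}, {'value': 20}]
print(concat_horizontally(left, right, ['value']))
```

Because the key is the row number alone, the two resources must already be in matching order. No column content is consulted.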

roll commented 4 years ago

@akariv Please please take a look 😃

akariv commented 4 years ago

Hey, this looks good - just update the documentation for this new option :)

roll commented 4 years ago

Hi @akariv,

The docs are done and the PR is ready for review.