Join on row number

roll commented 4 years ago

fixes https://github.com/datahq/dataflows/issues/131

Tests fail because of https://github.com/frictionlessdata/tabulator-py/issues/309 (fixed in tabulator@1.38.3)

roll commented 4 years ago

Hey @akariv @cschloer

This is an initial POC for the feature. It uses the special character # to refer to row numbers, which start at 1 on the first data row (the header row is not counted).

population.csv

id,population
1,8
2,2
4,3

cities_comment.csv

city,comment
paris,city with population in row 2
london,city with population in row 1
rome,city with population in row 3

def test_join_row_number_format_string():
    from dataflows import Flow, load, join
    flow = Flow(
        load('data/population.csv'),
        load('data/cities_comment.csv'),
        join(
            source_name='population',
            source_key='city with population in row {#}',
            target_name='cities_comment',
            target_key='{comment}',
            fields={'population': {'name': 'population'}}
        ),
    )
    data = flow.results()[0]
    assert data == [[
        {'city': 'paris', 'population': 2, 'comment': 'city with population in row 2'},
        {'city': 'london', 'population': 8, 'comment': 'city with population in row 1'},
        {'city': 'rome', 'population': 3, 'comment': 'city with population in row 3'},
    ]]
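
For readers who want to see why the assertion above holds, here is a minimal plain-Python sketch of the matching logic. It is not the dataflows implementation: `render_key` and `join_by_key` are hypothetical helpers written for illustration, and the sketch assumes that `#` is simply made available as an extra placeholder when a key template is filled in.

```python
def render_key(template, row, row_number):
    """Fill a key template; '#' stands for the 1-based row number."""
    return template.format(**{**row, '#': row_number})


def join_by_key(source_rows, source_key, target_rows, target_key, field_names):
    """Copy field_names from the matching source row onto each target row."""
    # Index the source rows by their rendered key (row numbers start at 1).
    index = {}
    for number, row in enumerate(source_rows, start=1):
        index[render_key(source_key, row, number)] = row

    joined = []
    for number, row in enumerate(target_rows, start=1):
        # Unmatched target rows get None for the copied fields.
        match = index.get(render_key(target_key, row, number), {})
        joined.append({**row, **{name: match.get(name) for name in field_names}})
    return joined


# Same data as population.csv and cities_comment.csv above.
population = [
    {'id': 1, 'population': 8},
    {'id': 2, 'population': 2},
    {'id': 4, 'population': 3},
]
cities_comment = [
    {'city': 'paris', 'comment': 'city with population in row 2'},
    {'city': 'london', 'comment': 'city with population in row 1'},
    {'city': 'rome', 'comment': 'city with population in row 3'},
]

result = join_by_key(
    population, 'city with population in row {#}',
    cities_comment, '{comment}',
    ['population'],
)
```

Note that the third population row has id 4 but row number 3, so rome is matched by position in the file, not by the id column.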

coveralls commented 4 years ago

Pull Request Test Coverage Report for Build 461

6 of 6 (100.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 85.359%

Totals
Change from base Build 458:	0.0%
Covered Lines:	1749
Relevant Lines:	2049

💛 - Coveralls

cschloer commented 4 years ago

Hey, just ran this through a pipeline and it works great. I am able to do a horizontal concatenation by just setting the source_key to ['#'] and the target_key to ['#'].
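
To make "horizontal concatenation" concrete: when both keys are the row number, every target row is paired with the source row at the same position. The plain-Python sketch below only illustrates that pairing; `concat_by_row_number` is a hypothetical helper, not part of dataflows, and the sample rows are made up for the example.

```python
def concat_by_row_number(source_rows, target_rows):
    """Merge each target row with the source row that has the same row number."""
    # Row numbers start at 1, matching the '#' convention in this PR.
    source_by_number = dict(enumerate(source_rows, start=1))
    return [
        # Target rows without a counterpart are kept unchanged.
        {**target, **source_by_number.get(number, {})}
        for number, target in enumerate(target_rows, start=1)
    ]


left = [{'city': 'paris'}, {'city': 'london'}, {'city': 'rome'}]
right = [{'population': 8}, {'population': 2}]

combined = concat_by_row_number(right, left)
```

In this sketch a source field overrides a target field of the same name, and the third row stays as it was because the source has only two rows.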

roll commented 4 years ago

@akariv Please please take a look :smiley:

akariv commented 4 years ago

Hey, this looks good - just update the documentation for this new option :)

roll commented 4 years ago

Hi @akariv,

The docs are done and the PR is ready for review.

datahq / dataflows

Join on row number #132
