frictionlessdata / tabulator-py

Python library for reading and writing tabular data via streams.
https://frictionlessdata.io
MIT License

Implemented `blank` preset for `skip_rows` #302

roll closed 4 years ago

roll commented 4 years ago

Hi @mcarans,

Here is an option for skipping completely blank rows, based on the goodtables-py definition of a blank row. It builds on @cschloer's idea of supporting regex patterns.

Will it work for your case?

mcarans commented 4 years ago

Thanks @roll. Would you be able to extend the test so I can see what it does for a row with a blank first column? E.g.:

from tabulator import Stream

def test_stream_skip_rows_preset():
    # The last row has a blank first cell but a value in the second, so it should be kept
    source = [['name', 'order'], ['', ''], [], ['John', 1], ['Alex', 2], ['', 3]]
    skip_rows = [{'type': 'preset', 'value': 'blank'}]
    with Stream(source, headers=1, skip_rows=skip_rows) as stream:
        assert stream.headers == ['name', 'order']
        assert stream.read() == [['John', 1], ['Alex', 2], ['', 3]]

Is it possible for a column to be None? If so, then the test also needs [None, 4].
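
To make the expected behaviour concrete, here is a minimal standalone sketch. It assumes that "blank" means a row with no cells, or one in which every cell is either None or an empty string. That definition and the helper names `is_blank_row` and `drop_blank_rows` are hypothetical illustrations, not tabulator's or goodtables-py's actual implementation.

```python
def is_blank_row(row):
    """Return True when the row has no cells or every cell is None or ''.

    Assumed definition of "blank" for illustration only.
    """
    return all(cell is None or cell == '' for cell in row)


def drop_blank_rows(rows):
    """Keep only the rows that contain at least one non-empty cell."""
    return [row for row in rows if not is_blank_row(row)]


rows = [['', ''], [], ['John', 1], ['Alex', 2], ['', 3], [None, 4], [None, None]]
kept = drop_blank_rows(rows)
print(kept)
```

Under that assumption, rows such as ['', 3] and [None, 4] are kept because one cell still carries a value, while ['', ''], [] and [None, None] are dropped.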

roll commented 4 years ago

@mcarans Thanks, I've extended the tests.

roll commented 4 years ago

Thanks!