frictionlessdata / tabulator-py

Python library for reading and writing tabular data via streams.
https://frictionlessdata.io
MIT License

Implement CSV opener/reader #2

Closed: pwalsh closed this issue 8 years ago

pwalsh commented 8 years ago

Description

This issue describes the basic need for a standalone reader for tabular data. We do want to support a few common formats that such data is often published in, and provide a consistent interface for reading data out of those sources.

As a first step, we should implement this interface for CSV, bearing in mind that we are also designing for other formats such as Excel.
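To make "one consistent interface, CSV first" concrete, here is a minimal sketch using only the standard library. `TableReader`, `CSVReader` and `rows()` are hypothetical names chosen for illustration, not tabulator's API. The idea is that each format supplies its own subclass while callers only ever see `rows()`.

```python
import abc
import csv
import io


class TableReader(abc.ABC):
    """Hypothetical common interface: every format yields rows as lists of str."""

    @abc.abstractmethod
    def rows(self):
        """Yield each row of the source as a list of strings."""


class CSVReader(TableReader):
    """Hypothetical CSV implementation; `source` is a filepath or an open text stream."""

    def __init__(self, source, encoding='utf-8'):
        self.source = source
        self.encoding = encoding

    def rows(self):
        if isinstance(self.source, str):
            # A filepath: open it ourselves and close it when iteration ends.
            with open(self.source, newline='', encoding=self.encoding) as f:
                yield from csv.reader(f)
        else:
            # An already-open stream: read from it but leave it open for the caller.
            yield from csv.reader(self.source)


# Usage with an in-memory stream instead of a file on disk.
reader = CSVReader(io.StringIO('id,name\n1,english\n2,french\n'))
for row in reader.rows():
    print(row)
```

An Excel reader would be another `TableReader` subclass that accepts only a filepath, which matches the constraint noted in the spec.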

I've done some initial work in this direction, so here is a mini-spec based on that:

Spec

Example usage

# Each row would be either a sequence of values (UTF-8 encoded strings) or an object of values keyed by column name
from tabulator import Tabulator

datasource = 'file.csv'  # a filepath or a stream. Some formats, like Excel, could only be a filepath.
dataformat = 'csv'  # 'csv', 'json', 'ndjson', 'excel', 'ods'
options = {
    'schema': None,  # a dict of a valid JSON Table Schema, None, or 'infer'
    # headers: None, an integer or an iterable. None means no headers, an integer is the row
    # where the headers are, an iterable gives the headers directly (don't look for them in the file)
    'headers': None,
    'encoding': None,  # encoding of the data source; setting it prevents guessing, which is often wrong
    'decode_strategy': 'replace',  # decode strategy for bytes that cannot be decoded
    'keyed': False,  # whether to return plain tuples or named tuples for each row
}
# datasource and dataformat are passed positionally, so they are not repeated in options
datatable = Tabulator(datasource, dataformat, **options)


def iter_rows(table):
    # Hypothetical helper: a generator yielding (index, row) pairs
    for index, row in enumerate(table.values):
        yield (index, row)
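The `headers` and `keyed` options are the least obvious part of the spec, so here is a minimal standalone sketch of how they could behave for CSV. `read_table` is a hypothetical name, and treating an integer `headers` as a 1-based row number is an assumption; the spec only says "the row where headers are".

```python
import collections
import csv
import io


def read_table(stream, headers=None, keyed=False):
    """Hypothetical sketch of the `headers` and `keyed` options over csv.reader.

    headers=None     -> no header row; every row is data
    headers=int      -> 1-based row number of the header row (an assumption)
    headers=iterable -> use these names and don't look for them in the file
    keyed=True       -> yield named tuples instead of plain tuples
    """
    rows = csv.reader(stream)
    names = None
    if isinstance(headers, int):
        if headers < 1:
            raise ValueError('headers row number must be >= 1')
        # Skip the rows above the header row, then read the header row itself.
        for _ in range(headers - 1):
            next(rows)
        names = next(rows)
    elif headers is not None:
        names = list(headers)
    if keyed and names is None:
        raise ValueError('keyed=True requires headers')
    if keyed:
        # rename=True replaces header names that are not valid Python identifiers.
        row_type = collections.namedtuple('Row', names, rename=True)
        for row in rows:
            # A row whose length differs from the headers raises TypeError here.
            yield row_type(*row)
    else:
        for row in rows:
            yield tuple(row)


data = io.StringIO('id,name\n1,english\n2,french\n')
for row in read_table(data, headers=1, keyed=True):
    print(row.id, row.name)
```

Named tuples keep rows as tuples (matching `keyed=False`) while still allowing access by column name.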

Consistency across Python 2 and 3
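One way to keep behaviour consistent is to always decode the raw bytes explicitly, so rows are text regardless of where they came from. The sketch below is Python 3 only (Python 2's `csv` module works on byte strings, so a real implementation would need extra handling there). `open_text` is a hypothetical name, and mapping the spec's `decode_strategy` onto the codec `errors` handler (`'strict'`, `'replace'`, `'ignore'`) is an assumption.

```python
import csv
import io


def open_text(byte_stream, encoding='utf-8', decode_strategy='replace'):
    """Hypothetical helper: wrap a binary stream so csv always receives str.

    decode_strategy is passed straight through as the codec `errors` handler.
    """
    # newline='' lets the csv module handle line endings itself, as its docs recommend.
    return io.TextIOWrapper(byte_stream, encoding=encoding,
                            errors=decode_strategy, newline='')


# Latin-1 bytes that are not valid UTF-8: with 'replace', the bad byte becomes U+FFFD.
raw = io.BytesIO(b'id,name\n1,caf\xe9\n')
rows = list(csv.reader(open_text(raw)))
print(rows)
```

Passing the correct `encoding` (here `'latin-1'`) avoids the replacement character entirely, which is why the spec lets callers skip encoding guessing.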

@pudo, any comments or additions? I'm hoping we can collaborate to turn this into the basis of MessyTables 2 and, going forward, even merge what are now GoodTables and MessyTables. Let's get some working code pushed here to talk about :). One of our developers, @roll, will be working on this over the next few days.

roll commented 8 years ago

DONE