MKuranowski / aiocsv

Python: Asynchronous CSV reading/writing
https://pypi.org/project/aiocsv/
MIT License

aiocsv

Asynchronous CSV reading and writing.

Installation

Install with pip install aiocsv. Python 3.8+ is required.

This module contains an extension written in C. Pre-built binaries may not be available for your configuration, in which case you will need a C compiler and the Python headers to install aiocsv.

Usage

AsyncReader & AsyncDictReader accept any object that has a read(size: int) coroutine, which should return a string.

AsyncWriter & AsyncDictWriter accept any object that has a write(b: str) coroutine.
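As a rough illustration of the two requirements above, the sketch below defines a hypothetical in-memory object (InMemoryAsyncFile is not part of aiocsv) that exposes both a read(size) and a write(b) coroutine. It assumes that, as with ordinary file objects, an empty string returned from read marks the end of input.

```python
import asyncio


class InMemoryAsyncFile:
    """Hypothetical in-memory stand-in for an asynchronous text file."""

    def __init__(self, initial: str = "") -> None:
        self._buffer = initial
        self._pos = 0

    async def read(self, size: int) -> str:
        # Return up to `size` characters; an empty string means end of input
        # (an assumption borrowed from regular file objects).
        chunk = self._buffer[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

    async def write(self, b: str) -> int:
        # Append the text and report how many characters were accepted.
        self._buffer += b
        return len(b)


async def demo() -> tuple:
    afp = InMemoryAsyncFile()
    await afp.write("name,age\r\n")
    await afp.write("John,26\r\n")
    first = await afp.read(8)
    rest = await afp.read(1024)
    end = await afp.read(1024)
    return first, rest, end


first, rest, end = asyncio.run(demo())
print(repr(first), repr(rest), repr(end))
```

An object shaped like this could stand in for an aiofiles handle wherever a reader or writer is constructed, which can be handy in tests.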

Reading is implemented using a custom CSV parser, which should behave exactly like the CPython parser.

Writing is implemented using the synchronous csv.writer and csv.DictWriter objects - the serializers write data to a StringIO, and that buffer is then rewritten to the underlying asynchronous file.
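The buffering approach described above can be sketched roughly as follows. This is a simplified illustration and not aiocsv's actual code: BufferedAsyncWriter and ListSink are hypothetical names, and only writerow is shown.

```python
import asyncio
import csv
import io
from typing import Any, Iterable, List


class ListSink:
    """Hypothetical async target that records every chunk written to it."""

    def __init__(self) -> None:
        self.chunks: List[str] = []

    async def write(self, b: str) -> None:
        self.chunks.append(b)


class BufferedAsyncWriter:
    """Illustrative only; not the class shipped by aiocsv."""

    def __init__(self, asyncfile: Any, **csv_kwargs: Any) -> None:
        self._file = asyncfile
        self._buffer = io.StringIO(newline="")
        self._writer = csv.writer(self._buffer, **csv_kwargs)

    async def writerow(self, row: Iterable[Any]) -> None:
        # Serialize synchronously into the in-memory buffer...
        self._writer.writerow(row)
        data = self._buffer.getvalue()
        # ...reset the buffer so the next row starts from scratch...
        self._buffer.seek(0)
        self._buffer.truncate(0)
        # ...and hand the serialized text to the asynchronous file.
        await self._file.write(data)


async def demo() -> List[str]:
    sink = ListSink()
    writer = BufferedAsyncWriter(sink, dialect="unix")
    await writer.writerow(["name", "age"])
    await writer.writerow(["John", 26])
    return sink.chunks


chunks = asyncio.run(demo())
print(chunks)
```

Because the standard csv.writer does the serialization, quoting and escaping rules stay identical to the synchronous module; only the final write is awaited.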

Example

Example usage with aiofiles.

import asyncio
import csv

import aiofiles
from aiocsv import AsyncReader, AsyncDictReader, AsyncWriter, AsyncDictWriter

async def main():
    # simple reading
    async with aiofiles.open("some_file.csv", mode="r", encoding="utf-8", newline="") as afp:
        async for row in AsyncReader(afp):
            print(row)  # row is a list

    # dict reading, tab-separated
    async with aiofiles.open("some_other_file.tsv", mode="r", encoding="utf-8", newline="") as afp:
        async for row in AsyncDictReader(afp, delimiter="\t"):
            print(row)  # row is a dict

    # simple writing, "unix"-dialect
    async with aiofiles.open("new_file.csv", mode="w", encoding="utf-8", newline="") as afp:
        writer = AsyncWriter(afp, dialect="unix")
        await writer.writerow(["name", "age"])
        await writer.writerows([
            ["John", 26], ["Sasha", 42], ["Hana", 37]
        ])

    # dict writing, all quoted, "NULL" for missing fields
    async with aiofiles.open("new_file2.csv", mode="w", encoding="utf-8", newline="") as afp:
        writer = AsyncDictWriter(afp, ["name", "age"], restval="NULL", quoting=csv.QUOTE_ALL)
        await writer.writeheader()
        await writer.writerow({"name": "John", "age": 26})
        await writer.writerows([
            {"name": "Sasha", "age": 42},
            {"name": "Hana"}
        ])

asyncio.run(main())

Differences with csv

aiocsv strives to be a drop-in replacement for Python's builtin csv module. However, there are 3 notable differences:

Other, minor, differences include:

Reference

aiocsv.AsyncReader

AsyncReader(
    asyncfile: aiocsv.protocols.WithAsyncRead,
    dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
    **csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)

An object that iterates over records in the given asynchronous CSV file. Additional keyword arguments are understood as dialect parameters.

Iterating over this object returns parsed CSV rows (List[str]).

Methods:

Read-only properties:

aiocsv.AsyncDictReader

AsyncDictReader(
    asyncfile: aiocsv.protocols.WithAsyncRead,
    fieldnames: Optional[Sequence[str]] = None,
    restkey: Optional[str] = None,
    restval: Optional[str] = None,
    dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
    **csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)

An object that iterates over records in the given asynchronous CSV file. All arguments work exactly the same way as in csv.DictReader.

Iterating over this object returns parsed CSV rows (Dict[str, str]).
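Since the arguments are stated to work as in csv.DictReader, the synchronous class can serve as a reference for restkey and restval. The snippet below uses only the standard library; the sample data and the "_rest" key are made up for illustration, and the asynchronous reader is expected (not verified here) to yield the same rows.

```python
import csv
import io

# One row has two surplus fields, the other is missing "age".
data = "name,age\r\nJohn,26,extra1,extra2\r\nHana\r\n"

reader = csv.DictReader(
    io.StringIO(data, newline=""),
    restkey="_rest",   # surplus fields are collected in a list under this key
    restval="NULL",    # missing fields are filled with this value
)
rows = list(reader)
for row in rows:
    print(row)
```

Note that when restkey is set, the value stored under it is a list of strings rather than a single string.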

Methods:

Properties:

Read-only properties:

aiocsv.AsyncWriter

AsyncWriter(
    asyncfile: aiocsv.protocols.WithAsyncWrite,
    dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
    **csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)

An object that writes csv rows to the given asynchronous file. In this object "row" is a sequence of values.

Additional keyword arguments are passed to the underlying csv.writer instance.

Methods:

Read-only properties:

aiocsv.AsyncDictWriter

AsyncDictWriter(
    asyncfile: aiocsv.protocols.WithAsyncWrite,
    fieldnames: Sequence[str],
    restval: Any = "",
    extrasaction: Literal["raise", "ignore"] = "raise",
    dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
    **csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)

An object that writes csv rows to the given asynchronous file. In this object "row" is a mapping from fieldnames to values.

Additional keyword arguments are passed to the underlying csv.DictWriter instance.
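Because these arguments end up in a csv.DictWriter, its synchronous behavior shows what restval and extrasaction do. The snippet below uses only the standard library and made-up sample rows; it is a reference for the semantics, not a demonstration of aiocsv itself.

```python
import csv
import io

strict_buffer = io.StringIO(newline="")
strict = csv.DictWriter(
    strict_buffer, ["name", "age"], restval="NULL", extrasaction="raise"
)
strict.writeheader()
strict.writerow({"name": "Hana"})  # "age" is missing, so restval is written

try:
    # "city" is not a known field name, so extrasaction="raise" rejects the row.
    strict.writerow({"name": "John", "age": 26, "city": "Oslo"})
except ValueError as exc:
    error = str(exc)
else:
    error = None

lenient_buffer = io.StringIO(newline="")
lenient = csv.DictWriter(lenient_buffer, ["name", "age"], extrasaction="ignore")
lenient.writerow({"name": "John", "age": 26, "city": "Oslo"})  # "city" is dropped

strict_output = strict_buffer.getvalue()
lenient_output = lenient_buffer.getvalue()
print(repr(strict_output), repr(lenient_output), error)
```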

Methods:

Properties:

Read-only properties:

aiocsv.protocols.WithAsyncRead

A typing.Protocol describing an asynchronous file, which can be read.

aiocsv.protocols.WithAsyncWrite

A typing.Protocol describing an asynchronous file, which can be written to.
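For readers unfamiliar with typing.Protocol, the sketch below shows what structural types of this kind might look like. The names SupportsAsyncRead and SupportsAsyncWrite are hypothetical and the definitions are an assumption based on the method signatures given in the Usage section, not a copy of aiocsv.protocols.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SupportsAsyncRead(Protocol):
    """Hypothetical protocol: anything with a read(size) coroutine."""

    async def read(self, size: int) -> str: ...


@runtime_checkable
class SupportsAsyncWrite(Protocol):
    """Hypothetical protocol: anything with a write(b) coroutine."""

    async def write(self, b: str) -> Any: ...


class ReadOnlySource:
    # No base class needed: matching the method name is enough.
    async def read(self, size: int) -> str:
        return ""


source = ReadOnlySource()
# runtime_checkable isinstance checks only look for the method's presence.
is_readable = isinstance(source, SupportsAsyncRead)
is_writable = isinstance(source, SupportsAsyncWrite)
print(is_readable, is_writable)
```

The practical consequence is that file-like objects do not have to inherit from anything; a type checker accepts them as long as the coroutine is present with a compatible signature.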

aiocsv.protocols.CsvDialectArg

Type of the dialect argument, as used in the csv module.

aiocsv.protocols.CsvDialectKwargs

Keyword arguments used by csv module to override the dialect settings during reader/writer instantiation.
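These are the same formatting parameters the standard csv module accepts, such as delimiter, quotechar and quoting. The snippet below shows the override mechanism with the synchronous csv.reader and made-up sample data; the usage example earlier in this README passes keywords like delimiter and quoting to the asynchronous classes in the same way.

```python
import csv
import io

data = "name;age\r\n'Doe; John';26\r\n"

# Start from the "excel" dialect, but override two of its settings.
reader = csv.reader(
    io.StringIO(data, newline=""),
    dialect="excel",
    delimiter=";",
    quotechar="'",
)
rows = list(reader)
print(rows)
```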

Development

Contributions are welcome, but please open an issue beforehand. aiocsv is meant as a replacement for the built-in csv module; any features not present in the latter will be rejected.

Building from source

To create a wheel (and a source tarball), run python -m build.

For local development, use a virtual environment. pip install --editable . will build the C extension and make it available for the current venv. This is required for running the tests. However, due to the mess of Python packaging this will force an optimized build without debugging symbols. If you need to debug the C part of aiocsv and build the library with e.g. debugging symbols, the only sane way is to run python setup.py build --debug and manually copy the shared object/DLL from build/lib*/aiocsv to aiocsv.

Tests

This project uses pytest with pytest-asyncio for testing. Run pytest after installing the library in the manner explained above.

Linting & other tools

This library uses black and isort for formatting and pyright in strict mode for type checking.

For the C part of the library, please use clang-format for formatting and clang-tidy for linting. Note that these are not yet integrated into the CI.

Installing required tools

pip install -r requirements.dev.txt will pull in all of the development tools mentioned above. However, this might not be necessary depending on your setup. For example, if you use VS Code with the Python extension, pyright is already bundled and doesn't need to be installed again.

Recommended VS Code settings

Use the Python, Pylance (should be installed automatically alongside the Python extension), black and isort extensions.

You will need to install all dev dependencies from requirements.dev.txt, except for pyright. Recommended .vscode/settings.json:

{
    "C_Cpp.codeAnalysis.clangTidy.enabled": true,
    "python.testing.pytestArgs": [
        "."
    ],
    "python.testing.unittestEnabled": false,
    "python.testing.pytestEnabled": true,
    "[python]": {
        "editor.formatOnSave": true,
        "editor.codeActionsOnSave": {
            "source.organizeImports": "always"
        }
    },
    "[c]": {
        "editor.formatOnSave": true
    }
}

For the C part of the library, the C/C++ extension is sufficient. Ensure that your system has the Python headers installed. Usually a separate package such as python3-dev needs to be installed; consult your system's package repositories for the exact name. .vscode/c_cpp_properties.json needs to manually include the Python headers under includePath. On my particular system this config file looks like this:

{
    "configurations": [
        {
            "name": "Linux",
            "includePath": [
                "${workspaceFolder}/**",
                "/usr/include/python3.11"
            ],
            "defines": [],
            "compilerPath": "/usr/bin/clang",
            "cStandard": "c17",
            "cppStandard": "c++17",
            "intelliSenseMode": "linux-clang-x64"
        }
    ],
    "version": 4
}