enso-org / dataframes

A library for working with tabular data in Luna.
https://luna-lang.org
MIT License

Dataframes implementation in Luna

Purpose

This project is a library that provides a dataframes implementation. Dataframes are structures that make working with large datasets more convenient.

Build status

Environment Build status
CI Build (macOS, Linux, Windows) Build Status

Third-party dependencies

Required dependencies:

Optional dependencies: these are not required to compile the helper library; however, without them certain functionality is disabled.

Build & Install

Overview

The library currently provides wrappers for Apache Arrow structures.

Storage types

Type tag types

These types are provided by the library to identify the types that can be stored in an Array and how they map to Luna types. The currently provided type tags are listed in the table below.

| Tag type | Luna value type | Apache Arrow type | Memory per element |
| --- | --- | --- | --- |
| StringType | Text | utf8 non-nullable | 4 bytes + 1 byte per character + 1 bit mask |
| MaybeStringType | Maybe Text | utf8 nullable | as above |
| Int64Type | Int | int64 non-nullable | 8 bytes + 1 bit mask |
| MaybeInt64Type | Maybe Int | int64 nullable | as above |
| DoubleType | Real | double non-nullable | 8 bytes + 1 bit mask |
| MaybeDoubleType | Maybe Real | double nullable | as above |

Note: Arrow's utf8 type is a list of non-nullable bytes.
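
The "Memory per element" column can be turned into a rough size estimate for a whole column. The sketch below is written in Python purely for illustration and is not part of this library; the function names are hypothetical. It assumes that "1 byte per character" means one byte per UTF-8 encoded byte, and it ignores any buffer padding or extra offset entries that Arrow may allocate, so real memory use can be somewhat higher.

```python
import math


def validity_mask_bytes(n):
    """One validity bit per element, rounded up to whole bytes."""
    return math.ceil(n / 8)


def fixed_width_column_bytes(n, width=8):
    """Int64Type / DoubleType: `width` bytes per element plus the bit mask."""
    return n * width + validity_mask_bytes(n)


def string_column_bytes(values):
    """StringType: 4 bytes per element, 1 byte per UTF-8 byte, plus the bit mask.

    Assumption: "character" in the table is counted as an encoded UTF-8 byte.
    """
    n = len(values)
    data = sum(len(v.encode("utf-8")) for v in values)
    return 4 * n + data + validity_mask_bytes(n)


print(fixed_width_column_bytes(1_000_000))         # 8125000
print(string_column_bytes(["luna", "arrow", ""]))  # 22
```

For one million Int64Type values, the estimate is 8,000,000 bytes of data plus 125,000 bytes of mask.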

IO types

CSV and Feather files are supported. XLSX files are supported if the helper C++ library was built with the third-party XLNT library enabled.

| Format | Parser Type | Generator Type | Remarks |
| --- | --- | --- | --- |
| CSV file | CSVParser | CSVGenerator | |
| XLSX | XLSXParser | XLSXGenerator | Requires optional XLNT library |
| Feather | FeatherParser | FeatherGenerator | Best performance, not all value types are currently supported |

Methods

Each parser type provides the following method:

By default, column names are read from the file. The CSV and XLSX parsers can also work with files that do not contain a header row. In that case, one of the methods below should be called:
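
The library's own method names are not reproduced here. To illustrate only the behavior described above, the following sketch uses Python's standard `csv` module rather than this library. The function `read_csv`, its parameters `has_header` and `column_names`, and the generated `col0`, `col1`, ... naming scheme are all hypothetical and are not taken from the library's API.

```python
import csv
import io


def read_csv(text, has_header=True, column_names=None):
    """Return (names, rows) parsed from CSV text.

    has_header=True   -> the first row supplies the column names
    column_names=...  -> the caller supplies names for a header-less file
    neither           -> names are generated as col0, col1, ... (assumed scheme)
    """
    rows = list(csv.reader(io.StringIO(text)))
    if has_header and rows:
        return rows[0], rows[1:]
    if column_names is not None:
        return list(column_names), rows
    width = len(rows[0]) if rows else 0
    return [f"col{i}" for i in range(width)], rows


# File with a header row: names come from the file.
print(read_csv("a,b\n1,2\n"))
# Header-less file: every row is data and names are generated.
print(read_csv("1,2\n3,4\n", has_header=False))
# Header-less file with names supplied by the caller.
print(read_csv("1,2\n", has_header=False, column_names=["x", "y"]))
```

The key point is that a header-less file must not lose its first row to the column names, which is why the parser needs to be told explicitly.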

Similarly, the CSV and XLSX generators can be configured to either write or omit a header row with column names.

The CSV generator can also be configured either to always enclose fields in quotes or to do so only when necessary (the latter is the default):
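
The generator's actual configuration methods are not reproduced here. As an analogy only, the sketch below shows the two quoting policies and the optional header row using Python's standard `csv` module, not this library. The function `write_csv` and its parameters `write_header` and `quote_all` are hypothetical names chosen for the example.

```python
import csv
import io


def write_csv(names, rows, write_header=True, quote_all=False):
    """Serialize rows to CSV text.

    quote_all=False quotes a field only when necessary, for example when it
    contains the delimiter; quote_all=True quotes every field.
    """
    buffer = io.StringIO()
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    writer = csv.writer(buffer, quoting=quoting, lineterminator="\n")
    if write_header:
        writer.writerow(names)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["x, y", 1]]
print(write_csv(["a", "b"], rows))                      # quotes only "x, y"
print(write_csv(["a", "b"], rows, quote_all=True))      # quotes every field
print(write_csv(["a", "b"], rows, write_header=False))  # no header row
```

Quoting only when necessary keeps files smaller and easier to read, while quoting every field can help with consumers that parse CSV loosely.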

Other types

Data processing API

Data description API

Table

Column

Tutorial

TBD