PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0

Arrow type system as base for the type system #4472

Open snth opened 4 months ago

snth commented 4 months ago

Motivation

Arrow is increasingly becoming a lingua franca interchange format in the data world. It aims to provide efficient in-memory representations of data structures that allow zero-copy reuse of data between different systems (e.g. between Python and R). Its creators have thought a lot about making the format memory- and cache-efficient, drawing on the hurdles they encountered in their previous work (e.g. pandas, dplyr, ...). As such it benefits from a wealth of experience and is geared towards modern analytics workloads. It is therefore no surprise that it is increasingly being adopted as the basis for new data tools such as DataFusion, Polars, InfluxDB v3, Lance v2, ... to name a few that I have personally encountered.

I therefore think that it would be a good starting point for the PRQL type system, as tracked in #1965.

Resources

Further work

Compare it with the RFC in #1964 and highlight similarities and differences for further discussion.

snth commented 4 months ago

Below is a summary of the Arrow Schema.fbs specification by gemini-1.5-pro-api-preview for the impatient:


This document describes the Arrow type system, which is used to represent structured data like tables or JSON objects. You can think of it as a schema definition language similar to SQL schemas or JSON Schema.

Here's a breakdown:

This specification also defines metadata versions to ensure compatibility between different implementations of Arrow. Additionally, it outlines features that may not be fully supported by all implementations, allowing for forward compatibility and negotiation between clients and servers.
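To make the schema analogy concrete, here is a minimal sketch in plain Python of how an Arrow-style schema nests fields and types. It is not the pyarrow API: the class names (`Int`, `Utf8`, `List`, `Struct`, `Field`) and the helpers `type_name` and `describe` are hypothetical. They are loosely modelled on the `Field` (name, nullable, type, children) and `Int` (bit width, signedness) tables in Schema.fbs, and they cover only a small subset of the types.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Int:
    """Integer type, parameterised by bit width and signedness."""
    bit_width: int
    is_signed: bool = True


@dataclass(frozen=True)
class Utf8:
    """Variable-length UTF-8 string type."""


@dataclass(frozen=True)
class List:
    """Variable-length list whose elements are described by one child field."""
    item: "Field"


@dataclass(frozen=True)
class Struct:
    """Nested record made of named child fields."""
    children: Tuple["Field", ...]


DataType = Union[Int, Utf8, List, Struct]


@dataclass(frozen=True)
class Field:
    """A named, optionally nullable column or nested member."""
    name: str
    type: DataType
    nullable: bool = True


def type_name(t: DataType) -> str:
    """Render a type as a compact, SQL-like string (illustrative format only)."""
    if isinstance(t, Int):
        return f"{'int' if t.is_signed else 'uint'}{t.bit_width}"
    if isinstance(t, Utf8):
        return "utf8"
    if isinstance(t, List):
        return f"list<{describe(t.item)}>"
    if isinstance(t, Struct):
        return "struct<" + ", ".join(describe(c) for c in t.children) + ">"
    raise TypeError(f"unsupported type: {t!r}")


def describe(f: Field) -> str:
    """Render a field as 'name: type', marking non-nullable fields."""
    suffix = "" if f.nullable else " not null"
    return f"{f.name}: {type_name(f.type)}{suffix}"


# A table-like schema: one struct whose children are the columns.
schema = Struct((
    Field("id", Int(64), nullable=False),
    Field("name", Utf8()),
    Field("tags", List(Field("item", Utf8()))),
))

print(type_name(schema))
# struct<id: int64 not null, name: utf8, tags: list<item: utf8>>
```

The point of the sketch is that a type is a small tagged record and nesting is expressed through child fields. That is the property that would make the Arrow model a plausible base for a query language's type system.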

mav3ri3k commented 3 weeks ago

I am interested in this. What is the current progress, and is there somewhere I can follow it?