Lisp-Stat / sqldf

SQL for Data Frames
https://lisp-stat.github.io/sqldf
Microsoft Public License
2 stars 1 forks source link
common-lisp sqlite3

Contributors Forks Stargazers Issues MS-PL License LinkedIn


Logo

SQLDF

SQLDF is a system for SQL on data frames, optimised for memory.
Explore the docs »

Report Bug · Request Feature · Reference Manual

Table of Contents

  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Resources
  6. Contributing
  7. License
  8. Contact

About the Project

SQLDF make it easy to query data frames. SQL is the de-facto standard for data manipulation, with ample learning resources and SQL skills are ubiquitous. Although any query that can be done in SQL can also be done with the data frame API, SQL will be easier for most data scientists starting out with Lisp Stat. It is similar to the R package of the same name.

The price for convenient queries is memory. At worst twice the data set memory is used: one for the original data frame, and one for the in-memory SQLite database that is constructed to query. SQLDF automatically creates this in-memory database, creates the tables and schema corresponding to the data frame, transfers the data and performs the query. The query results are returned as another data frame. This is suprisingly fast and memory is typically not an issue on data workstations.

Built With

Getting Started

To get a local copy up and running follow these steps:

Prerequisites

An ANSI Common Lisp implementation. Developed and tested with SBCL and CCL.

Installation

To make the system accessible to ASDF (a build facility, similar to make in the C world), clone the repository in a directory ASDF knows about. By default the common-lisp directory in your home directory is known. Create this if it doesn't already exist and then:

  1. Clone the repositories

    cd ~/common-lisp && \
    git clone https://github.com/Lisp-Stat/data-frame.git && \
    git clone https://github.com/Lisp-Stat/sql-df && \
    git clone https://github.com/Lisp-Stat/select && \
    git clone https://github.com/TeMPOraL/cl-sqlite.git
  2. From the REPL reset the ASDF source-registry to find the new systems:

    (asdf:clear-source-registry)
  3. Load the system

    (asdf:load-system :sql-df)

If you have installed the slime ASDF extensions, you can invoke this with a comma (',') from the slime REPL.

Getting dependencies

To get the third party systems that these system may depend on, you can use a dependency manager, such as Quicklisp or CLPM Once installed, get the dependencies with either of:

(clpm-client:sync :sources "clpi") ;sources may vary
(ql:quickload :sql-df)

You need do this only once. After obtaining the dependencies, you can load the system with ASDF as described above without first syncing sources.

Usage

Load the iris data set from R:

(asdf:load-system :lisp-stat)
(asdf:load-system :lisp-stat/rdata)
(defdf iris
    (read-csv 'rdata:iris)))

and query it:

(pprint (sqldf:sqldf "select species, count(*) from iris group by species"))

;;   SPECIES    COUNT(*)
;; 0 setosa           50
;; 1 versicolor       50
;; 2 virginica        50

For more examples, please refer to the Documentation.

Roadmap

SQLDF is currently written using an apparently abandoned library, cl-sqlite. Pull requests from 2012 have been made with no response from the author, and the SQLite FFI interface has improved considerably in the 12 years since it was last updated.

We choose CL-SQLite because, at the time of writing, it was the only SQLite library with a commercially acceptable license. Since then, CLSQL has migrated to a BSD license and is a better option for new development. Not only does it support CommonSQL, the de-facto SQL query syntax for Common Lisp, it also supports several additional databases.

All new development on SQLDF will be on CLSQL, possibly including some of the CSV and other extensions available in SQLite. Benchmarks show that SQLite's CSV import is about 15x faster than cl-csv, and a FFI wrapper of SQLite's CSV importer would be a good addition to Lisp-Stat.

Also see the open issues for a list of proposed features (and known issues).

Resources

This system is part of the Lisp-Stat project; that should be your first stop for information. Also see the community page for more information.

Contributing

Contributions are what make the open source community such an amazing place to be learn, inspire, and create. Any contributions you make are greatly appreciated. Please see CONTRIBUTING.md for details on the code of conduct, and the process for submitting pull requests.

License

Distributed under the MS-PL License. See LICENSE for more information.

Contact

Project Link: https://github.com/lisp-stat/sqldf