fortran bindings to duckdb c api
DuckDB, at the time of this writing, is at version 1.0.0 (stable release). The fortran module in this repository wraps the C-API. While it is still under development, most of the api functions have been wrapped, allowing access to DuckDB databases, querying and data extraction.
DuckDB provides column-based data storage in contrast to other databases (SQLIte, PostgreSQL, etc) that are row-based. To use the API requires some basic understanding of the data structure.
DuckDB allows extracting data directly from csv, parquet and json files, even if compressed:
select * from 'data.csv.gz'
DuckDB databases can be :in-memory: or file-based. The wrapper provides functions to initialize the database and connection as shown below. In-memory databases are used if no path is provided in the duckdb_open function, which is optional.
type(duckdb_database) :: db
type(duckdb_connection) :: con
if (duckdb_open(db=db) == duckdberror) then
error stop "open error"
end if
if (duckdb_connect(db, con) == duckdberror) then
call duckdb_close(db)
error stop "connect error"
end if
call duckdb_disconnect(con)
call duckdb_close(db)
Once a connection to the database has been established, it can be queried to extract results. Note that SQL strings passed to DuckDB must be null-terminated.
use, intrinsic :: iso_c_binding
type(duckdb_connection) :: con
type(duckdb_result) :: res
integer(kind(duckdb_state)) :: ri
sql = "select * from '" // path // "';" // c_null_char
r = duckdb_query(con, sql, res)
deallocate(sql)
The recommended way to interact with result sets is using chunks and vectors. In DuckDB a chunk is defined as a dataset having a fixed number of rows. This number is configurable but is set at 2048 by default. Chunks can be extracted from result sets using the following functions:
use, intrinsic :: iso_fortran_env, only : int64
type(duckdb_data_chunk) :: chk ! chunk
integer(kind=int64) :: i, nc
type(duckdb_result) :: res ! result set
nc = duckdb_result_chunk_count(res) ! number of chunks
do i = 0_int64, nc
chk = duckdb_result_get_chunk(res, i)
! do somethink with chunk
! ...
call duckdb_destroy_data_chunk(chk)
end do
Vectors in turn are extracted from chunks. DuckDB vectors are column based data having a specific data type. It is important to define the data type in fortran when extracting these vectors. Consider the folllowing dataset that comprises 3 vectors, all of type int64.
column0 | column1 | column2 | int64 | int64 | int64 | |
---|---|---|---|---|---|---|
1 | 2 | 3 | ||||
4 | 5 | 6 | ||||
7 | 8 | 9 |
The functions below outline one possible way to extract data from vectors. These data are returned as a c_ptr, which could be converted into a fortran pointer without the need for allocating additional memory. These pointers will only be available though while the chunk is still active and are lost if the chunk is destroyed or if a new chunk is extracted from the result set.
The code below demonstrates how vector data pointers, that may comprise multiple columns, may be consumed by fortran using a derived type defined in the data type of the result set. Note that DuckDB provides functions to check the validity of the data which could include missing or NULL data.
type vectors
integer(kind=int64), pointer, dimension(:) :: ptr
end type
type(duckdb_data_chunk) :: chk
type(vectors), allocatable, dimension(:) :: vecs
type(duckdb_vector) :: vec
integer(kind=int64) :: j, rows, cols
type(c_ptr) :: va !, vb
integer(kind=int64), pointer, dimension(:) :: a
integer(kind=int64), pointer, dimension(:,:) :: mat ! 2d array
integer :: i
rows = duckdb_data_chunk_get_size(chk)
cols = duckdb_data_chunk_get_column_count(chk)
allocate(vecs(cols))
do j = 0_int64, cols - 1
vec = duckdb_data_chunk_get_vector(chk, j)
va = duckdb_vector_get_data(vec)
! vb = duckdb_vector_get_validity(vec)
call c_f_pointer(va, a, [rows])
vecs(j+1)%ptr => a
! not cared with validity check
! do k = 0, sc - 1
! if (duckdb_validity_row_is_valid(vb, k)) then
! print *, vr(k+1)
! else
! print *, "NULL"
! end if
end do
if (allocated(vecs)) then
allocate(mat(3,3))
do i = 1, 3
mat(1:3, i) = vecs(i)%ptr
end do
print *, mat
deallocate(mat)
end if
deallocate(vecs)
call duckdb_destroy_data_chunk(chk)
call duckdb_destroy_result(res)
call duckdb_disconnect(con)
call duckdb_close(db)
An example of extracting data from a csv file is provided in the example folder.
Requires the c library that can be downloaded from https://github.com/duckdb/duckdb/releases. If you're on archlinux you can install the libraries and headers using "yay duckdb-bin", which includes the cli binary.
Minimum Duckdb version required: 0.8
Test with
fpm test
To include this in your own projects, add this dependency to your fpm.toml:
[dependencies]
duckdb.git = "https://github.com/freevryheid/duckdb"