Fully rethought how to access OAMap data in the large.
A collection of input files is an immutable Backend to a Database.
A single list schema that spans all of the data is a Dataset, and it may be read or used in recastings/transformations to make new Datasets with substantial overlap in array data.
The Database has multiple Backends and generally writes to only one of them.
Datasets are a local handle to submitting distributed jobs over its partitions; using Dask API conventions for easier integration.
Three kinds of operations:
recasting: changing a Dataset by changing its schema only; no distributed job
transformation: creating a new Dataset from a Dataset, lazy until assigned (Spark terminology)
action: creating output for immediate download from a Dataset, strictly evaluated (Spark terminology)
Fully rethought how to access OAMap data in the large.
Backend
to aDatabase
.Dataset
, and it may be read or used in recastings/transformations to make newDatasets
with substantial overlap in array data.Database
has multipleBackends
and generally writes to only one of them.Datasets
are a local handle to submitting distributed jobs over itspartitions
; using Dask API conventions for easier integration.Dataset
by changing its schema only; no distributed jobDataset
from aDataset
, lazy until assigned (Spark terminology)Dataset
, strictly evaluated (Spark terminology)