Open c42f opened 2 years ago
For dynamically loading Julia modules, in MLDatasets.jl we now use the `@lazy import` macro from LazyModules and our own `@require import` macro (similar to `@lazy` but requiring the user to add the import to their code).
Here's a rough list of items I'm considering on the path to a DataSets-1.0 release. Several of these can and should be done prior to version 1.0 in case the APIs need to be adjusted a bit before the 1.0 release.
- `load()`/`save()` for this: thinking of DataSets.jl as a new FileIO.jl, I think this would make sense. (Actually, this isn't breaking, so it doesn't need to wait for 1.0.)
- `load()` and `save()` to return some "default type the user cares about" for convenience. For example, returning a `DataFrame` for a tabular dataset. This will require addressing the problems of dynamically loading Julia modules that were partially faced in #17.
- `dataset()` and `open()`: currently the `open(dataset(...))` idiom is a bit of an awkward double step and leads to some ambiguities. Perhaps we could repurpose `dataset(name)` to mean what `open(dataset(name))` currently does?
- `DataSet`? Users should rarely need to use this directly.
- `ctx = ResourceContext(); x = dataset(ctx, "name"); ...; close(ctx)`. Or from ContextManagers.jl in the style `ctx = dataset("name"); x = value(ctx); close(ctx)`. (Both of these have macros for syntactic shortcuts.)
- #27
- #38
- `BlobTree` API #41
- #42
- `FilePathsBase` and whether there's a type which can implement the `AbstractPath` interface well enough to allow things like `CSV.read(x)` to work for some `x`. Perhaps we need a `DataSpecification` type for the URI-like concept currently called "dataspec" in the codebase? We could have `CSV.read(data"foo?version=2#a/b")`?
- `@datarun` and `@datafunc`. I feel introducing these was premature and the semantics is probably not quite right. We can bring something similar back in the future if it seems like a good idea.
- `[datasets]` section as a dictionary mapping names to configs, not as an array with `name` properties. This is safe because `TOML` syntax does allow arbitrary strings as section names. (Note that either representation is valid when a given `DataSet` is specifically tied to a project.)
- `@__DIR__` templating somehow (fixed in #46)
- `DataSets.PROJECT` to `DataSets.PROJECTS` if this is always a `StackedDataProject`.
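
To make the `[datasets]` item above concrete, here is a rough sketch of the two representations, parsed with Julia's `TOML` standard library. The dataset names and the `description` field are hypothetical placeholders chosen for illustration, not the actual `Data.toml` schema; the only point is that quoted table keys let arbitrary strings serve as names, so the array form can be converted to the dictionary form without losing information.

```julia
using TOML

# Array form: each dataset is an element of an array of tables
# and carries its own `name` property. (Entries are hypothetical.)
array_form = TOML.parse("""
[[datasets]]
name = "a_text_file"
description = "hypothetical example entry"

[[datasets]]
name = "some/namespaced name"
description = "another hypothetical entry"
""")

# Dictionary form: the dataset name becomes a quoted table key,
# so names containing slashes or spaces are still valid TOML.
dict_form = TOML.parse("""
[datasets."a_text_file"]
description = "hypothetical example entry"

[datasets."some/namespaced name"]
description = "another hypothetical entry"
""")

# Convert the array form into a name => config mapping by moving
# the `name` property out of each entry and using it as the key.
by_name = Dict(d["name"] => delete!(copy(d), "name")
               for d in array_form["datasets"])

# Both representations carry the same information.
@assert by_name == dict_form["datasets"]
```

One trade-off worth noting: the array form preserves the order of entries and permits duplicate names, whereas the dictionary form makes names unique by construction.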