Open janjagusch opened 2 years ago
Having looked at the code more thoroughly, I think we should:
- […] `get_store_from_url`
- Add a classmethod `from_url(cls, url: str) -> "KeyValueStore"` to the definition of `KeyValueStore`. Alternatively, we could add a classmethod `from_parsed_url(cls, scheme, host, port, path, query, userinfo) -> "KeyValueStore"`, similar to `extract_params`. `get_store_from_url` then only looks up the scheme and delegates the rest of the logic to `from_(parsed)_url`.
- Maintain a registry of stores, similar to what `fsspec` does, and allow registering additional stores. Registration would happen through an entry point in the `setup.py`.

Apart from now being able to register additional stores, this should also clean up `_urls.py`, `_get_store.py`, `_store_creation.py`, and `_store_decoration.py`.
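To make the delegation idea concrete, here is a minimal sketch of how `get_store_from_url` could look up the scheme and hand the rest to a classmethod. Everything in it is an assumption for illustration: the `KeyValueStore` class is a stand-in rather than minimalkv's real base class, and `DictStore`, `_REGISTRY`, and the `memory` mapping are hypothetical.

```python
from typing import Dict, Type
from urllib.parse import parse_qs, urlsplit


class KeyValueStore:
    """Stand-in for the real base class (hypothetical, for illustration only)."""

    @classmethod
    def from_parsed_url(
        cls, scheme, host, port, path, query, userinfo
    ) -> "KeyValueStore":
        # Each backend would override this with its own construction logic.
        raise NotImplementedError

    @classmethod
    def from_url(cls, url: str) -> "KeyValueStore":
        # Generic parsing lives in one place; backends only see the parts.
        parts = urlsplit(url)
        # Everything before the last "@" in the netloc, or None if absent.
        userinfo = parts.netloc.rpartition("@")[0] or None
        return cls.from_parsed_url(
            scheme=parts.scheme,
            host=parts.hostname,
            port=parts.port,
            path=parts.path,
            query=parse_qs(parts.query),
            userinfo=userinfo,
        )


class DictStore(KeyValueStore):
    """Toy in-memory backend, used only to exercise the dispatch."""

    def __init__(self):
        self.data = {}

    @classmethod
    def from_parsed_url(cls, scheme, host, port, path, query, userinfo):
        return cls()


# Hypothetical registry: URL scheme -> store class.
_REGISTRY: Dict[str, Type[KeyValueStore]] = {"memory": DictStore}


def get_store_from_url(url: str) -> KeyValueStore:
    # Only the scheme lookup happens here; the class does the rest.
    scheme = urlsplit(url).scheme
    try:
        store_cls = _REGISTRY[scheme]
    except KeyError:
        raise ValueError(f"Unknown storage scheme: {scheme!r}") from None
    return store_cls.from_url(url)


store = get_store_from_url("memory://")
```

With this shape, adding a backend would mean adding one class and one registry entry, instead of touching several modules.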
FYI: @SimonBohnenQC
This concept sounds much better than the current structure 👍🏼
As authentication was an issue in #51, I would suggest testing `get_store_from_url` against live AWS / GCS. Can we provide credentials as GitHub secrets to do this, @janjagusch?
We would probably not be using `create_store`, `url2dict`, and `get_store` internally anymore. We should probably still keep them available, right? Deprecating them seems like the best choice.
I would love to keep `create_store`, as this gives a way to pass parameters where there can also be more schema validation upfront. A typical use case is to have the store configuration in a JSON/YAML file and then use `create_store(json.load(…))`.
> I would love to keep `create_store` as this gives a way to pass parameters where there can also be more schema validation upfront. A typical use case is to have the store configuration in a JSON/YAML and then use `create_store(json.load(…))`.

I see the point of creating stores only from `str` or `int` parameters. I would also prefer if store objects could be created without relying on an internet connection or successful authentication. Thus, I'd like to move e.g. the `create_store_azure` code to a constructor for the `AzureBlockBlobStore` class.
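One way to get construction without a network round trip is to keep only plain parameters in the constructor and open the connection on first use. The sketch below assumes a hypothetical `LazyBlobStore` class and an injected `client_factory`; it does not reflect how `AzureBlockBlobStore` is actually implemented.

```python
from typing import Any, Callable, Optional


class LazyBlobStore:
    """Hypothetical store that defers connecting until it is needed."""

    def __init__(
        self,
        conn_string: str,
        container: str,
        max_connections: int = 2,
        client_factory: Optional[Callable[[str], Any]] = None,
    ):
        # The constructor only records plain values; no I/O happens here.
        self.conn_string = conn_string
        self.container = container
        self.max_connections = max_connections
        self._client_factory = client_factory
        self._client = None

    @property
    def client(self) -> Any:
        # Connect (and authenticate) lazily, then cache the client.
        if self._client is None:
            if self._client_factory is None:
                raise RuntimeError("No client factory configured")
            self._client = self._client_factory(self.conn_string)
        return self._client


# A fake factory records calls so the laziness is observable offline.
calls = []


def fake_factory(conn_string: str) -> object:
    calls.append(conn_string)
    return object()


store = LazyBlobStore("hypothetical-conn-string", "my-container",
                      client_factory=fake_factory)
calls_after_init = list(calls)  # still empty: nothing connected yet
first = store.client            # the first access triggers the factory
second = store.client           # later accesses reuse the cached client
```

Because the constructor takes only `str` and `int` values, such an object can be built from a parsed URL or a config file and inspected in tests without credentials.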
What do you mean by "schema validation" in this context?
With "schema validation", I mean that you can specify a type schema for JSON/YAML files and validate that they have the correct values before creating the stores themselves. This has been useful in the past in settings where we wanted to check that a store configuration is valid before deploying it to production.
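As a rough illustration of that workflow, the sketch below checks a loaded JSON configuration against a small type schema before any store is created. The schema format, the field names, and `validate_store_config` are all hypothetical; a real setup would more likely use a dedicated schema library.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical schema: field name -> (expected type, required?).
STORE_SCHEMA: Dict[str, Tuple[type, bool]] = {
    "type": (str, True),
    "bucket": (str, True),
    "port": (int, False),
}


def validate_store_config(
    config: Dict[str, Any],
    schema: Dict[str, Tuple[type, bool]] = STORE_SCHEMA,
) -> List[str]:
    """Return a list of problems; an empty list means the config is valid."""
    errors = []
    for field, (expected, required) in schema.items():
        if field not in config:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        value = config[field]
        # bool is a subclass of int, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            errors.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    for field in sorted(set(config) - set(schema)):
        errors.append(f"unexpected field: {field}")
    return errors


config = json.loads('{"type": "s3", "bucket": "data", "port": "9000"}')
errors = validate_store_config(config)
print(errors)  # ['port: expected int, got str']
# Only when `errors` is empty would the config be handed on to create a store.
```

The point is that the check runs on plain data, so it can be part of a deployment pipeline that has no access to the storage backend.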
As a user of the `minimalkv` framework, I want to create and register new backends and use them through the `get_store_from_url` function without changing the `minimalkv` library. This is currently not possible, as the `extract_params` function hard-codes the known storage types: https://github.com/data-engineering-collective/minimalkv/blob/main/minimalkv/_urls.py#L70-L122
What I'm imagining is a registration function that makes `minimalkv` aware of this new storage type, similarly to how `fsspec` does it.
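A registration function of that kind could be as small as the sketch below. It is only an assumption about what such an API might look like: `register_store`, `lookup_store`, the `clobber` flag, and `MyBackendStore` are hypothetical names and not part of `minimalkv` today.

```python
from typing import Callable, Dict

# Hypothetical module-level registry: URL scheme -> store class.
_STORE_REGISTRY: Dict[str, type] = {}


def register_store(scheme: str, clobber: bool = False) -> Callable[[type], type]:
    """Return a class decorator that registers a store under `scheme`."""

    def decorator(cls: type) -> type:
        # Refuse to silently replace an existing backend unless asked to.
        if scheme in _STORE_REGISTRY and not clobber:
            raise ValueError(f"Scheme {scheme!r} is already registered")
        _STORE_REGISTRY[scheme] = cls
        return cls

    return decorator


def lookup_store(scheme: str) -> type:
    """Find the class registered for `scheme`."""
    try:
        return _STORE_REGISTRY[scheme]
    except KeyError:
        raise ValueError(f"Unknown storage scheme: {scheme!r}") from None


# In a third-party package, without touching the library itself:
@register_store("mybackend")
class MyBackendStore:
    """Hypothetical user-defined backend."""

    def __init__(self, host: str = "localhost"):
        self.host = host


store_cls = lookup_store("mybackend")
```

A function like `get_store_from_url` could then call `lookup_store` on the URL scheme instead of relying on a hard-coded list of storage types.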