jieguangzhou opened 1 week ago
Agreed. This would be useful for a multi-DB setup.

However, this will require a re-design of the configuration in superduperdb. Currently all configuration is accessed via the `superduperdb.CFG` global variable. We can pass the config to the job, but we will need to use `CFG` only in a restricted context. To do this we will need to pass `CFG` via `db` or otherwise to all components which access it.
Currently, in our development environment, we typically use `export SUPERDUPERDB_CONFIG=xxxxx/config.yaml` and then start the Ray cluster, allowing the client and the Ray-executed tasks to share this configuration. However, in practical applications, if our Ray cluster is used by multiple applications, this leads to environment conflicts, including issues with `requirements.txt`, `CFG`, environment variables, etc.

In practice, the Ray cluster should be independent of applications and should not need to be configured for a specific application's environment. At most it should have SuperDuperDB installed, and it should also be permissible to run without SuperDuperDB.
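To make the conflict concrete, here is a minimal sketch of the current pattern: every process on the cluster resolves the same globally shared config path, so two applications with different configs cannot coexist. The `SUPERDUPERDB_CONFIG` variable name comes from the workflow above; the helper function and fallback path are purely illustrative.

```python
import os

def resolve_config_path() -> str:
    # Every client and Ray worker reads the same cluster-wide variable,
    # which is exactly why per-application configs conflict.
    # (Illustrative helper; the fallback path is hypothetical.)
    return os.environ.get("SUPERDUPERDB_CONFIG", "config.yaml")

# Application A sets the variable for the whole environment...
os.environ["SUPERDUPERDB_CONFIG"] = "/tmp/app_a/config.yaml"
# ...and application B, sharing the cluster, now silently picks it up too.
```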
For the actual runtime of tasks in Ray, the following configuration items should be passed:

- **Python env:** when the client submits a task to Ray, it should pass `runtime_env.pip: [d1, d2, ...]`, or, in special cases, no environment at all.
- **CFG object:** when the client submits a task to Ray, it should pass the `CFG` object (in string or binary form), and the Ray job should reuse this information to reconstruct `CFG` for configuration purposes; in special cases, `CFG` may not need to be passed.

The special cases mentioned above include:

- The Ray cluster serves only a single application, or multiple applications within a single environment, so there is no need to worry about environment conflicts, and the environment and configs can be pre-configured.
Benefits:

We can dynamically use different environments for different applications, allowing multiple applications to connect to the same Ray cluster.