SuperDuperDB / superduperdb

🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
https://superduperdb.com
Apache License 2.0

When submitting tasks to the Ray cluster, the client’s information should be passed to the Ray job. #2189

Open jieguangzhou opened 1 week ago

jieguangzhou commented 1 week ago

Currently, in our development environment, we typically run export SUPERDUPERDB_CONFIG=xxxxx/config.yaml and then start the Ray cluster, so that the client and the tasks executed on Ray share the same configuration.

However, in practice, if one Ray cluster serves multiple applications, this leads to environment conflicts, including conflicting requirements.txt, CFG, and environment variables.

Ideally, the Ray cluster should be independent of applications and should not need to be configured for any specific application's environment. At most, it should have SuperDuperDB installed, and it should also be possible to run it without SuperDuperDB installed.

For the actual runtime of tasks in Ray, the following configuration items should be passed:

The special cases mentioned above include:

The Ray cluster serves only a single application, or multiple applications that share a single environment, so environment conflicts are not a concern and the pre-configured environment and config can be used as-is.

Benefits:

We can dynamically use a different environment for each application, allowing multiple applications to connect to the same Ray cluster.
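
For illustration, here is a minimal sketch of how the client could bundle its own settings for each job instead of relying on the cluster's environment. It assumes Ray's runtime environment mechanism is used (the env_vars and pip fields are part of Ray's runtime_env); the helper name build_runtime_env and the example values are hypothetical, and how SuperDuperDB would actually ship its config is not decided here.

```python
import os
from typing import Dict, List, Optional


def build_runtime_env(
    config_path: str,
    requirements: Optional[List[str]] = None,
    extra_env: Optional[Dict[str, str]] = None,
) -> dict:
    """Hypothetical helper: collect per-application settings on the client."""
    # Environment variables the Ray workers should see for this job only.
    env_vars = {"SUPERDUPERDB_CONFIG": config_path}
    env_vars.update(extra_env or {})

    runtime_env = {"env_vars": env_vars}
    if requirements:
        # Per-job Python dependencies, installed by Ray for this job.
        runtime_env["pip"] = list(requirements)
    return runtime_env


runtime_env = build_runtime_env(
    config_path=os.environ.get("SUPERDUPERDB_CONFIG", "config.yaml"),
    requirements=["superduperdb"],
)

# With Ray installed, the dict would be passed per job rather than being
# configured on the cluster, for example:
#   ray.init(address="auto", runtime_env=runtime_env)
# Assumption: the config path must be resolvable on the workers, e.g. by
# shipping the file with the job via runtime_env["working_dir"].
```

With this approach, two applications can submit jobs to the same cluster with different requirements and different configs, and the cluster itself needs no application-specific setup.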

blythed commented 1 week ago

Agreed. This would be useful for a multi-DB setup.

However, this will require a redesign of the configuration in superduperdb. Currently, all configuration is accessed via the superduperdb.CFG global variable. We can pass the config to the job, but then CFG may only be used in a restricted context. To achieve this, we will need to pass CFG, via db or otherwise, to all components which access it.
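
A rough sketch of what passing the config via db could look like, as opposed to reading a global. All class and field names below (Config, Db, Component, run_job) are hypothetical stand-ins for illustration and are not the existing superduperdb API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical stand-in for the settings currently held in superduperdb.CFG."""
    data_backend: str
    compute: str = "local"


class Db:
    """Hypothetical db object that owns its configuration."""

    def __init__(self, cfg: Config):
        self.cfg = cfg


class Component:
    """Hypothetical component that reads config from db, not from a global."""

    def describe(self, db: Db) -> str:
        return f"{db.cfg.data_backend} via {db.cfg.compute}"


def run_job(cfg: Config) -> str:
    """Entry point a Ray task could call with the config sent by the client."""
    db = Db(cfg)
    return Component().describe(db)


# Two applications with different configs can run in the same process
# without interfering, because nothing reads a module-level global.
app_a = run_job(Config(data_backend="mongodb://app-a", compute="ray"))
app_b = run_job(Config(data_backend="mongodb://app-b", compute="ray"))
```

The design choice here is that every component receives db (and therefore its config) explicitly, so a job carries exactly the configuration its client submitted.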