SuperDuperDB / superduperdb

🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
https://superduperdb.com
Apache License 2.0
4.55k stars 445 forks

[APP] Save a component template with variables without triggering jobs #2116

Closed blythed closed 1 month ago

blythed commented 1 month ago

Example:

from superduperdb import Listener, ObjectModel, Template, Variable

m = Listener(
    model=ObjectModel(
        object=lambda x: x + 2,
        identifier=Variable('model_id'),
    ),
    select=db['=collection'].find(),
    key=Variable('key')
)

# optional "info" parameter provides details about usage (depends on developer use-case)
t = Template(m, info={'key': {'type': 'str'}, 'collection': {'type': 'str'}, 'model_id': {'type': 'str'}})

# doesn't trigger work
db.apply(t) 

listener = t(key='my_key', collection='my_collection', model_id='my_id')

# triggers work
db.apply(listener)

The key technical requirement is to "save" the component m, which contains variables, without polluting the db.show system. One idea is to save the output of m.encode() as an artifact. Any artifacts referenced by the template would then need to be saved as well. When reloading t, we should make sure that all of those artifacts are loaded.
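To make the storage idea concrete, here is a minimal, self-contained sketch that does not use superduperdb at all. ToyStore, save_template and load_template are hypothetical names invented for illustration; the only point is that the encoded template and its artifacts live in the artifact store, so nothing new appears in the component listing, and that reloading checks every referenced artifact is present.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ToyStore:
    """Hypothetical stand-in for the database; not superduperdb's API."""

    # Registered components: this is what a "show"-style listing reports.
    components: Dict[str, Any] = field(default_factory=dict)
    # Opaque blobs: never reported by show().
    artifacts: Dict[str, Any] = field(default_factory=dict)

    def show(self) -> List[str]:
        return sorted(self.components)


def save_template(
    store: ToyStore,
    identifier: str,
    encoded: Dict[str, Any],
    artifacts: Dict[str, Any],
) -> None:
    """Save the encoded template and its artifacts outside the component registry."""
    for key, blob in artifacts.items():
        store.artifacts[key] = blob
    # Record which artifacts the template depends on, so reload can verify them.
    store.artifacts[f"template/{identifier}"] = {
        "encoded": encoded,
        "artifact_keys": sorted(artifacts),
    }


def load_template(
    store: ToyStore, identifier: str
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Reload the encoded template and fail loudly if any artifact is missing."""
    record = store.artifacts[f"template/{identifier}"]
    missing = [k for k in record["artifact_keys"] if k not in store.artifacts]
    if missing:
        raise KeyError(f"missing artifacts: {missing}")
    loaded = {k: store.artifacts[k] for k in record["artifact_keys"]}
    return record["encoded"], loaded


# Usage: saving the template registers no component, so no jobs would be triggered.
store = ToyStore()
encoded = {"type": "Listener", "key": "<variable: key>", "model": "artifact/fn"}
save_template(store, "my_template", encoded, {"artifact/fn": b"pickled-bytes"})
reloaded, reloaded_artifacts = load_template(store, "my_template")
```

In this sketch the template record and its artifacts share one flat dictionary; a real implementation would presumably go through the artifact store and serializers that components already use.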

blythed commented 1 month ago

In my experiments I tried something like this:

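import dataclasses as dc
import typing as t

# import paths assumed; adjust to your superduperdb version
from superduperdb.components.component import Component
from superduperdb.components.datatype import pickle_serializer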

@dc.dataclass(kw_only=True)
class Template(Component):
    _artifacts = [('template', pickle_serializer)]

    template: t.Union[Component, t.Dict]

    def __post_init__(self):
        if isinstance(self.template, Component):
            self.template = self.template.encode()

    def on_create(self):
        # save all of the artifacts in self.template in the artifact store also
        ...

    def __call__(self, **kwargs):
        # set the variables of the template and return the component
        ...
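
For the __call__ step, one possible approach is a recursive walk over the encoded template that swaps each variable for the supplied value. The sketch below is only an illustration under that assumption: ToyVariable and substitute are hypothetical names, not the Variable class from superduperdb, and the template is a plain nested dict rather than real m.encode() output.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ToyVariable:
    """Hypothetical placeholder for a value supplied when the template is called."""

    name: str


def substitute(obj: Any, values: Dict[str, Any]) -> Any:
    """Return a copy of obj with every ToyVariable replaced by its value."""
    if isinstance(obj, ToyVariable):
        if obj.name not in values:
            raise KeyError(f"no value supplied for variable {obj.name!r}")
        return values[obj.name]
    if isinstance(obj, dict):
        return {k: substitute(v, values) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(substitute(v, values) for v in obj)
    # Anything else (str, int, bytes, ...) is left untouched.
    return obj


# Usage: a nested dict standing in for an encoded component with variables.
template = {
    "identifier": ToyVariable("model_id"),
    "select": {"collection": ToyVariable("collection")},
    "key": ToyVariable("key"),
}
filled = substitute(
    template,
    {"key": "my_key", "collection": "my_collection", "model_id": "my_id"},
)
```

Because substitute builds new containers instead of mutating its input, the stored template keeps its variables and can be called again with different values.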