SuperDuperDB / superduperdb

🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
https://superduperdb.com
Apache License 2.0

[MISC] Encode the component’s artifact using the schema of the component. #2147

Open jieguangzhou opened 4 weeks ago

jieguangzhou commented 4 weeks ago
    def dict(self) -> 'Document':
        """A dictionary representation of the component."""
        from superduperdb import Document
        from superduperdb.components.datatype import Artifact, File

        r = super().dict()
        s = self.artifact_schema
        for k in s.fields:
            attr = getattr(self, k)
            if isinstance(attr, (Artifact, File)):
                # already wrapped, keep as-is
                r[k] = attr
            else:
                # wrap the raw value with the field's datatype (artifact or file)
                r[k] = s.fields[k](x=attr)

        r['type_id'] = self.type_id
        r['version'] = self.version
        r['identifier'] = self.identifier
        r['hidden'] = False
        return Document(r)

Currently, when a component is encoded, its artifacts are not encoded through the component's schema; instead, each field is encoded independently. This produces more complex and redundant output, because much of that information (the datatypes, for example) already exists in artifact_schema.

Therefore, if we apply schema.deep_flat_encode_data to the component's dictionary, the encoded component will be clearer and will contain far fewer nested _leaves.
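
To make the difference concrete, here is a toy sketch of the two strategies. It does not use superduperdb at all: `ToyDataType`, `ToySchema`, `encode_per_field`, and the `_schema` key are hypothetical names invented for illustration, and the real output of `deep_flat_encode_data` will look different. The only point is that per-field encoding repeats the datatype description on every value, while schema-level encoding states it once.

```python
import pickle
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToyDataType:
    """Hypothetical stand-in for a datatype: a name plus an encoder."""
    name: str
    encoder: Callable[[Any], bytes]


@dataclass
class ToySchema:
    """Hypothetical stand-in for a schema: field name -> datatype."""
    fields: Dict[str, ToyDataType]

    def encode(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Schema-level encoding: each field value becomes plain bytes,
        # and the datatypes are described once under "_schema".
        out = dict(data)
        for key, datatype in self.fields.items():
            out[key] = datatype.encoder(data[key])
        out["_schema"] = {k: dt.name for k, dt in self.fields.items()}
        return out


def encode_per_field(schema: ToySchema, data: Dict[str, Any]) -> Dict[str, Any]:
    # Field-by-field encoding: every value carries its own datatype
    # description, so the same information is repeated per field.
    out = dict(data)
    for key, datatype in schema.fields.items():
        out[key] = {
            "datatype": datatype.name,
            "bytes": datatype.encoder(data[key]),
        }
    return out


pickle_type = ToyDataType("pickle", pickle.dumps)
schema = ToySchema({"weights": pickle_type, "vocab": pickle_type})
component = {"identifier": "my-model", "weights": [0.1, 0.2], "vocab": ["a", "b"]}

per_field = encode_per_field(schema, component)
via_schema = schema.encode(component)
```

In this toy model, `per_field["weights"]` is a nested dictionary that repeats the datatype name, whereas `via_schema["weights"]` is just the encoded bytes and the datatype names appear once under `via_schema["_schema"]`.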