Closed simonw closed 7 months ago
Options for this:
I thought I'd deployed this to Datasette Cloud, but I just checked and it turns out I've only used this in demos there but not made it available to all users - so the need for automatic upgrade is greatly reduced.
I am going to document this exclusively in the release notes.
If you have previously calculated embedding vectors stored in columns on an existing table, you can migrate them to the new table structure by following these steps:
Create a new shadow table to store your embeddings. This table should have the name _embeddings_TABLE
where TABLE
is the name of your source table, and the same primary key (or compound primary keys). It should then have a emb_xxx
column for each stored embedding vector.
One way to create this table is to select a single row and embed just that - this will create the table for you while only costing you one embedding run.
Run the following SQL query to copy across your vectors:
insert or replace into _embeddings_TABLE (id, emb_xxx)
select id, emb_xxx from TABLE where emb_xxx is not null
alter table TABLE drop column emb_xxx
I'm going to try those instructions against this existing table:
CREATE TABLE [releases] (
[html_url] TEXT,
[id] INTEGER PRIMARY KEY,
[author] INTEGER REFERENCES [users]([id]),
[node_id] TEXT,
[tag_name] TEXT,
[target_commitish] TEXT,
[name] TEXT,
[draft] INTEGER,
[prerelease] INTEGER,
[created_at] TEXT,
[published_at] TEXT,
[body] TEXT,
[repo] INTEGER REFERENCES [repos]([id])
, [reactions] TEXT, [mentions_count] INTEGER, [emb_text_embedding_3_small_512] BLOB)
Confirmed: I ran this and got the correct upgrade:
CREATE TABLE [_embeddings_releases] (
[id] INTEGER PRIMARY KEY,
[emb_text_embedding_3_small_512] BLOB);
insert or replace into _embeddings_releases (id, emb_text_embedding_3_small_512)
select id, emb_text_embedding_3_small_512 from releases where emb_text_embedding_3_small_512 is not null;
alter table releases drop column emb_text_embedding_3_small_512;
Split from:
8
3