A firehose of post embeddings, downstream of the bluesky firehose.
bluesky is a decentralized, open source social media network. one of the pieces of bluesky is a firehose, which is a websocket server that emits events for every post, like, and update on the network.
various apps might have the idea to generate embeddings via the openai api. however, at the rate that new posts are created on bluesky, it doesn't make sense to have a hundred different apps calling the openai api to generate embeddings for the same posts each second.
enter, the embedded-firehose
. it too is a firehose of all the events on bluesky, but it emits events like:
{
uri: string,
embedding: number[1536]
}
where the uri
is the same uri you'd see on an event from bluesky's firehose, and embedding
is an array of 1536 floats, exactly as you'd get from calling openAI's createEmbedding
endpoint
using the text-embedding-ada-002
model. see embedder.ts
for the actual api call.
embeddingfirehose.atproto.drewmca.dev
, where you can subscribe to events.handleEvent(e: EmbeddingEvent)
that updates your db with the embedding received from this firehose.config.ts
: .env
config wrapper, cp .env.example .env
for a quickstartembedded-firehose-server.ts
: main class, extends the FirehoseSubscriptionBase
found in the bluesky-social/feed-generator
repo. Maps incoming created posts from the bluesky firehose to EmbeddedPost
s.counting-ws-server.ts
: the websocket server that clients can connect to, and emits the EmbeddedPost
eventsembedder.ts
: the service that actually calls the openai API