RAG AI - TechDocs embeddings documentation

esttorhe commented 2 months ago

We are adopting the RAG AI plugin in our organization. We managed to configure it with OpenAI (not without issues, as the docs are not particularly clear), and after some searching through the code we discovered that the supported source identifier for TechDocs is tech-docs.

Unfortunately, we have not managed to figure out how to make the plugin consume or generate embeddings for tech-docs.

When calling the embeddings API for the catalog, we receive confirmation of the number of embeddings generated, but when calling it for tech-docs we get:

{"message":"Only catalog, tech-docs are supported as AI assistant query sources for now."}
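
For anyone trying to reproduce this, here is a minimal sketch of the two requests being compared. It only builds the requests and does not send them. The route shape (`/api/rag-ai/embeddings/:source`), the `entityFilter` body, and the helper name `buildEmbeddingsRequest` are assumptions for illustration and are not confirmed anywhere in this thread, so check them against the rag-ai-backend README for your version.

```typescript
// Hypothetical helper: builds (but does not send) a request to the embeddings endpoint.
// ASSUMPTION: the backend is mounted under /api/rag-ai and exposes
// POST /embeddings/:source with an `entityFilter` JSON body.
type EmbeddingsSource = 'catalog' | 'tech-docs';

interface EmbeddingsRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

function buildEmbeddingsRequest(
  backendBaseUrl: string,
  source: EmbeddingsSource,
  entityFilter: Record<string, string>,
  token?: string,
): EmbeddingsRequest {
  // Drop trailing slashes so the joined URL never contains "//api".
  const base = backendBaseUrl.replace(/\/+$/, '');
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
  };
  if (token) {
    headers['Authorization'] = `Bearer ${token}`;
  }
  return {
    url: `${base}/api/rag-ai/embeddings/${encodeURIComponent(source)}`,
    method: 'POST',
    headers,
    body: JSON.stringify({ entityFilter }),
  };
}

// Same filter, different source: only the last path segment changes.
const catalogReq = buildEmbeddingsRequest(
  'http://localhost:7007/',
  'catalog',
  { kind: 'component' },
);
const techDocsReq = buildEmbeddingsRequest(
  'http://localhost:7007',
  'tech-docs',
  { kind: 'component' },
);

console.log(catalogReq.url);
console.log(techDocsReq.url);
```

If the route assumption holds, the two requests differ only in the source segment, which would point at source handling in the backend rather than at the request itself.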

Feature Suggestion

Better documentation showing integration of the different options and proper examples. Some docs are scattered through the READMEs but are not concise or do not "marry" well when building a complete example.

Possible Implementation

Provide an example configuration repository for the different setups (or perhaps it is all there and I just couldn't find it, in which case an index for people like me who struggle to find stuff 😅 )

brja commented 2 months ago

@esttorhe regarding "We managed to configure it to OpenAI (not without issues, docs are not particularly clear)": would you mind sharing your code to get it running?

esttorhe commented 2 months ago

@brja here's what we have. We set the OPENAI_API_KEY as an env var.

In our app-config.yml:

ai:
  supportedSources: ['catalog', 'tech-docs']

  embeddings:

    # OpenAI Embeddings configuration
    openai:
      batchSize: 512
      embeddingsDimensions: 3072

In packages/app/src/App.tsx:

…
import { RagModal } from '@roadiehq/rag-ai';
…

…
const App = () => (
  <AppProvider>
    <AnalyticsContextProvider>
      <AlertDisplay />
      <OAuthRequestDialog />
      <AppRouter>
        <RagModal />
        <Root>{routes}</Root>
      </AppRouter>
    </AnalyticsContextProvider>
  </AppProvider>
);

In packages/app/src/apis.ts we added this:

  createApiFactory({
    api: ragAiApiRef,
    deps: {
      configApi: configApiRef,
      discoveryApi: discoveryApiRef,
      fetchApi: fetchApiRef,
      identityApi: identityApiRef,
    },
    factory: ({ discoveryApi, fetchApi, configApi, identityApi }) => {
      return new RoadieRagAiClient({
        discoveryApi,
        fetchApi,
        configApi,
        identityApi,
      });
    },
  }),

In packages/backend/src/index.ts (add this line anywhere before backend.start()):

…
backend.add(legacyPlugin('rag-ai', import('./plugins/ai')));
…

We added this file as packages/backend/src/plugins/ai.ts:

import { createApiRoutes as initializeRagAiBackend } from '@roadiehq/rag-ai-backend';
import { PluginEnvironment } from '../types';
import { initializeOpenAiEmbeddings } from '@roadiehq/rag-ai-backend-embeddings-openai';
import { createRoadiePgVectorStore } from '@roadiehq/rag-ai-storage-pgvector';
import { createDefaultRetrievalPipeline } from '@roadiehq/rag-ai-backend-retrieval-augmenter';
import { OpenAI } from '@langchain/openai';
import { CatalogClient } from '@backstage/catalog-client';

export default async function createPlugin(env: PluginEnvironment) {
  const catalogApi = new CatalogClient({
    discoveryApi: env.discovery,
  });

  const database = env.database;
  const config = env.config;
  const logger = env.logger;
  const discovery = env.discovery;
  const tokenManager = env.tokenManager;
  const vectorStore = await createRoadiePgVectorStore({ logger, database, config });

  const augmentationIndexer = await initializeOpenAiEmbeddings({
    logger,
    catalogApi,
    vectorStore,
    discovery,
    config,
    tokenManager,
  });

  const model = new OpenAI();
  const ragAi = await initializeRagAiBackend({
    logger,
    augmentationIndexer,
    retrievalPipeline: createDefaultRetrievalPipeline({
      discovery,
      logger,
      vectorStore: augmentationIndexer.vectorStore,
      tokenManager,
    }),
    model,
    config,
    tokenManager,
  });

  return ragAi.router;
}

We also had to upgrade some dependencies and faff around with the winston logger to make all versions compatible when passing it around.

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

esttorhe commented 1 week ago

Considering that no one from Roadie ever commented on the issue, this is as good as closed.

We moved on from the plugin, as we never got any value from it (it was never fully deployed correctly because of this issue).

Unfortunate, but that's life in OSS.

RoadieHQ / roadie-backstage-plugins