QuivrHQ / quivr

Open-source RAG Framework for building GenAI Second Brains 🧠 Build productivity assistant (RAG) ⚡️🤖 Chat with your docs (PDF, CSV, ...) & apps using Langchain, GPT 3.5 / 4 turbo, Private, Anthropic, VertexAI, Ollama, LLMs, Groq that you can share with users ! Efficient retrieval augmented generation framework
https://quivr.com
Other
36.32k stars 3.54k forks

fix: db migrations #3054

Closed AmineDiro closed 1 month ago

AmineDiro commented 1 month ago

Description

To test

For testing the migrations:

  1. Check out the main branch and run a full reset: `supabase db reset`
  2. Dump and load a bigger dataset (the preview)
    
    -- 1. Run dump
    -- pg_dump --data-only postgresql://XXXX -F d -f /mnt/ssd/quivr-data-prod

    -- 2. Restore
    -- pg_restore -U postgres -h localhost -p 54322 -d postgres /mnt/ssd/quivr-data --clean

    -- 3. Backfill brains_vectors from pg_dump
    INSERT INTO brains_vectors (brain_id, file_sha1, vector_id)
    SELECT
        '40ba47d7-51b2-4b2a-9247-89e29619efb0'::uuid AS brain_id,
        'd23234j234' AS file_sha1,
        id AS vector_id
    FROM vectors
    LIMIT 1000;

    -- 4. Backfill knowledge from pg_dump
    INSERT INTO knowledge (file_name, brain_id, extension)
    WITH files AS (
        SELECT DISTINCT
            vectors.metadata->>'file_name' AS file_name,
            brains_vectors.brain_id AS brain_id
        FROM vectors, brains_vectors
        WHERE brains_vectors.vector_id = vectors.id
    )
    SELECT
        file_name,
        brain_id,
        split_part(file_name, '.', array_length(string_to_array(file_name, '.'), 1)) AS extension
    FROM files;


  3. Check out this branch and run `supabase db push --local`
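
The `extension` expression in the knowledge backfill takes the last dot-separated part of `file_name`. A minimal shell sketch of the same rule, to make the edge cases visible before loading real data (`file_extension` is a hypothetical helper, not part of the repo):

```shell
#!/bin/sh
# Hypothetical helper mirroring
#   split_part(file_name, '.', array_length(string_to_array(file_name, '.'), 1))
# i.e. keep only the text after the last dot.
file_extension() {
    # ${1##*.} removes the longest prefix that ends in a dot;
    # when the name has no dot, it is returned unchanged.
    printf '%s\n' "${1##*.}"
}

file_extension "report.pdf"       # pdf
file_extension "archive.tar.gz"   # gz
file_extension "README"           # README (no dot: the whole name comes back)
```

Note the last case: a file name without a dot ends up with its full name in the `extension` column, which may or may not be what later queries expect.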

Check that nothing breaks.
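
One way to make that check less manual is to snapshot row counts before and after the push. This is only a sketch: `count_rows` and `snapshot_counts` are hypothetical helpers, the table list is taken from the backfill queries above, and `DB_URL` assumes the default local Supabase credentials on the port used in the restore step.

```shell
#!/bin/sh
# Assumption: default local Supabase connection string; override DB_URL if yours differs.
DB_URL="${DB_URL:-postgresql://postgres:postgres@localhost:54322/postgres}"

# Print the row count of a single table.
count_rows() {
    # -A: unaligned output, -t: tuples only, so psql prints just the number
    psql "$DB_URL" -At -c "SELECT count(*) FROM $1;"
}

# Print "<table> <count>" for each table touched by the backfill.
snapshot_counts() {
    for table in vectors brains_vectors knowledge; do
        printf '%s %s\n' "$table" "$(count_rows "$table")"
    done
}
```

Usage would look like `snapshot_counts > before.txt`, then `supabase db push --local`, then `snapshot_counts > after.txt` and `diff before.txt after.txt`. A difference is not necessarily a bug, since a migration may intentionally move or drop rows, but it shows where to look.
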
vercel[bot] commented 1 month ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| quivrapp | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Aug 22, 2024 1:39pm |

AmineDiro commented 1 month ago

Run on local dump of ~5M vectors:

image