Teradata / jupyter-demos

Chat with Docs - Use IVSM #738

Open chetan-hirapara opened 1 day ago

chetan-hirapara commented 1 day ago

New changes:

  1. Use multiple PDF files from the insurance domain for chat with PDF.
  2. Change the title as suggested in the email: "Teradata Enterprise Vector Store: vectorizing PDFs".
  3. For chunking the PDF text, use in-db STO with Python.
  4. Use HF models to create embeddings via the BYOM approach (parallel CPU inferencing).
  5. Use a third-party LLM (OpenAI/Bedrock/Gemini) for the final answer.
  6. Use the HF model for the question --> embeddings step as well.
  7. Add a visualization (embeddings reduced to 2D) that shows the chunks selected for a question. A scatter plot showing all chunks, the question, and the selected chunks could work well.
  8. Store the PDFs in an object store or in a Vantage table (pointing to the object store).
  9. No need to add a chat UI. Create predefined questions in a dropdown and answer based on the question selected.
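
For reference, the chunking and chunk-selection steps (items 3, 6, and 7) could look roughly like the sketch below. It is plain local Python, not the in-db STO or BYOM code. The names `chunk_text`, `toy_embed`, and `top_k_chunks` are hypothetical, and `toy_embed` is only a bag-of-words stand-in for the HF embedding model so that the example runs without any dependencies.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text):
    """ASSUMPTION: stand-in for the HF model. Returns word counts, not a real embedding."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(question, chunks, k=2):
    """Return (chunk_index, score) pairs for the k chunks most similar to the question."""
    q = toy_embed(question)
    scored = [(cosine(q, toy_embed(c)), i) for i, c in enumerate(chunks)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by index
    return [(i, score) for score, i in scored[:k]]


if __name__ == "__main__":
    # Made-up sample sentences, only to exercise the functions.
    sample_chunks = [
        "The policy covers flood damage to the home.",
        "Premiums are due monthly.",
        "Claims must be filed within 30 days.",
    ]
    for idx, score in top_k_chunks("Does the policy cover flood damage?", sample_chunks, k=1):
        print(idx, round(score, 3), sample_chunks[idx])
```

The indices returned by `top_k_chunks` are what the scatter plot in item 7 would highlight as selected chunks, and the text of those chunks is what would be passed to the LLM in item 5.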