SciPhi-AI / R2R

The most advanced Retrieval-Augmented Generation (RAG) system, containerized and RESTful
https://r2r-docs.sciphi.ai/
MIT License
3.65k stars, 270 forks

Entity deduplication by description #1551

Closed: shreyaspimpalgaonkar closed this 2 weeks ago

shreyaspimpalgaonkar commented 2 weeks ago

[!IMPORTANT] Adds description-based entity deduplication using DBSCAN and prepares for LLM-based deduplication in the knowledge graph pipeline.

  • Behavior:
    • Adds kg_description_entity_deduplication() in deduplication.py for deduplication using description embeddings with DBSCAN clustering.
    • Adds placeholder kg_llm_entity_deduplication() in deduplication.py for future LLM-based deduplication.
    • Updates _run_logic() in deduplication.py to handle new deduplication types.
  • Database:
    • Adds extra_columns parameter to get_entities() in kg.py to fetch additional columns like description_embedding.
    • Updates get_entities() queries in kg.py to include extra_columns.
  • Enums:
    • Adds BY_DESCRIPTION and BY_LLM to KGEntityDeduplicationType in kg.py.
  • Misc:
    • Minor adjustments in deduplication_summary.py to support new deduplication logic.
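
The description-based path summarized above can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the code from this PR: the function names (`deduplicate_by_description`, `dbscan`), the dictionary fields `name` and `description_embedding`, the use of cosine distance, and the `eps` and `min_samples` defaults are all assumptions made for the example. A production pipeline would more likely call a library DBSCAN implementation than the hand-written one shown here.

```python
import math
from typing import Dict, List, Optional, Sequence

NOISE = -1  # label for points that belong to no cluster


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(vectors: List[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Plain O(n^2) DBSCAN over cosine distance; returns one label per vector."""
    n = len(vectors)
    labels: List[Optional[int]] = [None] * n

    def neighbors(i: int) -> List[int]:
        # The neighborhood includes the point itself.
        return [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        pos = 0
        while pos < len(queue):
            q = queue[pos]
            pos += 1
            if labels[q] == NOISE:
                labels[q] = cluster  # border point: joins the cluster, not expanded
                continue
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = neighbors(q)
            if len(q_neighbors) >= min_samples:
                queue.extend(q_neighbors)  # q is a core point: expand through it
        cluster += 1
    return [label if label is not None else NOISE for label in labels]


def deduplicate_by_description(
    entities: List[Dict], eps: float = 0.1, min_samples: int = 2
) -> List[Dict]:
    """Merge entities whose description embeddings fall into the same cluster."""
    labels = dbscan([e["description_embedding"] for e in entities], eps, min_samples)
    clusters: Dict[int, Dict] = {}
    merged: List[Dict] = []
    for entity, label in zip(entities, labels):
        if label == NOISE:
            # Unclustered entities are kept as they are.
            merged.append({"name": entity["name"], "members": [entity["name"]]})
        elif label in clusters:
            clusters[label]["members"].append(entity["name"])
        else:
            # The first entity seen in a cluster becomes its representative.
            clusters[label] = {"name": entity["name"], "members": [entity["name"]]}
            merged.append(clusters[label])
    return merged
```

With `min_samples=2`, any two entities whose embeddings lie within `eps` of each other form a cluster, so the value of `eps` effectively decides how aggressively near-duplicates are merged. Entities labeled as noise pass through unchanged, which keeps the step conservative when descriptions are dissimilar.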

This description was created by Ellipsis for d91906d06ef37cd8cb8a1ad7d374e5d25112066d. It will automatically update as commits are pushed.

vercel[bot] commented 2 weeks ago

The latest updates on your projects.

| Name | Status | Updated (UTC) |
| :--- | :----- | :------ |
| yc_demo | ✅ Ready | Nov 2, 2024 0:49am |
| yc-demo | ✅ Ready | Nov 2, 2024 0:49am |

1 Skipped Deployment

| Name | Status | Updated (UTC) |
| :--- | :----- | :------ |
| **recommendation_platform** | ⬜️ Ignored ([Inspect](https://vercel.com/my-team-88dd52c0/recommendation_platform/5Wjb3ujEAKNXFaw19Nhr1CJstdWb)) | Nov 2, 2024 0:49am |