SWM-Thlee / linked-paper-langchain

paper ETL pipeline, Semantic Search service

Select a vector DB for storing paper information. #10

Closed: ljy2855 closed this issue 1 month ago

ljy2855 commented 1 month ago

Goal

Work details

ljy2855 commented 1 month ago

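Select a vector DB that stores each paper's information together with its embedding vector.

Existing RDBs also support vector fields, so vector search is possible there, but how well it works depends on the search method!

So, look for a separate DB that fits better.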
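To make the point about search methods concrete, here is a minimal sketch of exact (brute-force) nearest-neighbor search, which is roughly what a vector column without a dedicated index has to do: score every stored row for every query. The names `cosine_similarity` and `exact_knn` and the toy paper IDs are made up for illustration and are not part of this repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_knn(query, rows, k):
    """Return the ids of the k rows most similar to `query`.

    `rows` is a list of (paper_id, vector) pairs. Every row is scored,
    so the cost per query grows linearly with the number of stored papers.
    """
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(query, row[1]),
        reverse=True,
    )
    return [paper_id for paper_id, _ in ranked[:k]]


# Hypothetical toy data: three papers with 2-dimensional embeddings.
papers = [("p1", [1.0, 0.0]), ("p2", [0.0, 1.0]), ("p3", [0.9, 0.1])]
top2 = exact_knn([1.0, 0.0], papers, k=2)
```

An index such as HNSW avoids this full scan by visiting only a small part of the data, usually at the cost of approximate results, which is why index and search-method support are worth comparing.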

Selection criteria

Comparison

| | Elasticsearch | OpenSearch | MongoDB |
|---|---|---|---|
| BM25 support | ⭕️ | ⭕️ | |
| AWS service (serverless) | | ⭕️ | |
| Vector search | ⭕️ | ⭕️ | ⭕️ |
| Vector indexing | HNSW | HNSW | HNSW, IVFFlat |
| Max vector dimension | 4096 | 16,000 | 2,000 |
| Vector search method | KNN | KNN | ANN |
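As a rough illustration of what the OpenSearch column could mean in practice, the sketch below builds an index definition with an HNSW-backed `knn_vector` field next to ordinary text fields (which OpenSearch scores with BM25 by default). The helper name `build_paper_index_body`, the field names `title`, `abstract`, and `embedding`, and the `faiss` / `l2` choices are assumptions for this example, not decisions made in this issue; the 16,000 limit is taken from the table above.

```python
MAX_DIMENSION = 16_000  # OpenSearch limit as listed in the comparison table


def build_paper_index_body(dimension, vector_field="embedding"):
    """Build an index body with text fields plus one HNSW vector field.

    Only constructs a dict; nothing is sent to a cluster here.
    """
    if not 1 <= dimension <= MAX_DIMENSION:
        raise ValueError(
            f"dimension must be between 1 and {MAX_DIMENSION}, got {dimension}"
        )
    return {
        # Enables k-NN search on this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Plain text fields (hypothetical names) for keyword search.
                "title": {"type": "text"},
                "abstract": {"type": "text"},
                # Vector field indexed with HNSW; engine and space type are assumptions.
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "space_type": "l2",
                        "engine": "faiss",
                    },
                },
            }
        },
    }


index_body = build_paper_index_body(768)
```

With the `opensearch-py` client, a body like this would typically be passed to `client.indices.create(index=..., body=index_body)`; the index name and embedding dimension depend on the embedding model that ends up being used.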

Decision

Adopting OpenSearch looks like the best option, since it keeps initial infrastructure management simple.
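One reason BM25 and vector search in the same store is convenient: a single request can carry both a keyword clause and a k-NN clause. The sketch below only builds such a request body. The helper `build_hybrid_query` and the field names `abstract` and `embedding` are hypothetical, and combining the two clauses in a `bool`/`should` is just one simple option, since BM25 and vector scores are on different scales and may need normalization in a real setup.

```python
def build_hybrid_query(text, query_vector, k=5):
    """Build a search body that mixes a BM25 match clause with a k-NN clause.

    Field names are assumptions for illustration; nothing is sent to a cluster.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return {
        "size": k,
        "query": {
            "bool": {
                "should": [
                    # Keyword relevance on the text field (BM25 by default).
                    {"match": {"abstract": text}},
                    # Nearest neighbors of the query embedding.
                    {"knn": {"embedding": {"vector": list(query_vector), "k": k}}},
                ]
            }
        },
    }


query_body = build_hybrid_query("graph neural networks", [0.1, 0.2, 0.3], k=3)
```

With `opensearch-py`, a body like this would typically be sent via `client.search(index=..., body=query_body)`.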

References

https://docs.aws.amazon.com/documentdb/latest/developerguide/vector-search.html

https://www.mongodb.com/ko-kr/docs/atlas/atlas-vector-search/vector-search-overview/

https://www.elastic.co/search-labs/blog/elasticsearch-opensearch-vector-search-performance-comparison

https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-vector-search.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html

https://docs.haystack.deepset.ai/docs/opensearch-document-store