Open JustinhSE opened 1 day ago
Since we will be moving to a larger dataset, pickle files won’t be good enough for the clusters as pickle files would load the entire file, reducing efficiency. We might want to consider Vector Databases. Use Case: • Ideal for applications that require similarity searches, such as those involving natural language processing (NLP), recommendation systems, or any task where semantic similarity is important. Advantages: • Efficient retrieval of high-dimensional data. • Optimized for handling large volumes of vector data with fast query performance. • Scales horizontally by adding more servers to a cluster, which is beneficial for large datasets. Examples: Pinecone, Milvus
open to ideas but wanted to drop this here.
Overview
The versify app is experiencing significant delays or failures when fetching k-means cluster results from our Python backend API. This issue is causing poor user experience and needs to be addressed urgently. We need to investigate the entire pipeline from the frontend request to the backend processing and response to identify and resolve bottlenecks.
Tasks
Analyze Frontend API Call Implementation
Backend API Performance Analysis
Caching Strategy (up for discussion)