0ssamaak0 / CLIPPyX

AI-powered image search tool offering system-wide content-based, text, and visual-similarity search.
MIT License
196 stars, 17 forks

Apply threshold & change number of retrieved images #2

Closed: 0ssamaak0 closed this issue 2 months ago

0ssamaak0 commented 5 months ago

Currently, every search query returns the top 5 matches by default, regardless of their similarity scores.

Implement a thresholding mechanism that filters out results whose similarity score is below a certain value, so that only relevant results are displayed. Note that each case (and each model) may have a different optimal threshold, so we need to explore them.

# server.py
def search_clip_text(text, image_collection):
    ...
    # change this 5 to another number (maybe add it to `config.yaml`)
    results = image_collection.query(text_embedding, n_results=5)
    # apply threshold (differs for each task & each model)

def search_clip_image(image_path, image_collection, get_self=False):
    ...  # same changes as search_clip_text

def search_embed_text(text, text_collection):
    ...  # same changes as search_clip_text
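A minimal sketch of the fixed-threshold idea, assuming the query scores have already been converted to similarities where higher means more relevant. If `image_collection` is a ChromaDB collection configured for cosine distance (an assumption; the issue does not say), the similarity would be `1 - distance`. The names `apply_threshold`, `min_similarity`, and `max_results` are hypothetical; the last two are natural candidates for `config.yaml`. Since `query` only returns `n_results` items, it would need to fetch more candidates than are finally shown.

```python
from typing import List, Sequence, Tuple


def apply_threshold(
    ids: Sequence[str],
    similarities: Sequence[float],
    min_similarity: float,
    max_results: int,
) -> List[Tuple[str, float]]:
    """Keep at most `max_results` matches with similarity >= `min_similarity`.

    Matches are returned sorted from most to least similar.
    """
    # Rank every candidate by similarity, best first.
    ranked = sorted(zip(ids, similarities), key=lambda pair: pair[1], reverse=True)
    # Drop candidates below the (hypothetical, per-model) threshold.
    kept = [(i, s) for i, s in ranked if s >= min_similarity]
    # Cap the number of displayed results (replaces the hard-coded 5).
    return kept[:max_results]


if __name__ == "__main__":
    ids = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
    sims = [0.31, 0.12, 0.27, 0.05]
    print(apply_threshold(ids, sims, min_similarity=0.2, max_results=5))
    # prints [('a.jpg', 0.31), ('c.jpg', 0.27)]
```

A fixed `min_similarity` has to be tuned separately for each model and search type, which is the exploration the issue calls for.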
MahmoudAshraf97 commented 5 months ago

Maybe use a strategy similar to top_p or min_p, where the number of results depends on the similarity score of the most similar result. Here's a quick explanation: https://www.reddit.com/r/LocalLLaMA/comments/17vonjo/your_settings_are_probably_hurting_your_model_why/ Great work, btw!
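A sketch of how min_p and top_p could be adapted from LLM sampling to search results, under stated assumptions: the similarities are first turned into a probability distribution with a temperature-scaled softmax. The default `temperature=0.01` (a scale of 100) is an assumption to tune per model. min_p then keeps every result whose probability is at least `min_p` times that of the best match, and top_p keeps the smallest set of best matches whose probabilities sum to at least `top_p`. All function and parameter names here are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Match = Tuple[str, float]


def softmax(scores: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _ranked(ids: Sequence[str], similarities: Sequence[float], temperature: float):
    """Pair each id with its similarity and probability, best match first."""
    probs = softmax(similarities, temperature)
    return sorted(zip(ids, similarities, probs), key=lambda t: t[2], reverse=True)


def min_p_select(ids: Sequence[str], similarities: Sequence[float], min_p: float,
                 temperature: float = 0.01,
                 max_results: Optional[int] = None) -> List[Match]:
    """Keep matches whose probability is >= min_p * (probability of the best match)."""
    if not ids:
        return []
    ranked = _ranked(ids, similarities, temperature)
    cutoff = min_p * ranked[0][2]  # relative to the most similar result
    kept = [(i, s) for i, s, p in ranked if p >= cutoff]
    return kept if max_results is None else kept[:max_results]


def top_p_select(ids: Sequence[str], similarities: Sequence[float], top_p: float,
                 temperature: float = 0.01,
                 max_results: Optional[int] = None) -> List[Match]:
    """Keep the smallest set of best matches whose probabilities sum to >= top_p."""
    if not ids:
        return []
    kept: List[Match] = []
    cumulative = 0.0
    for i, s, p in _ranked(ids, similarities, temperature):
        kept.append((i, s))
        cumulative += p
        if cumulative >= top_p:
            break
    return kept if max_results is None else kept[:max_results]


if __name__ == "__main__":
    ids = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
    sims = [0.31, 0.30, 0.20, 0.05]
    print(min_p_select(ids, sims, min_p=0.1))  # [('a.jpg', 0.31), ('b.jpg', 0.3)]
    print(top_p_select(ids, sims, top_p=0.5))  # [('a.jpg', 0.31)]
```

Because p_i / p_max = exp((s_i - s_max) / T), min_p in this form is equivalent to keeping results with s_i >= s_max + T·ln(min_p). The cutoff therefore moves with the best match's score, which is the behaviour suggested in the comment.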

0ssamaak0 commented 5 months ago

Very nice idea! I hadn't thought about applying LLM sampling methods. I'll check this out.

Thank you 😁😁