Devasy23 / FaceRec

An advanced facial recognition system designed for real-time identification using deep learning models and optimized vector search. Features include face detection, embedding generation, and scalable deployment options.
Apache License 2.0

Utility Function for Vector Similarity Search #18

Closed: devansh-shah-11 closed this issue 3 months ago

devansh-shah-11 commented 6 months ago

Description

We need a new utility function in `API/database.py` that performs a vector similarity search. The function should take an embedding vector as input and return the most similar vectors stored in the MongoDB Atlas database, using Euclidean distance as the similarity measure.

This utility function will be used by the recognise_face() endpoint to find the most similar face in the database.
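To make the intended behavior concrete, here is a minimal in-memory sketch of what the utility should compute: rank stored embeddings by Euclidean distance to the query and keep the n closest. The record layout (an `EmployeeCode` plus a `vector` list) and the function name `top_n_by_euclidean` are assumptions for illustration, not code from the repository.

```python
import heapq
import math
from typing import Dict, List, Sequence


def top_n_by_euclidean(
    query: Sequence[float], records: List[Dict], n: int
) -> List[Dict]:
    """Return the n records whose "vector" is closest to query (hypothetical helper).

    Each returned record is a shallow copy with an added "distance" key,
    mirroring the field the aggregation pipeline adds.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    scored = []
    for record in records:
        vector = record.get("vector")
        # Skip records without an embedding or with a mismatched dimension.
        if vector is None or len(vector) != len(query):
            continue
        scored.append({**record, "distance": math.dist(query, vector)})
    # nsmallest keeps only the n best candidates while scanning.
    return heapq.nsmallest(n, scored, key=lambda r: r["distance"])


faces = [
    {"EmployeeCode": 1, "vector": [0.0, 0.0]},
    {"EmployeeCode": 2, "vector": [3.0, 4.0]},
    {"EmployeeCode": 3, "vector": [1.0, 1.0]},
]
print([f["EmployeeCode"] for f in top_n_by_euclidean([0.0, 0.0], faces, 2)])  # prints [1, 3]
```

The database version does the same ranking inside MongoDB, so only the n closest documents travel over the network.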

Expected Behavior

The endpoint should take `n` as input from the user and return the top `n` most similar vectors from the MongoDB database.

Benefits

This feature automates finding the top `n` vectors most similar to a given face, which helps identify the employee.

Tasks

- Explore the MongoDB vector search tutorial
- Write a function to return the most similar vectors
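The Atlas vector search tutorial referenced in the task uses the `$vectorSearch` aggregation stage, which queries an approximate nearest-neighbor index instead of scanning every document. Below is a minimal sketch of a pipeline builder for that route. It assumes an Atlas Vector Search index (the name `face_vector_index` is hypothetical) defined on a `vector` field with `"similarity": "euclidean"`; the `numCandidates` multiplier of 10 is an arbitrary starting point, not a value from this project.

```python
from typing import Any, Dict, List, Sequence


def build_vector_search_pipeline(
    embedding: Sequence[float],
    n: int,
    index_name: str = "face_vector_index",  # hypothetical index name
    path: str = "vector",  # assumed field holding the stored embedding
    candidate_factor: int = 10,  # assumed oversampling factor for recall
) -> List[Dict[str, Any]]:
    """Build an Atlas $vectorSearch pipeline returning the top n matches."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(embedding),
                # numCandidates must be >= limit; more candidates improve recall.
                "numCandidates": n * candidate_factor,
                "limit": n,
            }
        },
        # Expose the relevance score Atlas computed for each match.
        {"$addFields": {"score": {"$meta": "vectorSearchScore"}}},
    ]


# Usage with pymongo (needs an Atlas cluster with the index described above):
# results = list(db["faces"].aggregate(build_vector_search_pipeline(vec, 5)))
```

Results come back ordered by score, most similar first. The trade-off against a brute-force pipeline is approximate recall in exchange for not scanning the whole collection.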

Checklist

- [X] Modify `API/database.py` ✓ https://github.com/devansh-shah-11/FaceRec/commit/d6366ebfcc133c30f5e069c0508a89b52686ba57
- [X] Running GitHub Actions for `API/database.py` ✓
- [X] Modify `API/route.py` ✓ https://github.com/devansh-shah-11/FaceRec/commit/7b8ca4e13c930240c7aef7d25b09dd19d42e82df
- [X] Running GitHub Actions for `API/route.py` ✓

sweep-ai[bot] commented 6 months ago

🚀 Here's the PR! #20

Here are the GitHub Actions logs prior to making any changes:

Sandbox logs for 91e83d1
Checking API/database.py for syntax errors...
✅ API/database.py has no syntax errors!

Sandbox passed on the latest main, so sandbox checks will be enabled for this issue.


Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I think are relevant, in decreasing order of relevance. If a file is missing here, you can mention its path in the ticket description.

- https://github.com/devansh-shah-11/FaceRec/blob/91e83d1e0629dfb50ad9baecd37d3e4982a29f76/API/database.py#L1-L23
- https://github.com/devansh-shah-11/FaceRec/blob/91e83d1e0629dfb50ad9baecd37d3e4982a29f76/API/route.py#L152-L220

Step 2: ⌨️ Coding

--- a/API/database.py
+++ b/API/database.py
@@ -22,3 +22,31 @@

     def update_one(self, collection, query, update):
         return self.db[collection].update_one(query, update)
+    def find_similar_vectors(self, collection, embedding_vector, n):
+        """
+        Find the top n most similar vectors in the database to the given embedding_vector.
+        This method uses the Euclidean distance for similarity measure.
+
+        :param collection: The MongoDB collection to search within.
+        :param embedding_vector: The embedding vector to find similar vectors for.
+        :param n: The number of top similar vectors to return.
+        :return: The top n most similar vectors from the MongoDB database.
+        """
+        pipeline = [
+            {
+                "$addFields": {
+                    "distance": {
+                        "$sqrt": {
+                            "$reduce": {
+                                "input": {"$zip": {"inputs": ["$vector", embedding_vector]}},
+                                "initialValue": 0,
+                                "in": {"$add": ["$$value", {"$pow": [{"$subtract": [{"$arrayElemAt": ["$$this", 0]}, {"$arrayElemAt": ["$$this", 1]}]}, 2]}]}
+                            }
+                        }
+                    }
+                }
+            },
+            {"$sort": {"distance": 1}},
+            {"$limit": n}
+        ]
+        return list(self.db[collection].aggregate(pipeline))
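
Two details of this brute-force pipeline are easy to get wrong. Aggregation field paths such as `$$this.0` do not index into arrays, so element access needs `$arrayElemAt`. Also, documents without a `vector` field yield a `null` distance, and `null` sorts ahead of every number in an ascending `$sort`. Below is a minimal standalone sketch that handles both; the `$match` guard and the name `build_euclidean_pipeline` are illustrative additions, while the `vector` field name comes from the diff above. This approach scans the whole collection, which is acceptable for a small employee table but does not use an index.

```python
from typing import Any, Dict, List, Sequence


def build_euclidean_pipeline(
    embedding: Sequence[float], n: int, path: str = "vector"
) -> List[Dict[str, Any]]:
    """Brute-force top-n Euclidean search as a MongoDB aggregation pipeline."""
    if n <= 0:
        # $limit only accepts a positive integer.
        raise ValueError("n must be a positive integer")
    # (stored_i - query_i) ** 2 for one zipped pair held in $$this.
    squared_diff = {
        "$pow": [
            {
                "$subtract": [
                    {"$arrayElemAt": ["$$this", 0]},  # stored component
                    {"$arrayElemAt": ["$$this", 1]},  # query component
                ]
            },
            2,
        ]
    }
    return [
        # Drop documents without an embedding: their distance would be null,
        # and null sorts before every number in an ascending $sort.
        {"$match": {path: {"$type": "array"}}},
        {
            "$addFields": {
                "distance": {
                    "$sqrt": {
                        "$reduce": {
                            "input": {"$zip": {"inputs": ["$" + path, list(embedding)]}},
                            "initialValue": 0,
                            "in": {"$add": ["$$value", squared_diff]},
                        }
                    }
                }
            }
        },
        {"$sort": {"distance": 1}},
        {"$limit": n},
    ]


# Usage with pymongo (the collection name is hypothetical):
# similar = list(db["faces"].aggregate(build_euclidean_pipeline(vec, 5)))
```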

Ran GitHub Actions for d6366ebfcc133c30f5e069c0508a89b52686ba57:

--- a/API/route.py
+++ b/API/route.py
@@ -267,3 +267,23 @@
     client.find_one_and_delete(collection, {"EmployeeCode": EmployeeCode})

     return {"Message": "Successfully Deleted"}
+@router.post("/recognise_face")
+async def recognise_face(embedding: List[float], n: int):
+    """
+    Recognise a face by finding the most similar face embeddings in the database.
+
+    Args:
+        embedding (List[float]): The embedding vector of the face to be recognised.
+        n (int): The number of top similar vectors to return.
+
+    Returns:
+        dict: A dictionary containing the top n most similar face embeddings.
+
+    """
+    logging.info("Recognising face")
+    try:
+        similar_faces = client.find_similar_vectors(collection, embedding, n)
+        return {"similar_faces": [{**face, "_id": str(face["_id"])} for face in similar_faces]}
+    except Exception as e:
+        logging.error(f"Error recognising face: {str(e)}")
+        raise HTTPException(status_code=500, detail="Internal server error")

Ran GitHub Actions for 7b8ca4e13c930240c7aef7d25b09dd19d42e82df:


Step 3: 🔁 Code Review

I have finished reviewing the code for completeness. I did not find errors for sweep/utility_function_for_vector_similarity_s_0cb05.


This is an automated message generated by Sweep AI.