ageitgey / face_recognition

The world's simplest facial recognition api for Python and the command line
MIT License

Method to release dlib resources to manage GPU resources #899

Open pliablepixels opened 5 years ago

pliablepixels commented 5 years ago

(relates to #868 #722)

Description of problem

Dlib cleanly releases its memory when the relevant objects are deleted (or go out of scope). However, face_recognition creates module-level variables that are instantiated at import time (here and here), which makes them hard to delete in long-running apps that need to conserve GPU memory. The two biggest memory consumers are cnn_face_detector and face_encoder, which stay resident in memory, and if we implement a web server similar to your example here, we run out of memory very quickly after a few API calls.
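For reference, this is roughly what face_recognition/api.py does at import time (paraphrased from the source; the two dlib objects at the end are the ones that hold GPU memory for the life of the process):

import dlib
import face_recognition_models

# Model file paths, resolved at import time
cnn_face_detection_model = face_recognition_models.cnn_face_detection_model_location()
face_recognition_model = face_recognition_models.face_recognition_model_location()

# dlib models, also created at import time as module-level globals
cnn_face_detector = dlib.cnn_face_detection_model_v1(cnn_face_detection_model)
face_encoder = dlib.face_recognition_model_v1(face_recognition_model)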

To work around this and manage memory, I currently have the following in my code:

import dlib
import face_recognition

# In my class where I use face_recognition

def clean_dlib(self):
    # Drop the module-level dlib models so their GPU memory is released
    del face_recognition.api.cnn_face_detector
    del face_recognition.api.face_encoder

def init_dlib(self):
    # Recreate the models from the bundled model files before the next use
    face_recognition.api.cnn_face_detector = dlib.cnn_face_detection_model_v1(
        face_recognition.api.cnn_face_detection_model)
    face_recognition.api.face_encoder = dlib.face_recognition_model_v1(
        face_recognition.api.face_recognition_model)

I call self.init_dlib() before I use your methods and self.clean_dlib() right after, like so:

self.init_dlib()
face_locations = face_recognition.face_locations(...)
face_encodings = face_recognition.face_encodings(...)
# ... do the rest of the face comparison etc.
self.clean_dlib()

Obviously, this costs speed because the models get reloaded on every call, but it lets me manage memory and avoid running out after just a few calls (I have a 1050 Ti with 4 GB).

Ask

I was wondering if you have an alternative suggestion, or whether you would consider adding a clean API to release these resources?

ageitgey commented 5 years ago

That is a good idea. I'll think about doing something in the future, but PRs are welcome if anyone else does it first. As a (lame) workaround, you could always import face_recognition inside a function to make it go out of scope when the function ends.
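A minimal sketch of that workaround (with one caveat: Python caches imports in sys.modules, so the module and its dlib models are only actually freed once the cached entries are removed too; whether dlib then returns the CUDA allocations to the driver is up to dlib):

import sys

def detect(image):
    import face_recognition  # models are created on first import
    locations = face_recognition.face_locations(image)
    # Remove the cached modules so the dlib objects can be garbage-collected
    # once the local reference above goes out of scope
    for name in ('face_recognition', 'face_recognition.api'):
        sys.modules.pop(name, None)
    return locations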

That being said, you shouldn't run out of GPU memory after a few calls unless something weird is happening. Maybe it is because of the way you are calling the library from inside a class and hanging on to a duplicate reference there or something? You don't need to embed the face recognition library inside a class.

pliablepixels commented 5 years ago

Hi, I am not really doing anything special here. The situation is also reproducible with your web service example, as #722 points out.

mohitwadhwa2 commented 3 years ago

Did you find any other solution regarding this memory issue?

epicchen commented 3 years ago

To debug, you could use app.run(threaded=False); the GPU memory will then be released.
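For example, in a Flask app along the lines of the web service example (the /detect endpoint here is made up for illustration):

from flask import Flask
import face_recognition

app = Flask(__name__)

@app.route('/detect', methods=['POST'])
def detect():
    # ... load the uploaded image and call face_recognition here ...
    return 'ok'

if __name__ == '__main__':
    # threaded=False handles all requests on a single thread, so dlib/CUDA
    # state is not duplicated across short-lived request threads
    app.run(threaded=False)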

flariut commented 3 years ago

Hi, I'm having the same issue, and I find it quite annoying that the only fix available is this workaround. It's important for me to have a service running 24/7, and it would be logical for the implementation not to leak memory in continuous operation without deserializing the model from disk again and again. Are there any updates on this? Is this a dlib problem or a face_recognition problem? Thanks in advance.

epicchen commented 2 years ago

@flariut, on a production server you should use something like Nginx. It will manage the worker threads and release memory by itself.

Bah1996 commented 1 year ago

I have memory problems too

corrupted size vs. prev_size while consolidating
malloc(): invalid next size (unsorted)
free(): corrupted unsorted chunks

What is the reason for these errors?

flariut commented 1 year ago

Updating on my issue: for me it was resolved by changing the way we manage threads in our application. Apparently this is a CUDA issue with threading, not a dlib or face_recognition problem. If you work by spawning and killing threads, the CUDA backend leaks a bit of memory even if you free and delete all references to a thread and its objects. The solution is to use a pool of worker threads. https://github.com/davisking/dlib/issues/1381#issuecomment-599146089
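A minimal sketch of that fix, assuming all GPU work is funneled through a fixed pool of long-lived threads (the locate helper, pool size, and 'test.jpg' path are just illustrative):

from concurrent.futures import ThreadPoolExecutor
import face_recognition

# Created once at startup: the same threads are reused for every request,
# so CUDA contexts are only ever initialized on these long-lived threads
# instead of on a fresh thread that is later killed.
executor = ThreadPoolExecutor(max_workers=2)

def locate(image):
    return face_recognition.face_locations(image, model="cnn")

if __name__ == '__main__':
    # Submit work to the pool instead of spawning a thread per request
    image = face_recognition.load_image_file('test.jpg')
    print(executor.submit(locate, image).result())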