When I run the exo command on mac and start the inferences using the completion REST API endpoint, the Python process seems to be increasingly using more and more memory.
I have put a delay of 10 seconds between each request but eventually the system crashes due to running out of memory.
/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Hi,
When I run the exo command on mac and start the inferences using the completion REST API endpoint, the Python process seems to be increasingly using more and more memory.
I have put a delay of 10 seconds between each request but eventually the system crashes due to running out of memory.
Using the latelt macOS on M3 Max amd M2 Ultra