I'm encountering a memory-related challenge while using Flask, Gunicorn, and the SimpleTransformers library for document classification. Specifically, I've trained a model to identify resumes, and it works well initially. However, I've noticed that as the number of requests increases, the memory consumption of the application gradually rises. After around 100 requests, the memory usage remains elevated and doesn't go down.
Problem Details:
Upon investigating the issue with a memory profiler, I've identified a few lines within the classification_model.py file of the SimpleTransformers library that cause significant memory consumption spikes. Here are the key lines and their respective memory increments (a sketch of the profiling setup follows the list):
Line 2199: A memory spike of 11.5 MiB occurs during outputs = self._calculate_loss(model, inputs, loss_fct=self.loss_fct).
Line 2181: An increase of 0.5 MiB is observed during the loop for i, batch in enumerate(tqdm(eval_dataloader, disable=args.silent)).
Line 2182: There's a memory bump of 0.4 MiB when executing model.eval().
Line 2089: Memory usage grows by 0.9 MiB during eval_dataset = self.load_and_cache_examples(...).
Line 2049: An additional 0.4 MiB is used while running self._move_model_to_device().
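For context, here is roughly how such line-level numbers can be collected: a minimal profiling sketch using the memory_profiler package (the model type, path, and sample text below are placeholders, not my exact harness).

```python
# Sketch: line-by-line memory profiling of a single prediction call
# using the memory_profiler package (model type, path, and text are placeholders).
from memory_profiler import profile
from simpletransformers.classification import ClassificationModel

model = ClassificationModel(
    "bert",                       # placeholder model type
    "outputs/resume-classifier",  # placeholder path to the trained model
    use_cuda=False,
)

@profile
def classify(text):
    # predict() runs the eval loop inside classification_model.py,
    # which is where the increments listed above appear.
    predictions, raw_outputs = model.predict([text])
    return predictions

if __name__ == "__main__":
    classify("Sample resume text ...")
```

Calling the decorated function prints a per-line memory report; to get the per-line numbers inside the library itself, the same @profile decorator can be applied temporarily to the relevant methods in classification_model.py.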
Attempted Solutions:
To tackle this, I disabled multiprocessing, which was a commonly suggested remedy. While this reduced the leak, it didn't fully resolve the issue: some incremental memory growth still occurs after a certain number of requests.
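For reference, this is the kind of change involved: a sketch that turns off the multiprocessing-related options exposed by ClassificationArgs (model type and path are placeholders).

```python
# Sketch of disabling multiprocessing via simpletransformers model args.
# Model type and path are placeholders.
from simpletransformers.classification import ClassificationModel, ClassificationArgs

model_args = ClassificationArgs()
model_args.use_multiprocessing = False                 # feature conversion for training data
model_args.use_multiprocessing_for_evaluation = False  # feature conversion for eval/predict
model_args.process_count = 1
model_args.dataloader_num_workers = 0                  # keep DataLoader workers in-process
model_args.silent = True

model = ClassificationModel(
    "bert",                       # placeholder model type
    "outputs/resume-classifier",  # placeholder path to the trained model
    args=model_args,
    use_cuda=False,
)
```

Even with these options off, memory still grows gradually with the request count, as described above.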
Code I am using:
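The full application isn't reproduced here, but it follows this general shape: a simplified sketch with one ClassificationModel loaded per worker and a Flask endpoint calling predict() for each request (the model path, route, and response key are placeholders, not the exact production code).

```python
# app.py -- simplified sketch of the Flask service (not the exact production code).
# The model path, route, and response key are placeholders.
from flask import Flask, jsonify, request
from simpletransformers.classification import ClassificationModel, ClassificationArgs

model_args = ClassificationArgs()
model_args.use_multiprocessing = False
model_args.use_multiprocessing_for_evaluation = False
model_args.eval_batch_size = 8
model_args.silent = True

# One model instance is loaded per Gunicorn worker at import time.
model = ClassificationModel(
    "bert",                       # placeholder model type
    "outputs/resume-classifier",  # placeholder path to the trained model
    args=model_args,
    use_cuda=False,
)

app = Flask(__name__)

@app.route("/classify", methods=["POST"])
def classify():
    text = request.get_json(force=True).get("text", "")
    # predict() builds an eval dataloader internally; its internals in
    # classification_model.py are where the profiler shows the increments above.
    predictions, raw_outputs = model.predict([text])
    return jsonify({"is_resume": bool(predictions[0])})
```

The app is served with Gunicorn along the lines of gunicorn -w 2 -b 0.0.0.0:8000 app:app (the worker count here is a placeholder).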
Environment Details:
OS: Ubuntu 20
System: 16 GB RAM, 8-core CPU
Libraries: simpletransformers==0.63.9, transformers==4.21.3, torch==1.13.1 (CPU)
Note:
Our testing environment is a VM instance, while production runs on Kubernetes; the memory problem occurs in both setups.
I'm keen on gathering insights and solutions from the community to address this memory concern. Your input and suggestions would be immensely helpful.