ThilinaRajapakse / simpletransformers

Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
https://simpletransformers.ai/
Apache License 2.0

Memory Leak Issue #1544

Open sanwark11 opened 1 year ago

sanwark11 commented 1 year ago

Issue Summary:

Hello everyone,

I'm running into a memory growth problem while serving a document classification model with Flask, Gunicorn, and the SimpleTransformers library. Specifically, I've trained a model to identify resumes, and it works well initially. However, as the number of requests increases, the memory consumption of the application gradually rises, and after around 100 requests the usage stays elevated and never comes back down.

Problem Details: Upon investigating the issue with a memory profiler, I've identified that a few lines within the classification_model.py file of the SimpleTransformers library are causing significant memory consumption spikes. Here are the key lines and their respective memory increments:

- Line 2199: a memory spike of 11.5 MiB during outputs = self._calculate_loss(model, inputs, loss_fct=self.loss_fct).
- Line 2181: an increase of 0.5 MiB during the loop for i, batch in enumerate(tqdm(eval_dataloader, disable=args.silent)).
- Line 2182: a bump of 0.4 MiB when executing model.eval().
- Line 2089: growth of 0.9 MiB during eval_dataset = self.load_and_cache_examples(...).
- Line 2049: an additional 0.4 MiB while running self._move_model_to_device().
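For reference, a minimal loop along the following lines can be used to watch the process's resident set size grow across repeated predictions (just a sketch: it assumes psutil is installed, and that trained_model and text are the same objects set up in the code block further down):

import os
import psutil  # assumption: psutil is available for reading the RSS

process = psutil.Process(os.getpid())

for i in range(200):  # simulate a few hundred back-to-back requests
    prediction, raw_outputs = trained_model.predict([text])
    if i % 20 == 0:
        rss_mib = process.memory_info().rss / (1024 * 1024)
        print(f"request {i}: RSS = {rss_mib:.1f} MiB")

If the printed RSS keeps climbing rather than plateauing, the growth is coming from the prediction path itself rather than from the Flask/Gunicorn layer.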

Attempted Solutions: As a suggested remedy, I disabled multiprocessing. While this reduced the memory leak, it didn't fully resolve the issue; memory consumption still creeps up after a certain number of requests.

Code I am using:

from simpletransformers.classification import ClassificationModel

# Multiprocessing disabled as a mitigation for the memory growth
args = {
    "use_multiprocessing": False,
    "use_multiprocessing_for_evaluation": False,
    "process_count": 1,
}

# Trained RoBERTa classifier, loaded on CPU
trained_model = ClassificationModel(
    "roberta",
    model_path,
    num_labels=2,
    use_cuda=False,
    args=args,
)

prediction, raw_outputs = trained_model.predict([text])
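Beyond disabling multiprocessing, the sketch below shows the kind of general workarounds I am experimenting with (none of these are confirmed fixes; gc.collect(), torch.set_num_threads(), and Gunicorn worker recycling are generic mitigations rather than anything specific to SimpleTransformers):

import gc
import torch

# Cap intra-op threads on the CPU-only box (value chosen arbitrarily for the 8-core machine described below)
torch.set_num_threads(4)

def classify(text):
    # Run the prediction as in the snippet above
    prediction, raw_outputs = trained_model.predict([text])
    # Force a collection after each request so unreachable objects are
    # released promptly; this only delays growth, it does not fix a true leak
    gc.collect()
    return prediction

# Operationally, recycling Gunicorn workers also caps the growth, e.g.
# (module name illustrative):
#   gunicorn --workers 2 --max-requests 500 --max-requests-jitter 50 app:app

Worker recycling is of course a workaround rather than a fix, but it keeps the RSS bounded in production while the underlying leak is investigated.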

Environment Details:

- OS: Ubuntu 20
- System: 16 GB RAM, 8-core CPU
- Libraries: simpletransformers==0.63.9, transformers==4.21.3, torch==1.13.1 (CPU)

Note: Our testing environment is a VM instance, while production runs on Kubernetes. The memory problem appears in both setups. Any insights or suggestions from the community on how to address this would be immensely helpful.

sanwark11 commented 1 year ago

Please suggest a solution; I am facing this issue in production.