caikit / caikit-nlp


Error when trying to run peft prompt tuned model in __del__ #307

Open anhuong opened 9 months ago

anhuong commented 9 months ago

Describe the bug

When loading a PeftPromptTuned model and then running inference on it, inference succeeds, but an ignored exception is printed afterwards. It is raised in the __del__ method that runs when the model object is destroyed.

What I run, where the input is a model previously tuned with caikit-nlp and saved to a directory outside of caikit-nlp:

from caikit_nlp.modules.text_generation.peft_prompt_tuning import PeftPromptTuning

model = PeftPromptTuning.load(args.model)
inf = model.run("I do not like the train. It is slow and smelly.")

Output:

<function register_backend_type at 0x148ac4311820> is still in the BETA phase and subject to change!
Loading checkpoint shards: 100%|████████████████████████████████████████████████| 6/6 [02:41<00:00, 26.98s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
{
  "generated_text": "Tweet text : I do not like the train. It is slow and smelly. Label : no complaint",
  "generated_tokens": 23,
  "finish_reason": "EOS_TOKEN",
  "producer_id": {
    "name": "Peft generation",
    "version": "0.1.0"
  },
  "input_token_count": 21,
  "seed": null
}
Exception ignored in: <function PeftPromptTuning.__del__ at 0x148a17253af0>
Traceback (most recent call last):
  File "/dccstor/anhtest/caikit-nlp/caikit_nlp/modules/text_generation/peft_prompt_tuning.py", line 146, in __del__
TypeError: 'NoneType' object is not callable

The line referenced, peft_prompt_tuning.py line 146, is gc.collect().
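
For context, this looks like the classic interpreter-shutdown ordering problem: if the object is finalized while Python is shutting down, the gc module's namespace may already have been purged, so gc.collect is None at that point and calling it raises the TypeError above. A minimal standalone illustration of the mechanism (not caikit code; the assignment to gc.collect simulates the purge):

import gc

class Holder:
    def __del__(self):
        # If the gc module's namespace was purged at interpreter shutdown,
        # gc.collect is None here and the call raises TypeError.
        gc.collect()

h = Holder()
gc.collect = None  # simulate the purged module namespace
del h  # prints: Exception ignored in <function Holder.__del__ ...> ... TypeError

Because the exception is raised inside __del__, CPython cannot propagate it and instead prints the "Exception ignored in" message seen above.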


Expected behavior

No exception message should be printed. The error is expected when there is nothing left to delete, so wrap the call in a try/except:

try:
    gc.collect()
except TypeError:
    pass  # nothing left to collect at interpreter shutdown

gkumbhat commented 9 months ago

That's a bit weird, since gc / collect should be there. I wonder if this is coming from the next line, i.e. torch.cuda.empty_cache, since when we are not running on GPUs that line might behave differently, and we are only catching AttributeError and not TypeError 🤔
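
If that hypothesis is right, the fix is to broaden the except clause. Here is a minimal sketch of what the guarded __del__ could look like; the exact body of PeftPromptTuning.__del__ is assumed from the traceback and this comment, not copied from the source:

import gc
import torch

class PeftPromptTuning:
    def __del__(self):
        try:
            gc.collect()
            torch.cuda.empty_cache()  # may behave differently without GPUs or at shutdown
        except (AttributeError, TypeError):
            # AttributeError: an attribute is already gone, e.g. no CUDA.
            # TypeError: a module global was purged to None during interpreter
            # shutdown, as in the traceback above.
            pass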