Lysimachos closed this issue 4 years ago
OK, it was fixed as soon as I set "use_multiprocessing": False
What does this thing do?
When enabled, converting features is accelerated using multiprocessing on CPUs with multiple cores. Otherwise, feature conversion can take hours with large datasets.
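For anyone wondering what the flag changes in practice, here is a minimal sketch. It is not the simpletransformers implementation: `to_feature` and `convert_examples` are made-up names, and the "tokenization" is a toy stand-in for the real, much heavier conversion. It only shows the general idea of running the same per-example function either serially or across a pool of worker processes.

```python
from multiprocessing import Pool


def to_feature(text):
    # Toy stand-in for tokenization: one number per whitespace-separated word.
    return [len(word) for word in text.split()]


def convert_examples(texts, use_multiprocessing=True, process_count=2):
    # Hypothetical helper: the same per-example work, run in parallel or serially.
    if use_multiprocessing:
        with Pool(process_count) as pool:
            return pool.map(to_feature, texts)
    return [to_feature(text) for text in texts]


if __name__ == "__main__":
    texts = ["a short sentence", "another example sentence"]
    serial = convert_examples(texts, use_multiprocessing=False)
    parallel = convert_examples(texts, use_multiprocessing=True)
    # Both paths should produce the same features; only the wall time differs.
    print(serial == parallel)
```

With only a handful of examples the serial path is usually faster, because starting worker processes has a fixed cost. The parallel path pays off on large datasets.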
Thank you ThilinaRajapakse for your reply.
There seems to be an issue with "use_multiprocessing": True when GPU/CUDA is enabled.
What are you running the code on? It's not an issue on my machine and I don't think anyone else has run into this issue either.
Yes, there seems to be an issue with my setup.
I am using the PyCharm remote interpreter on a machine with a GeForce RTX 2080, CUDA 10.2, and an AMD Ryzen Threadripper 2950X 16-core processor.
Maybe it's related to the PyCharm remote interpreter. My setup is pretty similar to yours (RTX Titan and Ryzen 2700X). Unless there is an issue with the Threadripper series that I am not aware of. You could try setting process_count: 8 to see whether it makes a difference. From what I can remember, Threadripper has two separate processors on the same chip, right?
It is not the remote interpreter, because I tried running directly on the machine and got the same problem. Everything worked as it should when I used "process_count": 8. I also tried "process_count": 16 and everything worked fine.
Thank you for your help and interest Thilina.
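Since an explicit process_count made the difference here, a small sketch of pinning it may be useful. `choose_process_count` is a hypothetical helper, not part of simpletransformers, and the "leave two cores free" default is an assumption for illustration. Only the "use_multiprocessing" and "process_count" keys come from this thread.

```python
import os


def choose_process_count(requested=None, reserve=2):
    """Hypothetical helper: pick a worker count, leaving `reserve` cores free."""
    available = os.cpu_count() or 1
    if requested is None:
        return max(1, available - reserve)
    # Never ask for fewer than one worker or more workers than logical cores.
    return max(1, min(requested, available))


# Args dict in the style used in this thread; pass it as args=... to the model.
model_args = {
    "use_multiprocessing": True,
    "process_count": choose_process_count(8),
}
print(model_args)
```

Setting the value explicitly at least takes the automatically chosen default out of the picture when debugging a hang like this one.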
I suspect it has something to do with the Threadripper architecture and how the processes are distributed on the cores.
You are welcome!
I am going to do some digging into this. As soon as I have something new to add I will inform you.
Thanks again
I have encountered the same infinite loop, upon train_model:
from simpletransformers.classification import ClassificationModel
import pandas as pd

test = ClassificationModel("distilbert", "distilbert-base-cased")
a = pd.DataFrame()
a['text'] = ['a', 'b', 'c', 'd', 'e']
a['labels'] = [1, 2, 3, 4, 5]
test.train_model(a)
`2020-11-10 21:05:35.646764: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
Some weights of the model checkpoint at distilbert-base-cased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']
It hangs at this point for a moment, before restarting with apparently two threads:
`2020-11-10 21:07:34.159618: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-10 21:07:42.104897: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
Some weights of the model checkpoint at distilbert-base-cased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']
This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-cased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
I1110 21:08:46.064993 8640 classification_model.py:1073] Converting to features started. Cache is not used.
Traceback (most recent call last):
File "
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
I1110 21:08:47.658993 8640 internal.py:138] Internal process exited
I1110 21:08:50.152998 4352 classification_model.py:1073] Converting to features started. Cache is not used.
Traceback (most recent call last):
File "
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
I1110 21:08:51.215993 4352 internal.py:138] Internal process exited `
The above output repeats endlessly. I tried setting use_multiprocessing to False, to no avail. Could it be something to do with my CUDA or torch versions?
According to debugging, the problem is at line 1083 in classification_model.py:
features = convert_examples_to_features(
    examples,
    args.max_seq_length,
    tokenizer,
    output_mode,
    # XLNet has a CLS token at the end
    cls_token_at_end=bool(args.model_type in ["xlnet"]),
    cls_token=tokenizer.cls_token,
    cls_token_segment_id=2 if args.model_type in ["xlnet"] else 0,
    sep_token=tokenizer.sep_token,
    # RoBERTa uses an extra separator b/w pairs of sentences,
    # cf. github.com/pytorch/fairseq/commit/1684e166e3da03f5b600dbb7855cb98ddfcd0805
    sep_token_extra=bool(args.model_type in ["roberta", "camembert", "xlmroberta", "longformer"]),
    # PAD on the left for XLNet
    pad_on_left=bool(args.model_type in ["xlnet"]),
    pad_token=tokenizer.convert_tokens_to_ids([tokenizer.pad_token])[0],
    pad_token_segment_id=4 if args.model_type in ["xlnet"] else 0,
    process_count=process_count,
    multi_label=multi_label,
    silent=args.silent or silent,
    use_multiprocessing=args.use_multiprocessing,
    sliding_window=args.sliding_window,
    flatten=not evaluate,
    stride=args.stride,
    add_prefix_space=bool(args.model_type in ["roberta", "camembert", "xlmroberta", "longformer"]),
    # avoid padding in case of single example/online inferencing to decrease execution time
    pad_to_max_length=bool(len(examples) > 1),
    args=args,
)
This is likely a Windows issue. Multiprocessing and PyTorch don't play nicely with Windows.
You could try this fix.
For example:
def run():
    # Do everything here

if __name__ == '__main__':
    run()
I'm not sure if that'll fix it though.
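To make the suggested idiom concrete, here is a self-contained sketch using only the standard library. `count_tokens` and the sample texts are invented for illustration. In a real script, the model creation and the train_model call would go inside `run()` instead of the toy pool, on the assumption that the library starts its worker processes from there.

```python
from multiprocessing import Pool, freeze_support


def count_tokens(text):
    # Defined at module level so worker processes can import it by name.
    return len(text.split())


def run():
    # Everything that starts worker processes lives inside this function.
    texts = ["a b c", "d e", "f"]
    with Pool(processes=2) as pool:
        return pool.map(count_tokens, texts)


if __name__ == "__main__":
    # Child processes that re-import this file see a different __name__,
    # so they skip this block instead of starting pools of their own.
    freeze_support()  # only matters for frozen Windows executables
    print(run())
```

On Windows, child processes are started by re-importing the main script. Without the guard, each child would execute the training code again, which matches the repeating log output and the RuntimeError text quoted above.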
I also seem to be stuck at "Converting to features started. Cache is not used", using a dataset containing 500,000 entries for binary classification.
Converting to features works using the CPU on my local machine, but it crashes afterwards. I'm assuming this is due to the heavy demand when training.
When using Google Colab (CUDA), I get stuck at "Converting to features started. Cache is not used" with no progress after 8 hours.
This is still an issue: I tried to train a model with a total of ~100,000 entries. I'm pretty sure some processes crashed without warning, as I got some Linux core dump files.
I found a solution that works for me:
As written above, the main problem is multiprocessing in inference mode.
The solution is to switch off multiprocessing through the args of ClassificationModel:
from simpletransformers.classification import ClassificationModel

# Placeholder model type and name; substitute your own.
cm_object = ClassificationModel("distilbert", "distilbert-base-cased")
# cm_object = torch.load("./model.pt")  # loading a previously saved model also works fine
cm_object.args.use_multiprocessing = False
cm_object.args.use_multiprocessing_for_evaluation = False
cm_object.args.multiprocessing_chunksize = 1
cm_object.args.dataloader_num_workers = 1
I know this treats the symptom rather than the root cause, but maybe it will be helpful for someone.
Same issue
I have also encountered the same issue. The following two pictures show what happens when I train the classification model with 1,000 examples.
picture 1:
picture 2:
That seems all right, but things change if I expand the dataset to 7 million examples and 1,700 categories. In the first case (picture 1), the training process is killed while converting to features. In the second case (picture 2), the training process gets stuck at some point, and I get some large core.xxx files.
core files:
How can I deal with this situation? Looking forward to your reply. @ThilinaRajapakse
I found a solution that works for me: "use_multiprocessing_for_evaluation": True, "multiprocessing_chunksize": 5 (pick a number suited to your setup)
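As a rough illustration of what a chunk size means for a process pool: the sketch below uses the standard library's `Pool.imap`, which takes a `chunksize` argument. Whether simpletransformers forwards "multiprocessing_chunksize" to the pool in exactly this way is an assumption on my part. `split_into_chunks` and `tokenize` are hypothetical names used only to show the grouping.

```python
from multiprocessing import Pool


def split_into_chunks(items, chunksize):
    # Shows the grouping: each chunk is handed to a worker as a single task.
    return [items[i:i + chunksize] for i in range(0, len(items), chunksize)]


def tokenize(text):
    # Toy stand-in for the real feature conversion.
    return text.lower().split()


if __name__ == "__main__":
    texts = [f"Example sentence {i}" for i in range(12)]
    # 12 items with chunksize=5 are grouped into chunks of 5, 5 and 2.
    print([len(chunk) for chunk in split_into_chunks(texts, 5)])
    with Pool(2) as pool:
        features = list(pool.imap(tokenize, texts, chunksize=5))
    print(len(features))
```

A small chunk size means many small tasks and more inter-process traffic. A very large one means each worker receives and holds a lot of data at once, which can matter for memory on big datasets.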
Describe the bug
I am trying to use RoBERTa for multi-label classification. I am facing problems when converting to features in both model.eval_model and model.predict. When using model.predict in the following code (as given by simpletransformers), it prints "Converting to features started. Cache is not used 0%| | 0/1 [00:00<?, ?it/s]" and then does nothing. It looks like it gets into an infinite loop or something.
To Reproduce
from simpletransformers.classification import MultiLabelClassificationModel
import pandas as pd

train_data = [['Example sentence 1 for multilabel classification.', [1, 1, 1, 1, 0, 1]]] + [['This is another example sentence. ', [0, 1, 1, 0, 0, 0]]]
train_df = pd.DataFrame(train_data, columns=['text', 'labels'])

eval_data = [['Example eval sentence for multilabel classification.', [1, 1, 1, 1, 0, 1]], ['Example eval senntence belonging to class 2', [0, 1, 1, 0, 0, 0]]]
eval_df = pd.DataFrame(eval_data, columns=['text', 'labels'])

model = MultiLabelClassificationModel('roberta', 'roberta-base', num_labels=6, args={'reprocess_input_data': True, 'overwrite_output_dir': True, 'num_train_epochs': 5})
print(train_df.head())

model.train_model(train_df)

result, model_outputs, wrong_predictions = model.eval_model(eval_df)
print(result)
print(model_outputs)

predictions, raw_outputs = model.predict(['This thing is entirely different from the other thing. '])
print(predictions)
print(raw_outputs)