ThilinaRajapakse / pytorch-transformers-classification

Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.
Apache License 2.0

ValueError: Number of processes must be at least 1 #4

Closed Magpi007 closed 5 years ago

Magpi007 commented 5 years ago

Hi,

When training the model, I get this error:

*(screenshot: traceback ending in `ValueError: Number of processes must be at least 1`)*

I am just running the code for the first time, I haven't checked it too much yet...

ThilinaRajapakse commented 5 years ago

I thought I fixed this, sorry.

In this function definition, change the default value of `process_count` to 1 for Google Colab (Colab has 1 vCPU, if I remember correctly).

Edit: Couldn't reproduce the error when I ran the notebook. The Colab notebook specifies a process count of 2 when calling convert_examples_to_features()

Magpi007 commented 5 years ago

So this change is only for Colab? If I run it on my local laptop, can I use the version that is in the repo?

ThilinaRajapakse commented 5 years ago

Yes, the local version will work fine. By default, `process_count` is set to the number of CPU cores available minus 2. On a modern computer you will certainly have more than 2 cores, so that's fine. But on Colab the core count is 1, which makes `process_count` equal to -1 and throws the error, because at least 1 process is needed.

Edit: The cpu_count on Colab is two, and the notebook is configured to use 2 as the process_count.
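The failure mode described above is easy to reproduce directly with the standard library's `multiprocessing` module (a minimal sketch, independent of the repo's code):

```python
from multiprocessing import Pool, cpu_count

# On a 1-core machine, cpu_count() - 2 evaluates to -1.
workers = cpu_count() - 2
print("cpu_count:", cpu_count())

try:
    # Pool rejects any process count below 1 with exactly this error.
    Pool(-1)
except ValueError as e:
    print(e)  # Number of processes must be at least 1
```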

ThilinaRajapakse commented 5 years ago

I ran the Colab notebook again, it works without issues. The above change is unnecessary.

Magpi007 commented 5 years ago

Hmm, then what could it be? I just ran it again and got the same error. These are the resources I have allocated:

*(screenshot: allocated Colab resources)*
ThilinaRajapakse commented 5 years ago

Try setting process_count to 1 in the call to convert_examples_to_features() inside the load_and_cache_examples() function.

features = convert_examples_to_features(examples, label_list, args['max_seq_length'], tokenizer, output_mode,
            cls_token_at_end=bool(args['model_type'] in ['xlnet']),            # xlnet has a cls token at the end
            cls_token=tokenizer.cls_token,
            sep_token=tokenizer.sep_token,
            cls_token_segment_id=2 if args['model_type'] in ['xlnet'] else 0,
            pad_on_left=bool(args['model_type'] in ['xlnet']),                 # pad on the left for xlnet
            pad_token_segment_id=4 if args['model_type'] in ['xlnet'] else 0,
            process_count=1)
Magpi007 commented 5 years ago

I got this error when I changed it:

*(screenshot: new error traceback)*

Anyway, let me review the code; I have been away from this for the last few days, so I want to check that I have followed all the steps correctly.

ThilinaRajapakse commented 5 years ago

Are you using a local copy (local to your Google Drive, that is)? I think this bug was there in the original notebook, but it was fixed later. The function should look like this:

from multiprocessing import Pool, cpu_count

from tqdm import tqdm

def convert_examples_to_features(examples, label_list, max_seq_length,
                                 tokenizer, output_mode,
                                 cls_token_at_end=False, pad_on_left=False,
                                 cls_token='[CLS]', sep_token='[SEP]', pad_token=0,
                                 sequence_a_segment_id=0, sequence_b_segment_id=1,
                                 cls_token_segment_id=1, pad_token_segment_id=0,
                                 mask_padding_with_zero=True,
                                 process_count=cpu_count() - 2):
    """ Loads a data file into a list of `InputBatch`s
        `cls_token_at_end` define the location of the CLS token:
            - False (Default, BERT/XLM pattern): [CLS] + A + [SEP] + B + [SEP]
            - True (XLNet/GPT pattern): A + [SEP] + B + [SEP] + [CLS]
        `cls_token_segment_id` define the segment id associated to the CLS token (0 for BERT, 2 for XLNet)
    """

    label_map = {label : i for i, label in enumerate(label_list)}

    examples = [(example, label_map, max_seq_length, tokenizer, output_mode, cls_token_at_end, cls_token, sep_token, cls_token_segment_id, pad_on_left, pad_token_segment_id) for example in examples]

    with Pool(process_count) as p:
        features = list(tqdm(p.imap(convert_example_to_feature, examples, chunksize=100), total=len(examples)))

    return features
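One defensive tweak (my suggestion, not what the repo ships) is to clamp the default so it can never drop below 1, even on a one- or two-core machine:

```python
from multiprocessing import cpu_count

def default_process_count():
    # Leave two cores free where possible, but never go below 1.
    return max(1, cpu_count() - 2)

# e.g. on an 8-core machine -> 6; on a 1- or 2-core machine -> 1
print(default_process_count())
```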
Magpi007 commented 5 years ago

Yeah, maybe it's that. I know it's better to fork your repo so I get the updates/fixes immediately, but I like to understand the code first by recreating it in my own notebook. I am using Colab linked to Google Drive. I will check that and let you know. Thanks.

ThilinaRajapakse commented 5 years ago

Understandable! Let me know how it goes.

Magpi007 commented 5 years ago

With a fresh head it's easier to see clearly. There were two things that I changed to make it work:

Maybe the first point was the one causing the error? Anyway, sorry for my lapse; I will keep iterating on it and let you know if I see any suspicious bug.

ThilinaRajapakse commented 5 years ago

Weird. Neither of those things should be throwing a "number of processes" error as far as I can tell. That error comes from the multiprocessing used for converting examples to features. Oh well, we don't need to worry about it if it's working!

Magpi007 commented 5 years ago

I was facing this problem again in another iteration, and I changed this:

process_count = cpu_count() - 2

for this

process_count = 1

in the convert_examples_to_features function in the utils.py file, and it worked. I am working on Colab. Does that make sense to you?

ThilinaRajapakse commented 5 years ago

Yes, that fixes all multiprocessing-related issues, at the expense of not using multiprocessing at all. I think you can get away with setting it to 2 on Colab: setting it to 2 should speed things up a bit, while setting it to 1 ensures you won't hit multiprocessing-related errors.
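Whichever value you pick, the worker count only affects speed, not the output. A quick sanity check (using a toy stand-in for the real feature conversion, not the repo's function):

```python
from multiprocessing import Pool

def to_feature(example):
    # Stand-in for convert_example_to_feature: any picklable function works.
    return example.upper()

examples = ["ex1", "ex2", "ex3", "ex4"]

# Works under the default fork start method (as on Colab/Linux);
# platforms using spawn need an `if __name__ == "__main__":` guard.
for n in (1, 2):
    with Pool(n) as p:
        features = list(p.imap(to_feature, examples, chunksize=2))
    assert features == ["EX1", "EX2", "EX3", "EX4"]
print("identical output with 1 and 2 workers")
```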