Error in cupy when using GPU for training a text classifier: "cannot overload functions distinguished by return type alone"

I tried to run a modified version of the example given in Adding a text classifier to a spaCy model with a GPU to speed up training. The original source code and mine differ as follows:

Line 1: #!/usr/bin/env python3 instead of #!/usr/bin/env python
Line 12: Remove from __future__ import unicode_literals, print_function
Line 58: optimizer = nlp.begin_training(device=0) instead of optimizer = nlp.begin_training()

Please see the last section "Modified script" in this issue for the modified script.

For the output of the modified script see the section "Output of the modified script" below. Thanks for your help!

Your Environment

spaCy version: 2.0.5
Platform: Linux-4.4.0-1022-aws-x86_64-with-debian-stretch-sid
Python version: 3.6.3
CuPy version: 2.2.0

Output of nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

Output of the modified script

Created blank 'en' model
Loading IMDB data...
Downloading data from http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
84000768/84125825 [============================>.] - ETA: 0sUntaring file...
Using 2000 examples (1600 training, 400 evaluation)
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 229, in compile
    nvrtc.compileProgram(self.ptr, options)
  File "cupy/cuda/nvrtc.pyx", line 98, in cupy.cuda.nvrtc.compileProgram
  File "cupy/cuda/nvrtc.pyx", line 108, in cupy.cuda.nvrtc.compileProgram
  File "cupy/cuda/nvrtc.pyx", line 53, in cupy.cuda.nvrtc.check_status
cupy.cuda.nvrtc.NVRTCError: NVRTC_ERROR_COMPILATION (6)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./train_textcat_cupy_error.py", line 131, in <module>
    plac.call(main)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "./train_textcat_cupy_error.py", line 56, in main
    optimizer = nlp.begin_training(device=0)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/site-packages/spacy/language.py", line 456, in begin_training
    sgd=self._optimizer)
  File "pipeline.pyx", line 859, in spacy.pipeline.TextCategorizer.begin_training
  File "pipeline.pyx", line 778, in spacy.pipeline.TextCategorizer.Model
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/site-packages/spacy/_ml.py", line 527, in build_text_classifier
    >> zero_init(Affine(nr_class, width, drop_factor=0.0))
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/site-packages/thinc/check.py", line 127, in checker
    return wrapped(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 274, in __pow__
    return self._operators['**'](self, other)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/site-packages/thinc/api.py", line 162, in clone
    layers.append(copy.deepcopy(orig))
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "cupy/core/core.pyx", line 1351, in cupy.core.core.ndarray.__deepcopy__
  File "cupy/core/core.pyx", line 1352, in cupy.core.core.ndarray.__deepcopy__
  File "cupy/core/core.pyx", line 377, in cupy.core.core.ndarray.copy
  File "cupy/core/core.pyx", line 274, in cupy.core.core.ndarray.astype
  File "cupy/core/core.pyx", line 347, in cupy.core.core.ndarray.astype
  File "cupy/core/elementwise.pxi", line 823, in cupy.core.core.ufunc.__call__
  File "cupy/util.pyx", line 39, in cupy.util.memoize.decorator.ret
  File "cupy/core/elementwise.pxi", line 622, in cupy.core.core._get_ufunc_kernel
  File "cupy/core/elementwise.pxi", line 33, in cupy.core.core._get_simple_elementwise_kernel
  File "cupy/core/carray.pxi", line 170, in cupy.core.core.compile_with_cache
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 152, in compile_with_cache
    ptx = compile_using_nvrtc(source, options, arch)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 70, in compile_using_nvrtc
    ptx = prog.compile(options)
  File "/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 233, in compile
    raise CompileException(log, self.src, self.name, options)
cupy.cuda.compiler.CompileException: /usr/local/cuda/include/cuda_fp16.h(133): error: cannot overload functions distinguished by return type alone

/usr/local/cuda/include/cuda_fp16.hpp(648): error: cannot overload functions distinguished by return type alone

/home/ubuntu/anaconda3/envs/mh-nlp-experiments/lib/python3.6/site-packages/cupy/core/include/cupy/carray.cuh(394): warning: statement is unreachable
          detected during instantiation of "void CIndexer<_ndim>::set(ptrdiff_t) [with _ndim=1]" 
/tmp/tmpzu9hbau7/kern.cu(10): here

2 errors detected in the compilation of "/tmp/tmpzu9hbau7/kern.cu".

Modified script

#!/usr/bin/env python3
# coding: utf8
"""Train a multi-label convolutional neural network text classifier on the
IMDB dataset, using the TextCategorizer component. The dataset will be loaded
automatically via Thinc's built-in dataset loader. The model is added to
spacy.pipeline, and predictions are available via `doc.cats`. For more details,
see the documentation:
* Training: https://spacy.io/usage/training
Compatible with: spaCy v2.0.0+
"""
import plac
import random
from pathlib import Path
import thinc.extra.datasets

import spacy
from spacy.util import minibatch, compounding

@plac.annotations(
    model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_texts=("Number of texts to train from", "option", "t", int),
    n_iter=("Number of training iterations", "option", "n", int))
def main(model=None, output_dir=None, n_iter=20, n_texts=2000):
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")

    # add the text classifier to the pipeline if it doesn't exist
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if 'textcat' not in nlp.pipe_names:
        textcat = nlp.create_pipe('textcat')
        nlp.add_pipe(textcat, last=True)
    # otherwise, get it, so we can add labels to it
    else:
        textcat = nlp.get_pipe('textcat')

    # add label to text classifier
    textcat.add_label('POSITIVE')

    # load the IMDB dataset
    print("Loading IMDB data...")
    (train_texts, train_cats), (dev_texts, dev_cats) = load_data(limit=n_texts)
    print("Using {} examples ({} training, {} evaluation)"
          .format(n_texts, len(train_texts), len(dev_texts)))
    train_data = list(zip(train_texts,
                          [{'cats': cats} for cats in train_cats]))

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'textcat']
    with nlp.disable_pipes(*other_pipes):  # only train textcat
        optimizer = nlp.begin_training(device=0)
        print("Training the model...")
        print('{:^5}\t{:^5}\t{:^5}\t{:^5}'.format('LOSS', 'P', 'R', 'F'))
        for i in range(n_iter):
            losses = {}
            # batch up the examples using spaCy's minibatch
            batches = minibatch(train_data, size=compounding(4., 32., 1.001))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.2,
                           losses=losses)
            with textcat.model.use_params(optimizer.averages):
                # evaluate on the dev data split off in load_data()
                scores = evaluate(nlp.tokenizer, textcat, dev_texts, dev_cats)
            print('{0:.3f}\t{1:.3f}\t{2:.3f}\t{3:.3f}'  # print a simple table
                  .format(losses['textcat'], scores['textcat_p'],
                          scores['textcat_r'], scores['textcat_f']))

    # test the trained model
    test_text = "This movie sucked"
    doc = nlp(test_text)
    print(test_text, doc.cats)

    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        doc2 = nlp2(test_text)
        print(test_text, doc2.cats)

def load_data(limit=0, split=0.8):
    """Load data from the IMDB dataset."""
    # Partition off part of the train data for evaluation
    train_data, _ = thinc.extra.datasets.imdb()
    random.shuffle(train_data)
    train_data = train_data[-limit:]
    texts, labels = zip(*train_data)
    cats = [{'POSITIVE': bool(y)} for y in labels]
    split = int(len(train_data) * split)
    return (texts[:split], cats[:split]), (texts[split:], cats[split:])

def evaluate(tokenizer, textcat, texts, cats):
    docs = (tokenizer(text) for text in texts)
    tp = 1e-8  # True positives
    fp = 1e-8  # False positives
    fn = 1e-8  # False negatives
    tn = 1e-8  # True negatives
    for i, doc in enumerate(textcat.pipe(docs)):
        gold = cats[i]
        for label, score in doc.cats.items():
            if label not in gold:
                continue
            if score >= 0.5 and gold[label] >= 0.5:
                tp += 1.
            elif score >= 0.5 and gold[label] < 0.5:
                fp += 1.
            elif score < 0.5 and gold[label] < 0.5:
                tn += 1
            elif score < 0.5 and gold[label] >= 0.5:
                fn += 1
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * (precision * recall) / (precision + recall)
    return {'textcat_p': precision, 'textcat_r': recall, 'textcat_f': f_score}

if __name__ == '__main__':
    plac.call(main)

explosion / spaCy

Error in cupy when using GPU for training a text classifier: "cannot overload functions distinguished by return type alone" #1748

Your Environment

Output of the modified script

Modified script