Hironsan / anago

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.
https://anago.herokuapp.com/
MIT License

slower prediction on CPU (elmo) #102

Open bsat007 opened 5 years ago

bsat007 commented 5 years ago

System information

Describe the problem

I have trained an entity extraction model using ELMo embeddings (as described in elmo_example.py). Now I am trying to make predictions on new sentences, but ELMo is taking a lot of time to transform the sentences into vectors.

ELMo takes 4-5 seconds to transform 15 sentences into vectors, with an average sentence length of 15-20 words.

How can I make this faster? Can I use a GPU instead of the CPU for faster prediction?
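
For reference, here is a rough, unverified sketch of how I understand ELMo can be run on a GPU with allennlp's `ElmoEmbedder` (this class is not used in my snippet below; the file paths and example sentences are placeholders, and `cuda_device=0` selects the first GPU while `-1` would keep it on the CPU):

    from allennlp.commands.elmo import ElmoEmbedder

    OPTIONS_FILE = "path/to/elmo_options.json"   # placeholder path
    WEIGHT_FILE = "path/to/elmo_weights.hdf5"    # placeholder path

    # cuda_device=0 loads the ELMo weights onto the first GPU.
    embedder = ElmoEmbedder(OPTIONS_FILE, WEIGHT_FILE, cuda_device=0)

    sentences = [["This", "is", "a", "tokenized", "sentence", "."],
                 ["Another", "short", "example", "."]]

    # embed_batch returns one numpy array per sentence,
    # shaped (num_layers, num_tokens, embedding_dim).
    vectors = embedder.embed_batch(sentences)

If that runs noticeably faster than the CPU path, the bottleneck is presumably the ELMo forward pass itself.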

Source code / logs

I have made some changes in the ElmoTransformer class, but it is giving me an error. Here is the code:

        character_ids = batch_to_ids(X)
        if cuda_device >= 0:
            character_ids = character_ids.cuda(device=cuda_device)
        # start = time()
        elmo_embeddings = self._elmo(character_ids)['elmo_representations'][1]
        # print("Elmo {}".format(time()-start))
        elmo_embeddings = elmo_embeddings.detach().numpy()
        features = [word_ids, char_ids, elmo_embeddings]

Here is the error I get when cuda_device == 0 and I am using the GPU:

`expected object of backend cpu but got backend cuda`

It would be great if someone could help me. Thanks.
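
My guess (unverified) is that the `Elmo` module itself also has to be moved to the GPU, otherwise its weights stay on the CPU while `character_ids` is a CUDA tensor, and that the embeddings then have to be copied back to the CPU before `.numpy()` is called. Here is a self-contained sketch of that idea; the file paths and the example sentence are placeholders, and `num_output_representations=2` is only there so that index `[1]` from my snippet above exists:

    import torch
    from allennlp.modules.elmo import Elmo, batch_to_ids

    OPTIONS_FILE = "path/to/elmo_options.json"   # placeholder path
    WEIGHT_FILE = "path/to/elmo_weights.hdf5"    # placeholder path

    cuda_device = 0 if torch.cuda.is_available() else -1

    elmo = Elmo(OPTIONS_FILE, WEIGHT_FILE, num_output_representations=2, dropout=0)
    if cuda_device >= 0:
        # Move the module's weights to the GPU, not just the inputs.
        elmo = elmo.cuda(device=cuda_device)

    X = [["John", "lives", "in", "New", "York", "."]]  # placeholder sentence
    character_ids = batch_to_ids(X)
    if cuda_device >= 0:
        character_ids = character_ids.cuda(device=cuda_device)

    elmo_embeddings = elmo(character_ids)['elmo_representations'][1]
    # .numpy() only works on CPU tensors, so copy back from the GPU first.
    elmo_embeddings = elmo_embeddings.detach().cpu().numpy()

Can someone confirm whether this is the right way to handle the device placement?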