deephealthproject / eddl

European Distributed Deep Learning (EDDL) library. A general-purpose library initially developed to cover deep learning needs in healthcare use cases within the DeepHealth project.
https://deephealthproject.github.io/eddl/
MIT License

non-recurrent LSTM cells with multiple GPUs #330

Closed · thistlillo closed this issue 2 years ago

thistlillo commented 2 years ago

Consider this code:


import pyeddl.eddl as eddl
from pyeddl.tensor import Tensor, DEV_CPU, DEV_GPU

dev = DEV_GPU

in_ = eddl.Input([10])
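# two per-sample state tensors (hidden and cell state), each of size 100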
lstate = eddl.States([2, 100])
lstm = eddl.LSTM([in_, lstate], 100, mask_zeros=True, bidirectional=False, name="lstm_cell")
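# use the LSTM as a single cell instead of unrolling it over time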
lstm.isrecurrent = False
out = eddl.Dense(lstm, 5)

# GPU selection: g=[1,1,0,0] uses the first two of four visible GPUs
cs = eddl.CS_GPU(g=[1,1,0,0], mem="full_mem")
# cs = eddl.CS_CPU(th=2, mem="full_mem")

model = eddl.Model([in_, lstate], [out])
eddl.build(model, eddl.adam(), ["mse"], ["accuracy"], cs, init_weights=True)

bs = 30
batch = Tensor.randn([bs, 10], dev=dev)
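# initial hidden/cell states: the States layer shape plus a leading batch dim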
state_t = Tensor.zeros([bs, 2, 100], dev=dev)
model.forward([batch, state_t])
print('--->')
states = eddl.getStates(lstm)
print('<---')

With 1 GPU there are no errors. With 2 or more, I get an error on states = eddl.getStates(lstm) (<--- is never printed).

For example, with 2 GPUs:

--->
-------------------------------
class:         Tensor
ndim:          2
shape:         (15, 100)
strides:       (100, 1)
itemsize:      1500
contiguous:    1
order:         C
data pointer:  0x55c995866748
is shared:     0
type:          float (4 bytes)
device:        GPU (code = 1000)
-------------------------------
-------------------------------
class:         Tensor
ndim:          2
shape:         (30, 100)
strides:       (100, 1)
itemsize:      3000
contiguous:    1
order:         C
data pointer:  0x55c983f1eb18
is shared:     0
type:          float (4 bytes)
device:        CPU (code = 0)
-------------------------------
==================================================================
⚠️  Tensors with different size (Tensor::copy) ⚠️
==================================================================

With 3 GPUs:

--->
-------------------------------
class:         Tensor
ndim:          2
shape:         (10, 100)
strides:       (100, 1)
itemsize:      1000
contiguous:    1
order:         C
data pointer:  0x56192c51ba78
is shared:     0
type:          float (4 bytes)
device:        GPU (code = 1000)
-------------------------------
-------------------------------
class:         Tensor
ndim:          2
shape:         (30, 100)
strides:       (100, 1)
itemsize:      3000
contiguous:    1
order:         C
data pointer:  0x56191a7bab18
is shared:     0
type:          float (4 bytes)
device:        CPU (code = 0)
-------------------------------
==================================================================
⚠️  Tensors with different size (Tensor::copy) ⚠️
==================================================================
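
The shapes above suggest the 30-sample batch is split evenly across the selected GPUs, while the CPU-side state tensor keeps the full batch size, so the Tensor::copy inside getStates fails on the mismatch. A minimal sketch of that arithmetic (my reading of the printouts, not the library internals; the names are just illustrative):

# per-GPU vs. full-batch state shapes, reconstructed from the logs above
bs, hidden = 30, 100
for n_gpus in (2, 3):
    per_gpu = bs // n_gpus  # even split of the batch across GPUs
    print(f"{n_gpus} GPUs: per-GPU state ({per_gpu}, {hidden}) vs CPU state ({bs}, {hidden})")
# 2 GPUs -> (15, 100) vs (30, 100); 3 GPUs -> (10, 100) vs (30, 100)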

Is this an issue with the library, or am I making a mistake?

chavicoski commented 2 years ago

Hi,

Yes, it's a bug. I have fixed it, and the fix will be available in the next PyEDDL release.

Thank you!