deephealthproject / eddl

European Distributed Deep Learning (EDDL) library. A general-purpose library initially developed to cover deep learning needs in healthcare use cases within the DeepHealth project.
https://deephealthproject.github.io/eddl/
MIT License

Segmentation fault in eddl.forward() with a LSTM layer #329

Closed: thistlillo closed this issue 2 years ago

thistlillo commented 2 years ago

There is another issue open in the PyEDDL repository (https://github.com/deephealthproject/pyeddl/issues/73), but Simone Leo noticed that the same problem occurs with the C++ implementation, so I am filing it here as well.

This is the original post:

To avoid the deserialization issue with recurrent LSTM networks (issue #72), I am trying to train the model without a recurrent LSTM layer, using the "finest" training method. I get a segmentation fault in the forward() method. I am not sure whether it is a bug or a mistake in my code, which you can find right after the dev settings.

The segmentation fault occurs both on GPU and CPU.

Could you please take a look at the code?

Dev settings:

Python 3.8.6 | packaged by conda-forge | (default, Oct  7 2020, 19:08:05) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyeddl 
>>> import pyecvl
>>> print(pyeddl.__version__)
1.2.0
>>> print(pyecvl.__version__)
1.0.0

Code:

import numpy as np
import pyeddl.eddl as eddl
from pyeddl.tensor import Tensor

visual_dim = 10
semantic_dim = 10
lstm_size = 512
emb_size = 512
vs = 100
bs = 32

# --- 
# model
# visual
cnn_top_in = eddl.Input([visual_dim], name="in_visual_features")
visual_features = eddl.RandomUniform(eddl.Dense(cnn_top_in, cnn_top_in.output.shape[1], name="visual_features") )
alpha_v = eddl.Softmax(eddl.Dense(eddl.Tanh(visual_features), visual_features.output.shape[1]), name="alpha_v")  # missing sentence component
v_att = eddl.Mult(alpha_v, visual_features)
print(f"layer visual features: {visual_features.output.shape}")

# semantic features
cnn_out_in = eddl.Input([semantic_dim], name="in_semantic_features")
semantic_features = eddl.RandomUniform(eddl.Embedding(eddl.ReduceArgMax(cnn_out_in, [0]), cnn_out_in.output.shape[1], 1, emb_size, name="semantic_features"), -0.05, 0.05)
alpha_s = eddl.Softmax(eddl.Dense(eddl.Tanh(semantic_features), emb_size), name="alpha_s")  # missing sentence component cnn_out.output.shape[1]
s_att = eddl.Mult(alpha_s, semantic_features)
print(f"layer semantic features: {semantic_features.output.shape}")

# coattention (add alpha and multiplication)
hidden_lstm_in = eddl.Input([lstm_size], name="hidden_lstm_in")
features = eddl.Concat([v_att, s_att, hidden_lstm_in], name="co_att_in")
context = eddl.Dense(features, emb_size, name="co_attention")
print(f"layer coattention: {context.output.shape}")

# lstm
word_in = eddl.Input([vs], "word_emb_input")
states_input = eddl.States([2, lstm_size], name="lstm_states")

word_emb = eddl.ReduceArgMax(word_in, [0])
word_emb = eddl.RandomUniform(eddl.Embedding(word_emb, vs, 1, emb_size, mask_zeros=True, name="word_embeddings"), -0.05, +0.05)
#to_lstm = eddl.Concat([to_lstm, context])
lstm = eddl.LSTM([word_emb, context, states_input], lstm_size, mask_zeros=True, bidirectional=False, name="lstm")
lstm.isrecurrent = False

out_lstm = eddl.Softmax(eddl.Dense(lstm, vs, name="top_dense"), name="rnn_out")
rnn = eddl.Model([cnn_top_in, cnn_out_in, hidden_lstm_in, word_in, states_input], [out_lstm])

cs = eddl.CS_GPU(g=[1], mem="full_mem")
# cs = eddl.CS_CPU()

eddl.build(rnn, eddl.adam(), ["softmax_cross_entropy"], ["accuracy"], cs, init_weights=True)
eddl.summary(rnn)

# input data
vis = Tensor.randn([bs, 1, visual_dim])
sem = Tensor.randn([bs, 1, semantic_dim])
hidden = Tensor.randn([bs, 1, lstm_size])

text = Tensor.fromarray(np.random.randint(0, vs+1, size=(bs, 10, vs)))
T = Tensor.onehot(text, vs)
states = Tensor.zeros([bs, 2, lstm_size])

print(f"cnn.forward, visual: {vis.shape}")
print(f"cnn.forward, semantic: {sem.shape}")
print(f"text, shape: {text.shape}")
print(f"T, shape: {T.shape}")

# --- attention
lstm_states = eddl.States([2, lstm_size])
states_tensor = Tensor.zeros([bs, 2, lstm_size])
hidden_layer = Tensor.zeros( [bs, lstm_size])

eddl.set_mode(rnn, 1)
eddl.reset_loss(rnn)
eddl.zeroGrads(rnn)

# for j over the current word, from bos to eos
for j in range(T.shape[1]):
    batch_word = T.select([":", str(j), ":"]).squeeze()

    input_tensors = [vis.squeeze(), sem.squeeze(), hidden_layer, batch_word, states_tensor]

    layers = [cnn_top_in, cnn_out_in, hidden_lstm_in, word_in, states_input]
    print("forwarding these tensors:")
    for l, t in zip(layers, input_tensors):
        print(f"- layer {l.name} with shape {l.output.shape} : {t.shape} input batch")

    print("forwarding")
    eddl.forward(rnn, input_tensors)
    print("forwarded")

The output is:

layer visual features: [1, 10]
layer semantic features: [1, 512]
layer coattention: [1, 512]
Generating Random Table
CS with full memory setup
Building model
Selecting GPU device 0
EDDL is running on GPU device 0, Tesla T4
CuBlas initialized on GPU device 0, Tesla T4
CuRand initialized on GPU device 0, Tesla T4
CuDNN initialized on GPU device 0, Tesla T4
-------------------------------------------------------------------------------
model
-------------------------------------------------------------------------------
in_visual_features  |  (10)                =>   (10)                0         
visual_features     |  (10)                =>   (10)                110       
tanh1               |  (10)                =>   (10)                0         
dense1              |  (10)                =>   (10)                110       
alpha_v             |  (10)                =>   (10)                0         
mult_1              |  (10)                =>   (10)                0         
in_semantic_features|  (10)                =>   (10)                0         
reduction_argmax1   |  (10)                =>   (1)                 0         
semantic_features   |  (1)                 =>   (512)               5120      
tanh2               |  (512)               =>   (512)               0         
dense2              |  (512)               =>   (512)               262656    
alpha_s             |  (512)               =>   (512)               0         
mult_2              |  (512)               =>   (512)               0         
hidden_lstm_in      |  (512)               =>   (512)               0         
co_att_in           |  (10)                =>   (1034)              0         
co_attention        |  (1034)              =>   (512)               529920    
word_emb_input      |  (100)               =>   (100)               0         
reduction_argmax2   |  (100)               =>   (1)                 0         
word_embeddings     |  (1)                 =>   (512)               51200     
lstm_states         |  (2, 512)            =>   (2, 512)            0         
lstm                |  (512)               =>   (512)               2099200   
top_dense           |  (512)               =>   (100)               51300     
rnn_out             |  (100)               =>   (100)               0         
-------------------------------------------------------------------------------
Total params: 2999616
Trainable params: 2999616
Non-trainable params: 0

cnn.forward, visual: [32, 1, 10]
cnn.forward, semantic: [32, 1, 10]
text, shape: [32, 10, 100]
T, shape: [32, 10, 100]
forwarding these tensors:
- layer in_visual_features with shape [1, 10] : [32, 10] input batch
- layer in_semantic_features with shape [1, 10] : [32, 10] input batch
- layer hidden_lstm_in with shape [1, 512] : [32, 512] input batch
- layer word_emb_input with shape [1, 100] : [32, 100] input batch
- layer lstm_states with shape [1, 2, 512] : [32, 2, 512] input batch
forwarding
Segmentation fault (core dumped)
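
The "forwarding these tensors" lines in the log can be checked mechanically. The sketch below is only an illustration: `per_sample_shape` is a hypothetical helper, not an EDDL function, and the shapes are copied by hand from the log above. It compares each layer's output shape with the shape of the batch fed to it, ignoring the leading batch dimension.

```python
def per_sample_shape(shape):
    """Drop the leading batch dimension from a shape list."""
    return list(shape[1:])


# (layer name, layer output shape, input batch shape), copied from the log.
logged = [
    ("in_visual_features", [1, 10], [32, 10]),
    ("in_semantic_features", [1, 10], [32, 10]),
    ("hidden_lstm_in", [1, 512], [32, 512]),
    ("word_emb_input", [1, 100], [32, 100]),
    ("lstm_states", [1, 2, 512], [32, 2, 512]),
]

# Collect the names of layers whose per-sample shape differs from the batch.
mismatches = [
    name
    for name, layer_shape, batch_shape in logged
    if per_sample_shape(layer_shape) != per_sample_shape(batch_shape)
]
print(mismatches)  # []
```

An empty list means every input batch agrees with its layer, which suggests (without proving) that the crash comes from how the graph is wired rather than from the tensors passed to forward().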

chavicoski commented 2 years ago

Hi,

You are not using the API correctly in this example. The LSTM layer accepts a vector of parent layers as its first argument. This vector can have length one if you provide only the input data, or length two if you provide the input data and a States layer. In this example, you are passing three inputs ([word_emb, context, states_input]), which is not supported.
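
As a rough illustration of the rule above, the following sketch builds a parent list of length one or two from any number of data layers. `lstm_parents` is a hypothetical helper, not part of EDDL or PyEDDL. The merge step is passed in as `concat`, so the example runs with plain strings standing in for layers.

```python
def lstm_parents(data_layers, concat, states=None):
    """Return a parent list of length one or two for an LSTM layer.

    Hypothetical helper, not part of EDDL. `concat` is any callable that
    merges a list of layers into a single layer.
    """
    if not data_layers:
        raise ValueError("at least one data layer is required")
    # Several data inputs are merged into one layer first.
    if len(data_layers) == 1:
        data = data_layers[0]
    else:
        data = concat(list(data_layers))
    # The optional States layer becomes the second (and last) parent.
    return [data] if states is None else [data, states]


# Demo with strings in place of real layers.
merged = lstm_parents(
    ["word_emb", "context"],
    concat=lambda layers: "+".join(layers),
    states="states_input",
)
print(merged)  # ['word_emb+context', 'states_input']
```

With PyEDDL, passing `eddl.Concat` as `concat` should give the same two-parent list that `eddl.LSTM` expects, assuming the data layers have compatible shapes.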

thistlillo commented 2 years ago

Thank you! Problem solved...

word_emb = eddl.ReduceArgMax(word_in, [0])
word_emb = eddl.RandomUniform(eddl.Embedding(word_emb, vs, 1, emb_size, mask_zeros=True, name="word_embeddings"), -0.05, +0.05)

to_lstm = eddl.Concat([word_emb, context])  # <------------ new line

lstm = eddl.LSTM([to_lstm, states_input], lstm_size, mask_zeros=True, bidirectional=False, name="lstm")
lstm.isrecurrent = False
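
A side note on sizes, as a sketch rather than a statement about EDDL internals: the parameter counts in the summary above match the usual dense and LSTM formulas, and the lstm row (2099200) corresponds to a 512-wide input. This suggests only one 512-wide data parent was counted in the original graph, although the thread does not confirm it. Assuming the same formula holds after the fix, the concatenated 1024-wide input would give a larger count. The function names below are illustrative only.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def lstm_params(n_in, n_hidden):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (n_in * n_hidden + n_hidden * n_hidden + n_hidden)


print(dense_params(1034, 512))  # 529920, the co_attention row
print(lstm_params(512, 512))    # 2099200, the lstm row in the summary
print(lstm_params(1024, 512))   # 3147776, assumed count with the Concat input
```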

chavicoski commented 2 years ago

Perfect!