NekoApocalypse / PCNN_TensorFlow

Piece-wise CNN for relation extraction.

Trying to train the PCNN on my own data #6

Closed · Islam-Kh closed this 4 years ago

Islam-Kh commented 4 years ago

I'm trying to train the PCNN on my own data. I prepared it with the same structure and data types as the data the network was originally trained on, but I'm getting the following error:

TypeError: Expected binary or unicode string, got [0.07911841, 0.100111045, ...]

I'm using TensorFlow 1.15.2 and Python 3.6. For more detail, the error occurred while executing the following:

with sess.as_default():
    initializer = tf.contrib.layers.xavier_initializer()
    with tf.compat.v1.variable_scope('model', reuse=None,
                                     initializer=initializer):
        m = network.PCNNMasked(is_training=True,
                               word_embeddings=word_embedding,
                               settings=settings)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    optimizer = tf.train.AdamOptimizer(0.001)

It is thrown exactly at this line:

m = network.PCNNMasked(is_training=True,
                       word_embeddings=word_embedding,
                       settings=settings)

The cause is the type of word_embeddings. More specifically, the code stops executing when it reaches this line in the PCNNMasked function inside network.py:

word_embedding = tf.get_variable(initializer=word_embeddings,
                                 name='word_embedding')

It is true that my word embedding is an array of float numbers. What should I do?

I tried several solutions, such as specifying the data type explicitly as np.float32 and using tf.constant_initializer, but I still get the error. Are there any other solutions I could try?
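For reference, tf.get_variable infers the variable's shape and dtype from its initializer, so the initializer has to convert to a rectangular numeric array. A quick check that the table really is rectangular (just a sketch, with made-up data standing in for the real word_embedding):

import numpy as np

# Toy embedding table with one ragged row; hypothetical data standing
# in for the real word_embedding that is passed to PCNNMasked.
word_embedding = [
    [0.079, 0.100, 0.213],
    [0.011, 0.052, 0.334],
    [0.420, 0.001],          # shorter row -> table is not rectangular
]

# tf.get_variable(initializer=...) needs every row to have the same length.
lengths = {len(vec) for vec in word_embedding}
print('vector lengths found:', lengths)        # {2, 3} -> ragged table

if len(lengths) == 1:
    emb = np.asarray(word_embedding, dtype=np.float32)
    print(emb.shape)                           # (vocab_size, dim)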

Islam-Kh commented 4 years ago

The problem has been solved. The word_embeddings array contained sub-arrays of differing lengths: only about 10 words had an embedding of a different length, so after ignoring those words everything worked fine.
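For anyone hitting the same error, a rough sketch of that filtering step (the variable names and the dict layout are made up here, not the repository's preprocessing code):

from collections import Counter
import numpy as np

# Hypothetical embeddings: word -> list of floats, with one bad row.
embeddings = {
    'cat': [0.1, 0.2, 0.3],
    'dog': [0.4, 0.5, 0.6],
    'odd': [0.7, 0.8],        # wrong length, will be dropped
}

# Keep only the most common vector length and ignore the rest,
# mirroring the fix above (dropping the ~10 odd words).
expected_dim = Counter(len(v) for v in embeddings.values()).most_common(1)[0][0]
clean = {w: v for w, v in embeddings.items() if len(v) == expected_dim}

vocab = sorted(clean)
word_embedding = np.asarray([clean[w] for w in vocab], dtype=np.float32)
print(word_embedding.shape)   # (2, 3): rectangular, safe for tf.get_variable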