TharinduDR / Siamese-Recurrent-Architectures

Usage of Siamese Recurrent Neural network architectures for semantic textual similarity

MABIGRU - AttributeError: 'tuple' object has no attribute 'shape' #8


MinuteswithMetrics commented 5 years ago

Thank you for sharing your code. I tried to see if I could reproduce your results, but I ran into AttributeError: 'tuple' object has no attribute 'shape'. I also tried the Quora training set and hit the same error.

I was also wondering about datasets = [train_df, test_df] on line 15 of bigru_manhattan.py. Should it be sick_train and sick_test?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-1d2d0f3fe90c> in <module>
     16 for i in range(0, 3):
     17     sims, trained_model, topic = run_experiment(sick_train_normlized, sick_test_normalized, ['sent_1', 'sent_2'], "sim",
---> 18                                                 benchmarks[i])
     19     sick_test_normalized['predicted_sim'] = pd.Series(sims).values
     20     pearson_correlation = scipy.stats.pearsonr(sims, sick_test_normalized['sim'])[0]

/floyd/home/utility/run_experiment.py in run_experiment(train_df, test_df, sent_cols, sim_col, benchmark)
      1 def run_experiment(train_df, test_df, sent_cols, sim_col, benchmark):
----> 2     sims, trained_model = benchmark[1](train_df, test_df, sent_cols, sim_col)
      3     return sims, trained_model, benchmark[0]

/floyd/home/nn/bigru_manhattan.py in run_bigru_benchmark(train_df, test_df, sent_cols, sim_col, validation_portion, n_hidden, embedding_dim, batch_size, n_epoch, optimizer, save_weights, load_weights, max_seq_length, model)
     52 
     53     # Embedded version of the inputs
---> 54     encoded_left = embedding_layer(left_input)
     55     encoded_right = embedding_layer(right_input)
     56 

/usr/local/lib/python3.6/site-packages/keras/engine/base_layer.py in __call__(self, inputs, **kwargs)
    434                 # Load weights that were specified at layer instantiation.
    435                 if self._initial_weights is not None:
--> 436                     self.set_weights(self._initial_weights)
    437 
    438             # Raise exceptions in case the input is not compatible

/usr/local/lib/python3.6/site-packages/keras/engine/base_layer.py in set_weights(self, weights)
   1051         param_values = K.batch_get_value(params)
   1052         for pv, p, w in zip(param_values, params, weights):
-> 1053             if pv.shape != w.shape:
   1054                 raise ValueError('Layer weight shape ' +
   1055                                  str(pv.shape) +

AttributeError: 'tuple' object has no attribute 'shape'
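The traceback shows the failure inside Keras's set_weights, which compares pv.shape against w.shape for each supplied weight. A minimal numpy-only sketch (not the repository's code) of why a tuple triggers exactly this AttributeError:

```python
import numpy as np

# Minimal reproduction of the check in Keras base_layer.set_weights:
# each supplied weight w must be a numpy array, because Keras compares
# pv.shape != w.shape. A plain tuple has no .shape attribute.
pv = np.zeros((10, 300))          # current parameter value
good = np.ones((10, 300))         # a numpy array: has .shape

print(pv.shape != good.shape)     # False: shapes match, no error raised

bad = (np.ones((10, 300)), 42)    # a tuple: has no .shape
try:
    pv.shape != bad.shape         # the same comparison Keras performs
except AttributeError as err:
    print(err)                    # 'tuple' object has no attribute 'shape'
```

This suggests the value handed to the Embedding layer's weights argument is a tuple rather than a list of numpy arrays.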
TharinduDR commented 5 years ago

Hi @MinuteswithMetrics ,

Thank you for running my code. Are you running it as-is? Did you run it on the SICK data set and get the same error?

MinuteswithMetrics commented 5 years ago

I am running the Jupyter notebook as-is. I even tried my own dataset and got the same error.

MinuteswithMetrics commented 5 years ago

The dataset I will be using is similar to BIOSSES.csv.

TharinduDR commented 5 years ago

Can you post the versions of all the libraries you are using?

MinuteswithMetrics commented 5 years ago

Below are the libraries that I am using.

flair
scipy==1.1.0
matplotlib==2.2.3
pandas==0.23.4
Keras==2.2.4
Keras_Preprocessing==1.0.9
requests==2.21.1
numpy==1.15.1
tensorflow==1.13.0
gensim==3.8.0
scikit_learn==0.19.2

I notice your code is similar to Biomedical Semantic Similarity Estimation, but I can't pinpoint why it is failing at that particular spot.

BruceLee66 commented 5 years ago

I ran into the same problem. How can it be resolved?

MinuteswithMetrics commented 5 years ago

@BruceLee66 I'm in the process of figuring it out.

TharinduDR commented 5 years ago

sims, trained_model, topic = run_experiment(sick_train_normlized, sick_test_normalized, ['sent_1', 'sent_2'], "sim", benchmarks[i])

Here I provide 'sent_1' and 'sent_2'; these are the column names in my dataset. What are the column names in your dataset, and are you providing them correctly?
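As a quick sanity check before calling run_experiment, one can verify that the expected columns exist in the frame. This is a hypothetical snippet, not part of the repository:

```python
import pandas as pd

# Hypothetical sanity check: confirm the frame carries the column
# names the benchmark expects ('sent_1', 'sent_2', 'sim').
df = pd.DataFrame({"sent_1": ["a cat"], "sent_2": ["a dog"], "sim": [0.5]})

expected = ["sent_1", "sent_2", "sim"]
missing = [c for c in expected if c not in df.columns]
print(missing)   # an empty list means the columns line up
```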

MinuteswithMetrics commented 5 years ago

Okay, I think I may have figured it out, but I have to test it first; that will take a few days.

MinuteswithMetrics commented 5 years ago

@TharinduDR My column names are the same.

Can you explain datasets = [train_df, test_df]?

I notice you have it here:

def run_bigru_benchmark(train_df, test_df, sent_cols, sim_col, validation_portion=0.1, n_hidden=100, embedding_dim=300,
                       batch_size=64, n_epoch=500, optimizer=None, save_weights=None, load_weights=None,
                        max_seq_length=None, model=None):

    datasets = [train_df, test_df]

But I didn't see train_df and test_df anywhere else.

TharinduDR commented 5 years ago

I hope the similarity column name is the same as well.

I put train_df and test_df into a list so that the operations in prepare_embeddings can easily be applied to both of them.
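The pattern described above can be sketched as follows; the data and the tokenization step are illustrative stand-ins, not the repository's actual preprocessing:

```python
import pandas as pd

# Grouping both frames in a list lets one loop apply the same
# preprocessing to train and test alike (mutating each in place).
train_df = pd.DataFrame({"sent_1": ["a cat sat"], "sent_2": ["a dog ran"]})
test_df = pd.DataFrame({"sent_1": ["the cat"], "sent_2": ["the dog"]})

datasets = [train_df, test_df]
for dataset in datasets:
    for col in ["sent_1", "sent_2"]:
        dataset[col + "_tokens"] = dataset[col].str.split()

print(train_df["sent_1_tokens"][0])   # ['a', 'cat', 'sat']
```

Because the list holds references, changes made inside the loop are visible through train_df and test_df afterwards, which is why they don't need to appear again by name.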

BruceLee66 commented 5 years ago

I did not change anything in the train or test data; all of it came from the link in your code.

BruceLee66 commented 5 years ago

@TharinduDR The format of my data is like the attached image. @MinuteswithMetrics how did you fix it?

MinuteswithMetrics commented 5 years ago

@BruceLee66 I didn't fix it. I'm still trying to figure out why we are getting the error.

TharinduDR commented 5 years ago

I am trying to reproduce the error; I will let you know soon.

MinuteswithMetrics commented 5 years ago

@TharinduDR Thank you

MinuteswithMetrics commented 5 years ago

@TharinduDR How about replacing datasets = [train_df, test_df] with concat([df_train, df_test])?

MinuteswithMetrics commented 5 years ago

@TharinduDR Is this correct? In your embedding preprocessing, on line 11 you have questions_cols = question_cols, but in def run_lstm_benchmark you have question_cols=sent_cols.

Shouldn't it be question_cols=['sent_1', 'sent_2']?

xie233 commented 4 years ago
def run_bigru_benchmark(train_df, test_df, sent_cols, sim_col, validation_portion=0.1, n_hidden=100, embedding_dim=300,
                        batch_size=64, n_epoch=500, optimizer=None, save_weights=None, load_weights=None,
                        max_seq_length=None, model=None):
    datasets = [train_df, test_df]
    embeddings = prepare_embeddings(datasets=datasets, question_cols=sent_cols, model=model)

I found that 'embeddings' is a tuple, but it is passed to the Embedding layer, which expects a list of numpy arrays.
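If that diagnosis is right, one hedged fix is to unpack the tuple before handing the matrix to the Embedding layer. I can't confirm prepare_embeddings' actual return signature from this thread, so the tuple contents below are a stand-in:

```python
import numpy as np

# Hypothetical sketch: if prepare_embeddings returns a tuple such as
# (embedding_matrix, something_else), the tuple must be unpacked first,
# because Keras's weights argument expects a list of numpy arrays.
returned = (np.zeros((100, 300)), 100)   # stand-in for the tuple return

embedding_matrix, extra = returned       # unpack first...
weights = [embedding_matrix]             # ...then wrap the array in a list

print(all(hasattr(w, "shape") for w in weights))   # True
print(weights[0].shape)                             # (100, 300)
```

With weights in this shape, the pv.shape != w.shape comparison in set_weights can proceed without raising AttributeError.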

MinuteswithMetrics commented 4 years ago

@xie233,

Did you get the code to run?