apmoore1 / semeval

SemEval 2017 task 5 track 2 codebase.
GNU General Public License v3.0

Word2vec error #1

Closed: JayaLekhrajani closed this issue 6 years ago

JayaLekhrajani commented 6 years ago

Hi Andrew,

I get the following error when calling cross_validate on early_lstm: 'Word2Vec' object has no attribute 'vocab'

Please let me know how to resolve it.

I am also getting the following error: TypeError: run() got an unexpected keyword argument 'clipvalue'

apmoore1 commented 6 years ago

Hi,

Sorry about these problems. Would you mind posting the code you wrote? I cannot replicate the problem. To double-check that nothing is wrong, I wrote the code below. It works once you add the headlines data from SemEval 2017 task 5 sub task 2 to the data/finance directory of the repository under the filename Headline_Trainingdata.json (NOTE: to get the data you need to fill out a data release form, which is next to the link to the data):

    import semeval.helper as helper
    from semeval.lstms.EarlyStoppingLSTM import EarlyStoppingLSTM

    fin_word2vec_model = helper.fin_word_vector()
    early_lstm = EarlyStoppingLSTM(fin_word2vec_model)
    train_texts, train_sentiments, train_companies = helper.fin_data('train')
    early_res = early_lstm.cross_validate(train_texts, train_sentiments)
    print(early_res)
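A possible cause of the 'vocab' error, not confirmed in this thread, is a newer gensim release: gensim moved the vocabulary from the Word2Vec object to model.wv, and gensim 4 replaced vocab with key_to_index. The sketch below shows a version-tolerant lookup under that assumption. The function name vocabulary_of is hypothetical and is not part of gensim or of this repository.

```python
def vocabulary_of(model):
    """Return the word mapping of a gensim-style model, wherever it is stored.

    Assumption: the object follows one of the gensim layouts described above.
    """
    # Newer gensim keeps the vectors on `model.wv`; a bare KeyedVectors
    # object (or a very old Word2Vec model) holds the mapping itself.
    holder = getattr(model, "wv", model)
    # Try the gensim 4 name first, then the older `vocab` attribute.
    for attr in ("key_to_index", "vocab"):
        mapping = getattr(holder, attr, None)
        if mapping is not None:
            return mapping
    raise AttributeError("no vocabulary mapping found on this object")
```

Code that currently reads model.vocab could call vocabulary_of(model) instead, although pinning the versions in requirements.txt avoids the problem without any code change.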

I had a quick look at the clipvalue problem. I think you might be using a newer version of Keras than the one this repository was written for: model.compile in Keras no longer accepts the clipvalue argument. It has moved to the optimisers, and an optimiser can be passed to compile as a keyword argument. I will try to update the code to a newer version of Keras this week, but this might not be a quick fix, sorry. The best way around the problem is to install the exact Python requirements of this repository, which you can do with the following command:

    pip3 install -r requirements.txt
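To illustrate the change described above: gradient-clipping options such as clipvalue are accepted by the Keras optimiser constructors rather than by model.compile. Here is a minimal sketch, written in plain Python so that it runs without Keras installed, of separating a legacy set of keyword arguments into the two groups. The helper split_clip_kwargs is hypothetical (it is not part of Keras or of this repository), and the choice of RMSprop in the comments is arbitrary.

```python
# Clipping options that belong on the optimiser, not on model.compile.
CLIP_KEYS = ("clipvalue", "clipnorm")


def split_clip_kwargs(kwargs):
    """Split keyword arguments into (optimiser kwargs, compile kwargs)."""
    optimizer_kwargs = {k: v for k, v in kwargs.items() if k in CLIP_KEYS}
    compile_kwargs = {k: v for k, v in kwargs.items() if k not in CLIP_KEYS}
    return optimizer_kwargs, compile_kwargs


# Example: arguments written for the older call style.
legacy = {"loss": "mse", "clipvalue": 0.5}
optimizer_kwargs, compile_kwargs = split_clip_kwargs(legacy)

# With Keras installed, the two groups would be used roughly like this:
#   optimizer = keras.optimizers.RMSprop(**optimizer_kwargs)
#   model.compile(optimizer=optimizer, **compile_kwargs)
```

This only shows where the argument needs to go. Installing the pinned requirements is still the simpler route until the repository code is updated.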

JayaLekhrajani commented 6 years ago

Hi Andrew,

Thanks for the detailed and clear explanation. However, I am still unsure how you obtained the early_stopping_submission.json and tweeked_lstm_submission.json files. I couldn't find the code for this in run.py.

apmoore1 commented 6 years ago

Hi,

Sorry for the late reply. I must never have written a function for it, sorry.

However I have just added two more functions:

  1. helper.__text_company -- Reads the test data and returns the text, company and ids, without the sentiment values associated with the test data.
  2. helper.create_semeval_file -- Given a list of ids, the associated predicted sentiment values, and a string that is a file path, writes the predicted scores with their ids to the file at that path, in the same format as the submission files.

Below is an example script to run once you have the training and testing data:

    import helper
    from semeval.lstms.EarlyStoppingLSTM import EarlyStoppingLSTM

    fin_word2vec_model = helper.fin_word_vector()
    early_lstm = EarlyStoppingLSTM(fin_word2vec_model)
    train_texts, train_sentiments, train_companies = helper.fin_data('train')
    test_texts, test_companies, test_ids = helper.fin_data('test', test_data=True)
    early_lstm.fit(train_texts, train_sentiments)
    test_results = early_lstm.predict(test_texts)
    test_results = test_results.reshape(test_results.shape[0], )
    helper.create_semeval_file(test_ids, test_results, 'submission_file.json')
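For readers who want a feel for the final step of the script, here is a rough sketch of writing id and score pairs to a JSON file. It is not the repository's create_semeval_file: the function name write_predictions and the record keys "id" and "sentiment score" are assumptions made for illustration, and the real layout should be taken from the existing submission files.

```python
import json


def write_predictions(ids, scores, file_path):
    """Write one JSON record per (id, score) pair and return the records.

    The key names used here are assumptions, not the confirmed SemEval format.
    """
    if len(ids) != len(scores):
        raise ValueError("ids and scores must have the same length")
    # float() turns NumPy scalars into plain floats that json can serialise.
    records = [
        {"id": identifier, "sentiment score": float(score)}
        for identifier, score in zip(ids, scores)
    ]
    with open(file_path, "w") as handle:
        json.dump(records, handle, indent=2)
    return records
```

The reshape call in the script matters for a function like this one: predict presumably returns one row per headline with a single column, and flattening it gives exactly one score per id.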

Hope this helps.