Open sanjayb678 opened 6 years ago
Hi @sanjayb678, I wrote and ran the whole script in Spyder (Python 3.6), so I would advise you to keep the same configuration for now; I have not tested whether the code works exactly the same in a notebook. Saving shouldn't be a problem as far as I know; however, you can skip over that line as long as the model is in memory.
1) Where is the pre-trained model for Word2Vec?
2) Error in word2vec.py:
File "word2vec.py", line 178, in <module>
    corpus = createCorpus(data)
NameError: name 'data' is not defined
Also, why have you done label_encoder, onehot_encoded, onehot = summonehot(data["summaries"])?
Shouldn't the function argument be corpus instead of data["summaries"]?
@PratikNalage
In cnn_daily_load.py you could create a function like this:

def cnn_daily_load():
    filenames = load_data(datasets["cnn"], data_categories[0])
    """----------load the data, articles and summaries-----------"""
    data = {"articles": [], "summaries": []}  # must be initialised before appending
    for k in range(len(filenames[:400])):
        if k % 2 == 0:
            try:
                data["articles"].append(cleantext(parsetext(datasets["cnn"], data_categories[0], "%s" % filenames[k])))
            except Exception as e:
                data["articles"].append("Could not read")
                print(e)
        else:
            try:
                data["summaries"].append(cleantext(parsetext(datasets["cnn"], data_categories[0], "%s" % filenames[k])))
            except Exception as e:
                data["summaries"].append("Could not read")
                print(e)
    return data
Then simply import it in word2vec.py:

from cnn_daily_load import cnn_daily_load, cleantext
data = cnn_daily_load()
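The loading pattern above can be sketched as a self-contained example. Note that `parsetext`, `cleantext`, and the real file list come from the repo's own modules; the stubs below are placeholders for illustration only, and the key point is that the `data` dict must be initialised before `.append()` is called on its lists.

```python
def cleantext(text):
    """Stand-in for the repo's cleantext: lowercase and strip."""
    return text.lower().strip()

def parsetext(filename):
    """Stand-in for the repo's parsetext: pretend to read a file."""
    return "Contents of %s" % filename

def cnn_daily_load(filenames):
    # The dict must exist before .append() is called on its lists.
    data = {"articles": [], "summaries": []}
    for k in range(len(filenames[:400])):
        try:
            text = cleantext(parsetext(filenames[k]))
        except Exception as e:
            text = "Could not read"
            print(e)
        # Even indices are articles, odd indices their summaries.
        if k % 2 == 0:
            data["articles"].append(text)
        else:
            data["summaries"].append(text)
    return data

files = ["art0.story", "sum0.story", "art1.story", "sum1.story"]
data = cnn_daily_load(files)
print(len(data["articles"]), len(data["summaries"]))  # 2 2
```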
Your first question was: where is the pre-trained model for Word2Vec?
I think we are simply using the skip-gram algorithm to generate our own word embeddings, which is why no pre-trained Word2Vec model is needed; it is just another way of generating word embeddings.
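To make the skip-gram idea concrete: the objective is, for each target word, to predict the words inside a context window. Gensim's `Word2Vec(sg=1)` learns embeddings from exactly these (target, context) pairs; this pure-Python sketch only generates the training pairs themselves, not the embeddings.

```python
def skipgram_pairs(tokens, window=2):
    """Generate (target, context) pairs for the skip-gram objective."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the target itself is not its own context
                pairs.append((target, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```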
It is called pre-trained because I do not train it together with the neural network; I do pre-train the skip-gram model.
Not really @DeepsMoseli. Here you are using gensim's skip-gram algorithm (Word2Vec) to build a normal Word2Vec model and then generating embeddings for the words, training from scratch.
Great stuff. We never used a pre-trained Word2Vec model here.
@amanjaswani I did not fully understand your question, but to give you a hand:
label_encoder, onehot_encoded, onehot = summonehot(data["summaries"])
label_encoder is for the training labels; the Word2Vec embeddings are for the training data.
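For readers unsure what summonehot returns: a minimal sketch of the label-encoder plus one-hot pattern it presumably implements, written in plain Python for clarity. The real repo uses scikit-learn's LabelEncoder/OneHotEncoder; this reimplementation is an assumption about the behavior, not the actual function.

```python
def summonehot(labels):
    """Sketch: map each distinct label to an index, then one-hot encode."""
    vocab = sorted(set(labels))
    index = {w: i for i, w in enumerate(vocab)}  # label-encoder mapping
    onehot = []
    for w in labels:
        vec = [0] * len(vocab)
        vec[index[w]] = 1
        onehot.append(vec)
    return index, onehot

index, onehot = summonehot(["good", "bad", "good"])
print(index)      # {'bad': 0, 'good': 1}
print(onehot[0])  # [0, 1]
```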
In the function word2vecmodel, the model that is saved as word2vec throws an error that it's not UTF-8 encoded, and saving is disabled. FYI, I'm running the code in a Jupyter notebook. Thanks in advance; let me know if I'm doing something wrong.
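One likely explanation for that message: the saved model is a binary file, and Jupyter's text editor refuses to display binary files, reporting "not UTF-8 encoded". The file itself is usually fine and loads back through Python without issue. A small sketch of the round trip, using a stand-in embedding dict and pickle instead of the actual gensim model:

```python
import pickle
import tempfile
import os

# Stand-in for the trained model: a dict of word -> embedding vector.
embeddings = {"cat": [0.1, 0.2], "mat": [0.3, 0.4]}

# Save as binary; opening this file in Jupyter's text editor would
# show the "not UTF-8 encoded" message, but that is only the viewer.
path = os.path.join(tempfile.gettempdir(), "word2vec_demo.bin")
with open(path, "wb") as f:
    pickle.dump(embeddings, f)

# Loading it back through Python works fine.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == embeddings)  # True
os.remove(path)
```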