aneesh-joshi / LSTM_POS_Tagger

A simple POS Tagger made using a Bidirectional LSTM using keras trained on the Brown Corpus

reshaping problem #2

Closed. minasmz closed this issue 6 years ago

minasmz commented 6 years ago

On line 31 of "make_model.py" I get the error "ValueError: cannot reshape array of size 44031000 into shape (22,100,195)", which makes sense given that the sizes do not match. Could you please tell me why that reshape is there and how I can fix it? Another question: "extract_data.py" seems to convert the text to pickle format directly, so what is "make_glove_pickle.py" used for? Thanks in advance!

aneesh-joshi commented 6 years ago

@minasmz Thank you for raising this issue. I'm sorry it took so long to reply.

You can safely comment out line 31. It seems I added it for some testing purpose that I have since forgotten.
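For anyone hitting the same message, here is a minimal sketch of why NumPy rejects that reshape. It only uses the numbers from the error above; the variable names and the small stand-in array are made up for illustration, and nothing here is taken from make_model.py. A reshape must keep the total number of elements unchanged, and 22 * 100 * 195 is far smaller than 44031000.

```python
import numpy as np

# Numbers taken from the error message; the names are hypothetical.
array_size = 44031000
target_shape = (22, 100, 195)

# A reshape is only valid when the element counts match exactly.
target_size = int(np.prod(target_shape))
sizes_match = array_size == target_size

# With the last two dimensions fixed at (100, 195), this is the only
# leading dimension that would fit the data.
leading_dim = array_size // (100 * 195)
divides_evenly = array_size % (100 * 195) == 0

print(target_size, sizes_match, leading_dim, divides_evenly)

# The same failure on a small stand-in array (24 elements).
small = np.zeros(2 * 3 * 4)
try:
    small.reshape(5, 3, 4)  # 60 != 24, so NumPy raises ValueError
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)

# Passing -1 lets NumPy infer the leading dimension from the data.
inferred = small.reshape(-1, 3, 4)
print(inferred.shape)
```

So the array in the error would fit a shape of (2258, 100, 195) but not (22, 100, 195). Whether reshaping is needed at all depends on the script; in this case the line can simply be commented out.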

About your second question: extract_data.py tokenises the Brown corpus and saves it as a pickle in data.pkl. make_glove_pickle.py takes the glove.6B.100d.txt file and converts it into a dict that has each word as a key and its vector as the value.

Essentially:

brown_corpus -> extract_data.py  -> tokenised sentences saved as pickle
Glove.txt -> make_glove_pickle.py -> dict from a word to its vector
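As a rough illustration of the second step, the sketch below turns GloVe-style text into a word-to-vector dict and pickles it. It is not the code from make_glove_pickle.py: the function name `load_glove` and the three-dimensional sample lines are invented for the example, and it assumes the usual GloVe text layout of one word per line followed by space-separated floats.

```python
import io
import pickle

import numpy as np


def load_glove(lines):
    """Build a dict mapping each word to its vector (hypothetical helper)."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        embeddings[word] = np.asarray(values, dtype="float32")
    return embeddings


# Stand-in for the real file: made-up 3-dimensional vectors instead of
# the 100-dimensional ones in glove.6B.100d.txt.
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
glove = load_glove(sample)

# Round-trip through pickle, as a script would do via a file on disk.
blob = pickle.dumps(glove)
restored = pickle.loads(blob)
print(sorted(restored), restored["cat"].shape)
```

With the real file, the same function could be fed an open file handle instead of the in-memory sample, and `pickle.dump` would write the dict to disk.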

I realise that my code isn't very clear, so I am working on improving it. I will push the newer version in a day or two.

Hopefully, once I write a blog post on this, things will get a lot clearer.

Feel free to ask more questions.

aneesh-joshi commented 6 years ago

@minasmz Please check now. I have made the code a lot clearer.

minasmz commented 6 years ago

Thank you @aneesh-joshi for your really helpful answer. I commented that line and it runs properly!

aneesh-joshi commented 6 years ago

@minasmz Also consider cloning the modified repository, building the environment and reading the new code. It is a lot clearer.

If everything is clear, please close the issue.

minasmz commented 6 years ago

Thank you!