Learn-Write-Repeat / Contribution-program

Repository for competition in open-source contribution under DevIncept.

next word predictor #23

Open arpit-dwivedi opened 4 years ago

arpit-dwivedi commented 4 years ago

Add the links of the resources you found in the comments, along with:

  1. Algorithms used for each given link
  2. Libraries used
  3. Approx. lines of code

At the end, also conclude which one is better.

madalatrivedh20 commented 4 years ago

Link 1: https://towardsdatascience.com/exploring-the-next-word-predictor-5e22aeb85d8f

RNN-LSTM approach:
i) Algorithms used: RNN-LSTM
ii) Libraries used: nltk, numpy, keras (Tokenizer, LSTM, Dense, Embedding, Sequential, pad_sequences, to_categorical)
iii) Approx. lines of code: 53

N-grams approach (same article):
i) Algorithms used: N-grams
ii) Libraries used: nltk, collections
iii) Approx. lines of code: 135

Link 2: https://www.youtube.com/watch?v=35tu6XnRkH0
i) Algorithms used: RNN-LSTM
ii) Libraries used: numpy, keras (Tokenizer, LSTM, Dense, Embedding, Sequential, to_categorical)
iii) Approx. lines of code: 50

Between the LSTM and N-gram approaches, RNN-LSTM is the better choice because it is a more advanced approach that uses a neural language model. Standard RNNs and other language models become less accurate as the gap between the context and the word to be predicted grows, but LSTM can tackle this long-term dependency problem because its memory cells remember the previous context.
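For reference, here is a minimal sketch of the RNN-LSTM pipeline these links describe, using the Keras pieces listed above (Tokenizer, pad_sequences, to_categorical, Embedding, LSTM, Dense). The toy corpus and hyperparameters are my own assumptions for illustration, not values from either resource:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Toy corpus (assumption for illustration; the article trains on real text).
corpus = ["the cat sat on the mat", "the dog sat on the rug"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
vocab_size = len(tokenizer.word_index) + 1  # +1 because index 0 is padding

# Turn every sentence into (prefix -> next word) training sequences.
sequences = []
for line in corpus:
    ids = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(ids)):
        sequences.append(ids[: i + 1])

max_len = max(len(s) for s in sequences)
data = pad_sequences(sequences, maxlen=max_len, padding="pre")
X = data[:, :-1]
y = to_categorical(data[:, -1], num_classes=vocab_size)

model = Sequential([
    Embedding(vocab_size, 10),
    LSTM(50),  # the memory cells that carry long-range context
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=200, verbose=0)

# Predict the most likely word to follow a seed phrase.
seed = tokenizer.texts_to_sequences(["the cat sat"])[0]
seed = pad_sequences([seed], maxlen=max_len - 1, padding="pre")
next_id = int(np.argmax(model.predict(seed, verbose=0)))
print(tokenizer.index_word.get(next_id))
```

Each training example is a padded prefix of a sentence with the following word as the label, which is exactly what lets the LSTM carry earlier context forward to the prediction.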

jaydhamale7 commented 4 years ago

Link 1: https://towardsdatascience.com/exploring-the-next-word-predictor-5e22aeb85d8f

A)
i) Approach: RNN-LSTM
ii) Libraries used: nltk
iii) Approx. lines of code: 80

B)
i) Algorithms used: RNN-LSTM
ii) Libraries used: numpy, keras (Tokenizer, LSTM, Dense, Embedding, Sequential, load_model, to_categorical, pad_sequences)
iii) Approx. lines of code: 65
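Since variant B also lists load_model and pad_sequences, a small inference sketch may help; the saved file name, max_len, and fitted tokenizer here are hypothetical stand-ins, not names from the article:

```python
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical artifacts: a model trained and saved earlier, plus the
# Tokenizer that was fitted on the same training text.
model = load_model("next_word_model.h5")

def predict_next(tokenizer, text, max_len):
    """Return the most likely next word for `text` (max_len as used in training)."""
    ids = tokenizer.texts_to_sequences([text])[0]
    ids = pad_sequences([ids], maxlen=max_len - 1, padding="pre")
    next_id = int(np.argmax(model.predict(ids, verbose=0)))
    return tokenizer.index_word.get(next_id, "<unk>")
```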

john-2424 commented 4 years ago

A next word predictor is an application of Natural Language Processing (NLP), where we can use different NLP techniques and Recurrent Neural Networks (RNNs) to predict the next word in a sentence. There are many algorithms; among them are n-grams, Kneser-Ney smoothing, k-Nearest Neighbours, RNN-LSTM, and RNN-GRU.

1) https://towardsdatascience.com/exploring-the-next-word-predictor-5e22aeb85d8f
i) n-gram -> An n-gram model is a type of probabilistic language model for predicting the next item in a sequence; it is a statistical natural language processing model (see the sketch after this list).
ii) re, nltk.tokenize, word_tokenize, collections
iii) 132

2) https://rpubs.com/teez/196761#:~:text=To%20predict%20the%20next%20word,has%20the%20highest%20weighted%20frequency.
i) Kneser-Ney smoothing -> a probabilistic language model
ii) re, nltk, nltk.corpus, nltk.data, nltk.stem.wordnet, collections, numpy, math
iii) 564

3) https://pudding.cool/2019/04/text-prediction/
i) k-Nearest Neighbours -> training-based model
ii) sklearn.neighbors, sklearn.model_selection, sklearn.datasets, numpy, matplotlib.pyplot
iii) 100

4) https://towardsdatascience.com/exploring-the-next-word-predictor-5e22aeb85d8f
i) RNN-LSTM and RNN-GRU -> training- and memory-based RNN models
ii) Keras, TensorFlow (Dense, Activation, Dropout, Input, LSTM, Reshape, Lambda, RepeatVector, Adam, to_categorical)
iii) 100
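As a companion to item 1, here is a minimal bigram version of the n-gram idea using collections; the toy corpus and the maximum-likelihood choice (raw counts, no smoothing) are simplifying assumptions on my part, and the linked article's code is longer and tokenizes with nltk's word_tokenize:

```python
from collections import Counter, defaultdict

# Toy corpus (assumption; the article trains on much larger text and
# uses nltk's word_tokenize instead of str.split).
corpus = "the cat sat on the mat . the cat lay on the rug ."
tokens = corpus.lower().split()

# Count bigram transitions: how often each word follows each predecessor.
following = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Most frequent successor of `word` (maximum likelihood, no smoothing)."""
    counts = following.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # -> 'cat' (follows 'the' twice in the toy corpus)
```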

After studying these models, RNN-LSTM and RNN-GRU were the best to implement, due to less code and higher accuracy. Between the two, RNN-LSTM is the better choice, for the following reason: RNN-GRUs use fewer training parameters, and therefore use less memory, execute faster, and train faster than RNN-LSTMs, whereas RNN-LSTM is more accurate on datasets with longer sequences. In short, if the sequence is long or accuracy is critical, go for RNN-LSTM; for less memory consumption and faster operation, go for RNN-GRU.
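To make the trade-off concrete: in Keras, switching between the two cells is a one-line change, and the parameter counts show why GRU uses less memory. The layer sizes below are illustrative assumptions, not values from the linked resources:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GRU, Dense

VOCAB, SEQ_LEN, UNITS = 5000, 20, 128  # illustrative sizes

def build_model(cell):
    """Same next-word architecture; only the recurrent cell differs."""
    model = Sequential([
        Embedding(VOCAB, 64),
        cell(UNITS),
        Dense(VOCAB, activation="softmax"),
    ])
    model.build((None, SEQ_LEN))
    return model

lstm_model = build_model(LSTM)  # more parameters, better on long sequences
gru_model = build_model(GRU)    # fewer parameters, trains and runs faster

# GRU has roughly 3/4 of the LSTM's recurrent parameters (3 gates vs 4).
print(lstm_model.count_params(), gru_model.count_params())
```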