Closed — susiejojo closed this issue 3 years ago
Also @susiejojo, it would be great if the data preprocessing and the prediction were set up as a pipeline. The API server could then take raw text as input and, using the pipeline function, preprocess and predict in one call. In short, it would make deployment easier.
Solved via #9. Merged into main.
The model broadly performs the following functions:

- [x] fetch the Kaggle MBTI dataset
- [x] `preprocessing`: clean text, use nltk to stem, tokenise, remove stopwords, NER from text
- [ ] visualise the given data classes and distribution
- [x] send data to the training model (preferably BERT)
- [x] choose hyperparameters for training
- [x] cross-validate
- [x] contrast BERT vs LSTM models
- [x] an `evaluator` function that reports accuracy and other metrics (I believe accuracy will be good enough for us, we need to check the confusion matrix once though)
- [x] create a `predict` function which accepts text, preprocesses it and returns the predicted class
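The preprocessing item above could look roughly like this. It uses nltk's `PorterStemmer` (which needs no corpus download); the tiny `STOPWORDS` set and plain `split()` tokeniser are stand-ins for nltk's stopword corpus and punkt tokeniser, chosen to keep the sketch self-contained, and the NER step is omitted. These are assumptions, not the merged #9 code.

```python
import re
from nltk.stem import PorterStemmer

# Stand-in for nltk.corpus.stopwords.words("english") to avoid a download.
STOPWORDS = {"the", "a", "an", "is", "are", "i", "to", "and", "of"}
stemmer = PorterStemmer()

def preprocess(text: str) -> list[str]:
    """Clean, tokenise, remove stopwords, and stem a raw text string."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # clean: keep letters only
    tokens = text.split()                           # tokenise (naive split)
    return [stemmer.stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("The runners are running to the park!"))
```

Returning a token list keeps the function reusable both for the visualisation step and as the first stage of the `predict` pipeline.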