Closed — susiejojo closed this issue 3 years ago
Also @susiejojo, it would be great if the data preprocessing and the prediction were set up as a pipeline. The API server could then take raw text as input and, using the pipeline function, preprocess and predict in one call. In short, it would make deployment easier.
Solved via #9. Merged into main.
The model broadly performs the following functions:

- [x] fetch the Kaggle MBTI dataset
- [x] `preprocessing`: clean text, use nltk to stem, tokenise, remove stopwords, NER from text
- [ ] visualise the given data classes and distribution
- [x] send data to the training model (preferably BERT)
- [x] choose hyperparameters for training
- [x] cross-validate
- [x] contrast BERT vs LSTM models
- [x] an `evaluator` function that reports accuracy and other metrics (I believe accuracy will be good enough for us, we need to check the confusion matrix once though)
- [x] create a `predict` function which accepts text, preprocesses it and returns the predicted class
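The preprocessing item above could look roughly like this. It uses nltk's `PorterStemmer` (which needs no corpus download); the tiny `STOPWORDS` set and plain `split()` tokeniser are stand-ins for nltk's stopword corpus and punkt tokeniser, chosen to keep the sketch self-contained, and the NER step is omitted. These are assumptions, not the merged #9 code.

```python
import re
from nltk.stem import PorterStemmer

# Stand-in for nltk.corpus.stopwords.words("english") to avoid a download.
STOPWORDS = {"the", "a", "an", "is", "are", "i", "to", "and", "of"}
stemmer = PorterStemmer()

def preprocess(text: str) -> list[str]:
    """Clean, tokenise, remove stopwords, and stem a raw text string."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # clean: keep letters only
    tokens = text.split()                           # tokenise (naive split)
    return [stemmer.stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("The runners are running to the park!"))
```

Returning a token list keeps the function reusable both for the visualisation step and as the first stage of the `predict` pipeline.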