This PR introduces a new classifier to the API that is expected to perform better on the input data in the long run.
It also restructures the backend codebase and the API. The frontend has been adapted to handle the changes in the API.
Changes in backend
New dependencies:
tensorflow>=2
numpy
Added folders bin and src/scripts. Removed src/model and moved the necessary files.
The bin folder contains the pre-trained RNN (~7MB) and will be the new home of bayes_model.pkl.
Added a backend README. It is incomplete, but it outlines what should be there and describes the different classification approaches.
Changes in code
Added a script for training the RNN. Running it is not necessary, as a pretrained model is already in the repository.
Training might take about 20 minutes, but an improved model might take many hours to train, which is far too long to do each time you build the backend.
Refactored the Bayes classifier further so that it does the same with less code.
Everything regarding the Bayes classifier itself is now in a single script.
You still have to train the Bayes model when you build the backend, for two reasons:
1: training is relatively fast, 2: the binaries are too large to host on GitHub (100MB).
classifier.py still relies on the preprocess function. This can be changed in the future.
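The split described above (ship the small RNN, train the Bayes model at build time) could be handled with a load-or-train helper along these lines. This is a minimal sketch and not the code in this PR: the helper name `load_or_train` and the `train` callback are hypothetical, and it assumes the Bayes model is a picklable object stored at bin/bayes_model.pkl.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_train(model_path: Path, train: Callable[[], Any]) -> Any:
    """Return the pickled model at model_path, training and saving it first if missing.

    `train` is a hypothetical zero-argument callback that returns a picklable model.
    """
    if model_path.exists():
        # A previous build already produced the model, so reuse it.
        with model_path.open("rb") as f:
            return pickle.load(f)

    # No cached model yet: train once and store the result for later runs.
    model = train()
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as f:
        pickle.dump(model, f)
    return model
```

A build step could then call something like `load_or_train(Path("bin/bayes_model.pkl"), train_bayes_model)`, where `train_bayes_model` stands in for whatever the training entry point is actually called.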
Changes in APInterface
The API now returns a smaller JSON with only the necessary information, as string formatting and translation should be done in the frontend. It optionally accepts the classifier that should be used.
[api-url]/mann-eller-kvinne now accepts and returns JSON.
"likelihood" was renamed to "probability" as they are conceptually different.
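The exact request and response payloads are not reproduced in this description, so the following is only an illustration of the idea of an optional classifier field and a "probability" value in the response. The field names "text" and "classifier", the value "rnn", and the assumption that "probability" sits at the top level of the response are all hypothetical.

```python
import json
from typing import Optional


def build_request(text: str, classifier: Optional[str] = None) -> str:
    """Serialize a request body; the classifier key is only sent when given."""
    payload = {"text": text}  # "text" is a hypothetical field name
    if classifier is not None:
        # Hypothetical field name; omitted so the API can fall back to its default.
        payload["classifier"] = classifier
    return json.dumps(payload, ensure_ascii=False)


def parse_response(body: str) -> float:
    """Read the probability from a response body.

    Assumes a top-level "probability" key, which is a guess at the layout.
    """
    data = json.loads(body)
    return float(data["probability"])
```

Leaving the classifier out of the payload when it is not specified keeps old clients working if the API treats the field as optional, which is how it is described above.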
Changes in frontend
Adapted the frontend to communicate properly with the API and to format/translate the data received.
Added marksverdhei to the footer.
The frontend should work the same way as before the PR, as the PR mainly concerns updates to the API itself.
It does not yet make use of the RNN classifier, though the classifier worked when I tested it.
For the frontend to work, it needs to be connected to an updated API.
What's next
This PR does not include frontend changes that let users interact with the new classifier.
One idea is a switch that lets the user swap between calling the RNN classifier and the Bayes classifier.
We should evaluate the classifiers to see how well they actually perform on the validation set and test set. During training, the RNN got a validation accuracy of 70%, but accuracy is not the most appropriate metric here, as the classes are somewhat imbalanced.
Further training and improvement of the RNN. I didn't focus on fully optimizing model performance, hyperparameter tuning and such, but on making a working proof of concept. The final RNN might take many hours to train, and its architecture might change.
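On the evaluation point above: one option for imbalanced classes is balanced accuracy, the mean of the per-class recall. The sketch below uses made-up labels purely to show how the two metrics can diverge. It is not a measurement of either classifier, and balanced accuracy is one possible choice rather than a metric this PR commits to.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall, so every class counts equally regardless of size."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / totals[c] for c in totals) / len(totals)


# Made-up data: 8 samples of one class, 2 of the other,
# and a degenerate model that always predicts the larger class.
y_true = ["majority"] * 8 + ["minority"] * 2
y_pred = ["majority"] * 10

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The always-majority model looks decent on plain accuracy but scores no better than chance on balanced accuracy, which is why a single accuracy figure is hard to interpret on imbalanced data.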