Normalize training data

elimu-ai / ml-storybook-reading-level

🤖📚 Machine learning model which predicts the reading level of a storybook.

MIT License


Normalize training data #28

Status: Open. nya-elimu opened this issue 2 months ago.

nya-elimu commented 2 months ago

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html

Why normalize the training data? To check whether doing so improves the accuracy score.
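For context, the linked sklearn.preprocessing.normalize rescales each sample (row) of a feature matrix to unit norm, using the L2 norm by default; it does not rescale each feature (column). Below is a minimal pure-Python sketch of that default behaviour, for illustration only. The function name `l2_normalize_rows` is hypothetical; in the project itself you would simply call `normalize(X)` from scikit-learn.

```python
import math


def l2_normalize_rows(X):
    """Scale each row of X to unit L2 norm, mirroring the default
    behaviour of sklearn.preprocessing.normalize(X, norm="l2", axis=1).
    Rows whose norm is zero are returned unchanged."""
    result = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row))
        if norm == 0:
            result.append(list(row))  # all-zero row: avoid division by zero
        else:
            result.append([v / norm for v in row])
    return result


X = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
print(l2_normalize_rows(X))  # [[0.6, 0.8], [0.0, 0.0], [1.0, 0.0]]
```

Because this works per sample, it changes how the feature values within one row relate to each other in magnitude, which is different from per-feature scaling; whether either helps is exactly what this issue proposes to measure.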

howdyDp commented 1 month ago

I was trying to normalize the data and wanted to know which kind of normalization you are looking for: min-max normalization or z-score normalization? This is my first open-source contribution, so if you have any advice, do let me know.
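Both options mentioned here work per feature (column) rather than per sample: min-max scaling maps each feature onto [0, 1] via (x - min) / (max - min), and z-score standardization gives each feature mean 0 and standard deviation 1 via (x - mean) / std. In scikit-learn these correspond to `MinMaxScaler` and `StandardScaler`. A minimal pure-Python sketch on a single column, assuming a constant column should map to all zeros (which matches what those scalers produce); the feature name `word_counts` is hypothetical.

```python
import statistics


def min_max_scale(column):
    """Map values linearly onto [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]  # constant feature: nothing to spread
    return [(x - lo) / span for x in column]


def z_score_scale(column):
    """Centre on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]  # constant feature: every value equals the mean
    return [(x - mean) / std for x in column]


word_counts = [100, 200, 300]  # hypothetical feature values
print(min_max_scale(word_counts))  # [0.0, 0.5, 1.0]
print(z_score_scale(word_counts))
```

Min-max keeps the original shape of the distribution but is sensitive to outliers at either end; z-score scaling is less bounded but less dominated by a single extreme value.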

jo-elimu commented 1 month ago

@howdyDp We are looking for whichever normalization technique would result in an improved accuracy score.

Tip: Try running python run_all_steps.py for each kind of normalization you want to try, and check whether the results in step3_2_accuracy_score.txt and step3_2_mean_absolute_error.txt improve.

And if it turns out that normalization does not improve the accuracy, then we wouldn't use normalization at all.
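The selection rule described above (keep a normalization only if it beats running without one) can be sketched as follows. The function name and the idea of collecting scores into a dict are assumptions for illustration; the accuracy values would come from the step3_2_accuracy_score.txt file produced by each run.

```python
def choose_normalization(accuracy_by_technique, baseline="none"):
    """Return the name of the technique to keep.

    accuracy_by_technique maps a technique name to the accuracy score from
    one full pipeline run, e.g. {"none": ..., "min_max": ..., "z_score": ...}.
    A normalization is chosen only if its accuracy is strictly higher than
    the baseline (no normalization); otherwise the baseline is returned.
    """
    best = baseline
    best_score = accuracy_by_technique[baseline]
    for name, score in accuracy_by_technique.items():
        if score > best_score:
            best, best_score = name, score
    return best
```

For example, with placeholder scores {"none": 0.5, "min_max": 0.5} it returns "none", because a tie is not an improvement. The mean absolute error from step3_2_mean_absolute_error.txt could be used as a secondary check in the same way, keeping in mind that lower is better there.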

eve-b612 commented 1 month ago

Hi! I see the issue is still open, so I thought I would give this a try. This is my first contribution: I've normalised the data three ways and wanted to share the results. My work is in a Jupyter notebook. I've forked the repo and created a new branch named 'normalisation-experiment'. Which folder in the branch should I upload it to?

jo-elimu commented 1 month ago

@eve-b612 Feel free to open a pull request 🙂

You can add your code changes related to data processing to the step1_prepare folder.
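When adding normalization to the data-preparation step, one detail worth getting right is to compute the scaling parameters (for example each feature's mean and standard deviation) from the training rows only, then reuse them unchanged on the test rows, so that no information from the test set leaks into training. A minimal sketch, assuming rows are dicts and that the label column must not be scaled; the function names and the column names `word_count` and `reading_level` are hypothetical and not taken from step1_prepare.

```python
import statistics


def fit_z_score(train_rows, feature_names):
    """Learn (mean, std) for each feature from the training rows only."""
    params = {}
    for name in feature_names:
        values = [row[name] for row in train_rows]
        std = statistics.pstdev(values)
        # A constant feature gets std 1.0 so it scales to 0 instead of dividing by zero
        params[name] = (statistics.fmean(values), std if std > 0 else 1.0)
    return params


def apply_z_score(rows, params):
    """Scale the fitted features; every other column (e.g. the label) is copied as-is."""
    scaled = []
    for row in rows:
        new_row = dict(row)
        for name, (mean, std) in params.items():
            new_row[name] = (row[name] - mean) / std
        scaled.append(new_row)
    return scaled


# Hypothetical feature and label names, for illustration only
train = [
    {"word_count": 100, "reading_level": 1},
    {"word_count": 300, "reading_level": 2},
]
test = [{"word_count": 200, "reading_level": 1}]

params = fit_z_score(train, ["word_count"])
print(apply_z_score(train, params))
# [{'word_count': -1.0, 'reading_level': 1}, {'word_count': 1.0, 'reading_level': 2}]
print(apply_z_score(test, params))
# [{'word_count': 0.0, 'reading_level': 1}]
```

The same split between learning parameters and applying them is what scikit-learn's scalers expose as `fit` on the training set followed by `transform` on both sets.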

eve-b612 commented 1 month ago

Sent the pull request for review! :)