MaxHalford / maxhalford.github.io

Personal website
https://maxhalford.github.io
MIT License

blog/sklearn-text-classifier-memory-footprint-reduction/ #13

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Reducing the memory footprint of a scikit-learn text classifier - Max Halford

Context: This week at Alan, I’ve been working on parsing French medical prescriptions. There are three types of prescriptions: lenses, glasses, and pharmaceutical prescriptions. Different information needs to be extracted depending on the prescription type, so the first step is to classify the prescription. The prescriptions we receive are pictures taken by users with their phones. We run each image through an OCR engine to obtain a text transcription of the image.

https://maxhalford.github.io/blog/sklearn-text-classifier-memory-footprint-reduction/

Laetitia-LefebvreNare commented 3 years ago

Awesome, Max! About the importance of '50' and '25', I may have an idea: since the sight correction a patient needs is written in steps of 0.25 D, you find a lot of '50' and '25' in these prescriptions 🙂
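
To illustrate the idea, here is a minimal sketch of how such values might end up as standalone '25' and '50' tokens. It assumes the classifier tokenizes text with scikit-learn's default `token_pattern`, which only keeps runs of two or more word characters; the post excerpt above does not confirm this, and the prescription line is made up for the example.

```python
import re

# scikit-learn's default token_pattern for CountVectorizer / TfidfVectorizer.
# Assumption: the classifier in the post uses this default.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

# Hypothetical OCR output for a glasses prescription (invented for illustration).
text = "OD +1.50 (-0.25) 90  OG +2.25 (-0.50) 85"

# The decimal point splits each value, and single-character pieces such as
# "1", "0" and "2" are dropped because the pattern needs at least two characters.
tokens = TOKEN_PATTERN.findall(text.lower())
print(tokens)
```

Under that assumption, a correction such as +1.50 or -0.25 contributes only its decimal part as a token, which would make '25' and '50' very frequent in lens and glasses prescriptions.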