calgo-lab / green-db

The monorepo that powers the GreenDB.
https://calgo-lab.github.io/green-db/

Product classification microservice #140

Closed by BigDatalex 1 year ago

BigDatalex commented 1 year ago

This PR adds additional components and functionality to automatically predict product categories based on the product's name and description.

It adds a worker that, for each extracted product, calls a classification API served by Flask and Waitress. The predicted classifications are stored in additional tables in the GreenDB. In addition, thresholds were derived from a hand-labeled dataset to exclude out-of-distribution products and to achieve a precision of at least 90%.
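
To illustrate the thresholding idea, here is a minimal sketch rather than the code in this PR. It assumes one confidence threshold per category, chosen as the lowest value at which the accepted predictions on the hand-labeled data still reach the target precision. The names `pick_threshold` and `classify`, and the categories `SHIRT` and `PANTS`, are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def pick_threshold(
    scored: Sequence[Tuple[float, bool]], min_precision: float = 0.9
) -> Optional[float]:
    """Return the lowest confidence threshold that reaches min_precision.

    `scored` holds (confidence, prediction_was_correct) pairs for one
    category, taken from a hand-labeled dataset (assumed input format).
    Returns None if no threshold reaches the target precision.
    """
    best = None
    # Try every observed confidence as a candidate, from high to low.
    for threshold in sorted({conf for conf, _ in scored}, reverse=True):
        accepted = [ok for conf, ok in scored if conf >= threshold]
        precision = sum(accepted) / len(accepted)
        if precision >= min_precision:
            # A lower threshold accepts more products, so prefer it
            # whenever the precision target is still met.
            best = threshold
    return best


def classify(
    probabilities: Dict[str, float], thresholds: Dict[str, Optional[float]]
) -> Optional[str]:
    """Return the top category, or None if it falls below its threshold."""
    category = max(probabilities, key=probabilities.get)
    threshold = thresholds.get(category)
    if threshold is None or probabilities[category] < threshold:
        # Treated as out-of-distribution or too uncertain to store.
        return None
    return category


# Hypothetical hand-labeled scores for one category.
labeled = [(0.95, True), (0.9, True), (0.8, True),
           (0.7, False), (0.6, True), (0.5, False)]
thresholds = {"SHIRT": pick_threshold(labeled)}

print(thresholds)
print(classify({"SHIRT": 0.85, "PANTS": 0.15}, thresholds))
print(classify({"SHIRT": 0.55, "PANTS": 0.45}, thresholds))
```

Returning `None` for rejected products lets the caller skip the database write for that product, which is one way to keep low-confidence predictions out of the classification tables.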

ToDo:

BigDatalex commented 1 year ago

Currently, this is deployed only locally. For the final version, I need to update the image name here: https://github.com/calgo-lab/green-db/blob/28096d2aaf014b76f381bd621fbd0a10915a2456/infrastructure/charts/product-classification/helm/values.yaml#L9

Also, do we need to push a new image to GitHub, as is done here? https://github.com/calgo-lab/green-db/blob/7c85928ad7a98ec64756e3c78953ede07c1fcfe0/.github/workflows/build-and-push-images.yaml#L79