GroceriStar / gs-ml-nlp


info. researching more in order to get more information #11

Closed atherdon closed 6 years ago

atherdon commented 6 years ago

so basically, my idea about adding ML to groceristar was simple. right now we have only one grocery list template, and i grabbed and imported it by hand

a few days ago i found at least 100-200 sample templates that i want to add

and i think that copy-pasting them by hand is a really stupid idea

I am sorry, but how does ML help in adding templates..

the first stage is about grabbing the data, and i'm not a pro at ML, so if it's possible to use something more intelligent than just a web scraper, it'll be a timesaver

when we grab "ingredients/shopping list items" we need to add a category (department) to each item. so if we have milk or cheese, we add dairy, and here ML can be beneficial

some of those templates (back to the 200 links) are images, so we can use OCR
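The "add a department to each item" step can be sketched without any ML at all, as a baseline to compare against later. This is only an illustration: the `DEPARTMENT_KEYWORDS` table and the `assign_department` helper are hypothetical names, and the keyword lists are made-up samples, not groceristar data.

```python
# Hypothetical keyword table; a real one would be much longer
# and would come from the manually labelled templates.
DEPARTMENT_KEYWORDS = {
    "dairy": ["milk", "cheese", "butter", "yogurt"],
    "produce": ["apple", "banana", "tomato", "onion"],
    "bakery": ["bread", "bagel"],
}


def assign_department(item, default="unknown"):
    """Return the first department that has a keyword among the item's words."""
    words = item.lower().split()
    for department, keywords in DEPARTMENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return department
    return default


shopping_list = ["Milk", "cheddar cheese", "bread", "olive oil"]
labelled = {item: assign_department(item) for item in shopping_list}
```

Items that fall through to `"unknown"` are the ones where a smarter approach (or a human) would still be needed.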

another thing is about measurements. like if i add olive oil, i don't need options like kilograms, right? just liters or ml. or if i add butter, i don't use teaspoons, only grams or kilograms
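One possible way to model the measurement idea is to tag each ingredient as "volume" or "weight" and only offer the matching units. The tables and the `allowed_units` helper below are assumptions for illustration, not part of the groceristar code.

```python
# Hypothetical tables: which kind of measurement an ingredient uses,
# and which units belong to each kind.
INGREDIENT_KIND = {
    "olive oil": "volume",
    "milk": "volume",
    "butter": "weight",
    "flour": "weight",
}
UNITS_BY_KIND = {
    "volume": ["ml", "liter"],
    "weight": ["gram", "kilogram"],
}


def allowed_units(ingredient):
    """Return the units to offer for an ingredient.

    Unknown ingredients get every unit, so nothing is hidden by mistake.
    """
    kind = INGREDIENT_KIND.get(ingredient.lower().strip())
    if kind is None:
        return [unit for units in UNITS_BY_KIND.values() for unit in units]
    return list(UNITS_BY_KIND[kind])
```

With this sketch, `allowed_units("olive oil")` offers only ml and liter, and `allowed_units("butter")` offers only gram and kilogram.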

Ok I got it, we need to tokenize things.. and segregate things..

maybe. i'm not an ML pro, so you'll need to teach me. i mean my idea can be changed in order to get a result

Frankly I am not a pro either.. I am learning through projects..

understood, so this can be a win-win situation. so you think we cannot grab data from those 200 links? i mean without copy-paste

Ok let me explore.. can you send me one link and tell me the goal you want to achieve.. I will let you know by tomorrow what could be our next step..

sure. but the content is different, so it cannot be a simple regex solution. btw, i'm ok to grab that data by hand, it's not the first time for me. it just takes a lot of time

Yes I got it.. we never know when the end of it will be.. it seems interesting to me.. I will definitely explore. End meaning we might get more templates in the future

i store the links in separate tasks: https://github.com/atherdon/groceristar/issues/417

i'm also trying to categorize the links in an excel file, split like: diets, images, allergies. but maybe for now it's not important

and this is the template that i converted: http://www.grocerylists.org/wp-content/uploads/2013/01/grocerylistsDOTorg_Deluxe_v3_3.pdf

I have checked a few links.. we can get the list of items from text but for images.. I need to explore things..

i think we have a lot of libraries that do OCR for converting images to text

atherdon commented 6 years ago

https://www.kaggle.com/c/instacart-market-basket-analysis

I think we have to label the items manually, e.g. milk and butter as dairy, as we do not have such datasets available. if they are available, then we can think of image recognition, etc.

atherdon commented 6 years ago

and this is the template that i converted: http://www.grocerylists.org/wp-content/uploads/2013/01/grocerylistsDOTorg_Deluxe_v3_3.pdf

i store the links in separate tasks: https://github.com/atherdon/groceristar/issues/417

i'm also trying to categorize the links in an excel file, split like: diets, images, allergies. but maybe for now it's not important

atherdon commented 6 years ago

https://world.openfoodfacts.org/product/8722700119883/vinaigrette-allegee-en-matieres-grasses-amora

atherdon commented 6 years ago

groceristar is about shopping, so i think if we're talking about food on packaging, it'll be awesome, because it can increase groceristar's value for people.

but my other projects are about food, so we can explore different ways

if for now we can train the model so that, when i pass an array with 1500 ingredients, it assigns a department to each one without my involvement, i'll be happy. this is what i wanted.
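To make the "pass an array, get departments back" goal concrete, here is a rough stand-in that uses fuzzy string matching against a handful of manually labelled items. It is not a trained model, just a sketch of the input and output shape. The `LABELLED` examples and both function names are hypothetical, and the `0.6` cutoff is an arbitrary starting point.

```python
import difflib

# Hypothetical manually labelled examples (item -> department).
LABELLED = {
    "milk": "dairy",
    "cheese": "dairy",
    "butter": "dairy",
    "apple": "produce",
    "tomato": "produce",
    "bread": "bakery",
}


def predict_department(item, cutoff=0.6):
    """Guess a department from the closest labelled item, or None if nothing is close."""
    name = item.lower().strip()
    if name in LABELLED:
        return LABELLED[name]
    # Try each word separately so "cheddar cheese" can match "cheese".
    for word in name.split():
        match = difflib.get_close_matches(word, list(LABELLED), n=1, cutoff=cutoff)
        if match:
            return LABELLED[match[0]]
    return None


def assign_departments(items):
    """Map every item in the array to a predicted department (or None)."""
    return {item: predict_department(item) for item in items}


result = assign_departments(["milk", "tomatoes", "cheddar cheese", "xyzzy"])
```

Items that come back as `None` are the ones that would still need a manual label, and each new label can be added to `LABELLED` to improve later runs.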

right now i want to add as many templates as i can to groceristar, without adding a lot of new functionality, because i haven't rewritten it yet. i mean the project requires some clean up

But this doesn't mean that we cannot do something else

So there are different paths that we can take. which one to choose, i don't know

atherdon commented 6 years ago

https://world.openfoodfacts.org/data