Closed (atherdon closed 6 years ago)
https://www.kaggle.com/c/instacart-market-basket-analysis
I think we have to label the items manually, e.g. milk and butter as dairy, since we do not have such datasets available. If they are available, then we can think about image recognition, etc.
And this is the template that I converted: http://www.grocerylists.org/wp-content/uploads/2013/01/grocerylistsDOTorg_Deluxe_v3_3.pdf
I store the links in separate tasks: https://github.com/atherdon/groceristar/issues/417
I am also trying to categorize the links in an Excel file, split into groups like diets, images, allergies. But maybe it's not important for now.
Groceristar is about shopping, so I think if we're talking about food on packaging, it would be awesome, because it can increase the value of Groceristar for people.
But my other projects are about food, so we can explore different ways.
For now, if we can train a model so that when I pass an array of 1500 ingredients it assigns a department to each one without my involvement, I'll be happy. That is what I wanted.
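To make the "pass an array, get departments back" idea concrete, here is a minimal sketch. It is not a real ML model, just a token-vote classifier built from a few hand-labelled examples. The `LABELLED` data, the department names, and the function names are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical hand-labelled examples; a real training set would be much larger.
LABELLED = [
    ("whole milk", "dairy"),
    ("cheddar cheese", "dairy"),
    ("unsalted butter", "dairy"),
    ("chicken breast", "meat"),
    ("ground beef", "meat"),
    ("red apple", "produce"),
    ("baby spinach", "produce"),
]


def train(examples):
    """Count how often each token appears under each department."""
    token_votes = defaultdict(Counter)
    for name, department in examples:
        for token in name.lower().split():
            token_votes[token][department] += 1
    return token_votes


def assign_departments(ingredients, token_votes, fallback="unknown"):
    """Return one department per ingredient by summing the votes of its tokens."""
    result = []
    for name in ingredients:
        votes = Counter()
        for token in name.lower().split():
            votes.update(token_votes.get(token, {}))
        # Items with no known token get the fallback label for manual review.
        result.append(votes.most_common(1)[0][0] if votes else fallback)
    return result


model = train(LABELLED)
departments = assign_departments(["Milk", "beef steak", "goat cheese", "quinoa"], model)
print(departments)  # ['dairy', 'meat', 'dairy', 'unknown']
```

Anything that comes back as `unknown` could be labelled by hand and added to the examples, so the manual work shrinks as more templates are imported.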
Right now I want to add as many templates as I can to Groceristar without adding a lot of new functionality, because I have not rewritten it yet. I mean, the project requires some cleanup.
But this doesn't mean that we cannot do something else.
So there are different paths we can take. Which one to choose, I don't know.
So basically, my idea about adding ML to Groceristar was simple. Right now we have only one grocery list template, and I grabbed and imported it by hand.
A few days ago I found at least 100-200 sample templates that I want to add.
And I think that copy-pasting them by hand is a really stupid idea.
I am sorry, but how does ML help in adding templates?
The first stage is to grab the data. I'm not a pro at ML, so if it's possible to use something more intelligent than just a web scraper, it'll be a timesaver.
When we grab the ingredients / shopping list items, we need to add a category (department) to each item. So if we have milk or cheese, we add dairy, and here ML can be beneficial.
Some of those templates (back to the 200 links) are images, so we can use OCR.
The other thing is about measurements. For example, if I add olive oil, I don't need options like kilograms, right? Just liters or ml. Or if I add butter, I don't use teaspoons, only grams or kilograms.
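The measurement idea could be sketched as a simple lookup, assuming each ingredient is tagged as liquid or solid. The `UNITS_BY_KIND` and `INGREDIENT_KIND` tables and the `allowed_units` function are hypothetical names for illustration, not something that exists in Groceristar.

```python
# Hypothetical mapping; the kinds and unit lists are illustrative assumptions.
UNITS_BY_KIND = {
    "liquid": ["ml", "liter"],
    "solid": ["gram", "kilogram"],
}

# Hypothetical ingredient tags; in practice these would come from the item data.
INGREDIENT_KIND = {
    "olive oil": "liquid",
    "milk": "liquid",
    "butter": "solid",
    "flour": "solid",
}

# Every known unit, offered when we do not know the ingredient.
ALL_UNITS = sorted({unit for units in UNITS_BY_KIND.values() for unit in units})


def allowed_units(ingredient):
    """Return the units that make sense for an ingredient, or all units if unknown."""
    kind = INGREDIENT_KIND.get(ingredient.strip().lower())
    return list(UNITS_BY_KIND[kind]) if kind else list(ALL_UNITS)


print(allowed_units("Olive Oil"))  # ['ml', 'liter']
print(allowed_units("butter"))     # ['gram', 'kilogram']
```

Unknown ingredients fall back to the full unit list, so nothing is blocked while the tags are still incomplete.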
OK, I got it. We need to tokenize things and segregate things.
Maybe. I'm not an ML pro, so you'll need to teach me. I mean, my idea can change in order to get a result.
Frankly, I am not a pro either. I am learning through projects.
Understood, so this can be a win-win situation.
So you think we cannot grab the data from those 200 links? I mean, without copy-paste.
OK, let me explore. Can you send me one link and tell me the goal you want to achieve? I will let you know by tomorrow what our next step could be.
Sure. But the content is different in each one, so it cannot be a simple regex solution. By the way, I'm OK with grabbing that data by hand; it's not the first time for me. It just takes a lot of time.
Yes, I got it. We never know when the end of it will be. It seems interesting to me, and I will definitely explore.
By "end" I mean we might get more templates in the future.
I store the links in separate tasks: https://github.com/atherdon/groceristar/issues/417
I am also trying to categorize the links in an Excel file, split into groups like diets, images, allergies. But maybe it's not important for now.
And this is the template that I converted: http://www.grocerylists.org/wp-content/uploads/2013/01/grocerylistsDOTorg_Deluxe_v3_3.pdf
I have checked a few links. We can get the list of items from text, but for images I need to explore things.
I think there are a lot of libraries that do OCR for converting images to text.
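For the "get the list of items from text" step, a rough sketch could look like this. It assumes the raw text has already been extracted from a page, a PDF, or an OCR tool, and that each item sits on its own line. The `parse_items` function and the sample text are made up for illustration, and real templates will need more rules than this.

```python
import re


def parse_items(raw_text):
    """Turn raw template text into a de-duplicated list of item names."""
    items = []
    seen = set()
    for line in raw_text.splitlines():
        # Drop one leading checkbox, bullet, or number such as "[ ]", "-", "1.".
        cleaned = re.sub(r"^\s*(?:\[\s*[xX]?\s*\]|[-*]|\d+[.)])\s*", "", line)
        # Collapse runs of whitespace left over from extraction.
        cleaned = re.sub(r"\s+", " ", cleaned).strip()
        if not cleaned:
            continue
        # Compare case-insensitively so "Milk" and "milk" count as one item.
        key = cleaned.lower()
        if key not in seen:
            seen.add(key)
            items.append(cleaned)
    return items


sample = """
[ ] Milk
- Cheddar   cheese
1. Olive oil
[x] milk

* Butter
"""

print(parse_items(sample))  # ['Milk', 'Cheddar cheese', 'Olive oil', 'Butter']
```

The cleaned list is the kind of array that could then be passed on for department assignment.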