Faheetah / relaxir

Online recipe book written in Elixir and Phoenix
0 stars 0 forks source link

Link ingredients to USDA #37

Open Faheetah opened 2 years ago

Faheetah commented 2 years ago

Need to probably normalize the USDA data, it's pretty all over the place, with lots of duplicates that have varying nutrition. Probably need a script that will consolidate entries. Might need an intermediary table and/or a good way to tie Ingredients to specific USDA. Could also let USDA just make suggestions for Ingredients and let Ingredients track its own nutrition separately. Like a "new ingredient" and it lets you clone a USDA entry, and make any changes to the nutrition, or create a brand new ingredient and input your own nutrition.

Faheetah commented 2 years ago

I think the best thing would be to be more opinionated about things. There should probably be a strict set of allowed ingredients then maybe something like an "add an ingredient" functionality. I'm thinking what probably would be the best bet, is to see if USDA data has ingredients with it (believe so), and then we would have two sets, USDA that is primitive ingredients (like "beef" and "celery" and "xanthan gum"), then have complex where if they have multiple ingredients (or we have a set filter, like if nutrition for any ingredient is significant) then the ingredient is set as a parent ingredient.

Basically, ingredients would be complex heirarchical objects like recipes. So there would be something like

From here we can just scan USDA imports, extract simple ingredients, then complex ingredients can be cascaded through

This has a side effect of making ingredients into basically recipes, instead of the other way around. Interesting.

Faheetah commented 2 years ago

Looks like what happens is there is food.csv that has a description and link to fdc_id, but there is things like 787782 that has "Cabbage, red, raw" then branded_foods that has "RED CABBAGE", i.e.

"529355","Safeway, Inc.","021130702183","RED CABBAGE.","85","g","2 cup","Pre-Packaged Fruit & Vegetables","LI","2018-01-19","2018-01-19","United States","" "549246","READY PAC FOODS","077745201979","RED CABBAGE.","85","g","2 cup","Pre-Packaged Fruit & Vegetables","LI","2017-11-01","2017-11-01","United States",""

That's why there are so many duplicates in the results. What needs to happen is there needs to be more granular food tracking, so instead of two food types we'll have:

Recipe Usda.Food Usda.BrandedFood

Then users can see the full name of the produce. So a search for "red cabbage" will return:

We should also prioritize non branded foods.

Need further investigation of how survey_fndds_food vs sr legacy vs foundation foods etc breaks down though.

Faheetah commented 2 years ago

That also means search won't be able to dedup this, so we might need another one for branded food. Might need a change to invert, or we just index the name in there with the brand in the string.

Faheetah commented 2 years ago

When redesigning this, we should pivot so that say, search only digs up Ingredients, when a "new ingredient" is added it can be linked to USDA data. Probably best approach here is to just direct link Ingredient -> Food. Only question there is, how to define custom nutrients. Alternative is to have Ingredient that has its own IngredientNutrients that link to the USDA nutrient table, and we just clone the USDA nutrients over, probably the best bet here since we can massage our own ingredients.

Faheetah commented 2 years ago

For relations, might be easiest to just do a separate join table per recipe relation, RecipeIngredients and RecipeRecipes, then each of those has a rank, I'm not worried about collisions on ranks honestly, that can be validated in code and it's going to be unlikely those entries will become corrupted (having two entries with the same rank) outside of huge bugs. Then we just make a list of ingredients and sort by rank, as a mixed list of Recipe and Ingredient, that would help us also with rendering. Probably generate RecipeNutrients on create and update as well, then if that is polymorphic (recipe.nutrients and ingredient.nutrients) we can just roll down the list and sum nutrients to update the recipe, that would trade off using more space in the DB but with the benefit of faster reads on recipes.

Faheetah commented 2 years ago

Also RecipeNutrient and IngredientNutrient probably could point to a Nutrient table that just has a wide column of all the nutrients, fields like calcium id, fat id, saturated_fat id, etc, no reason to link to nutrients since we will rarely if ever need to access a nutrient as its own entity rather than just let it be an int value. Need to ensure measurements are all normalized which can be tricky because I think the USDA data mixes units with things.

Faheetah commented 11 months ago

Wont do, will build out nutrition by hand, USDA data is a disaster