Oppskrift / oppskrift_api

Database schema #5

Open Djyp opened 4 years ago

Djyp commented 4 years ago

Let's talk about the data !

A year ago, I started thinking about the best way to store a recipe and how to build the database. I came across https://schema.org/Recipe

And here are my thoughts about it:

| Schema.org proposed property | Format | What to do about it |
|---|---|---|
| Properties from Recipe | Specific properties related to a Recipe | |
| cookTime | Duration in ISO 8601 format | ✔️ Use it as is |
| cookingMethod | The method of cooking, such as Frying, Steaming, ... | ❌ Don't use it |
| nutrition | Nutrition information as an object whose values represent very specific information | ❌ Don't use it, it's too detailed |
| recipeCategory | entrée, dessert | ✔️ |
| recipeCuisine | type of cuisine (French, Mexican, …) | ✔️ |
| recipeIngredient | schema.org says it's a single ingredient but it's a list of ingredients | ✔️ |
| recipeInstructions | Obvious | ✔️ |
| recipeYield | quantity produced (number of servings, number of people) | ✔️ |
| suitableForDiet | Obvious | ✔️ |
| Properties from HowTo | Properties of any tutorial, can be used on a Recipe | |
| estimatedCost | Obvious | ✔️ |
| performTime | Duration to do the instructions | 🤷‍♂ |
| prepTime | Duration to prepare the items needed for the instructions | OK, so those two are clearly related; logic would tell us to use performTime and not prepTime. But I've seen that prepTime is used on some sites. Whatever we do, we'll keep only one ! |
| step | The steps to complete the HowTo | ❌ useless while we have recipeInstructions, I don't get why it exists |
| supply | A supply consumed when performing instructions or a direction | ❌ useless, ingredients should be enough |
| tool | An object used (but not consumed) when performing instructions or a direction | ✔️ I think it should be optional |
| totalTime | Total duration | ✔️ Should be used but calculated (=prepTime+cookTime+waitTime) |
| yield | same as recipeYield | ❌ again ? wait, no ! |
| Properties from CreativeWork | Properties of any creative work, can be used on a Recipe (inheritance) | |
| about | The subject matter of the content. | |
| abstract | summarized description | 🤷‍♂ maybe |
| accessMode | the human sense used | |
| accessModeSufficient | | |
| accessibilityAPI | Indicates that the resource is compatible with the referenced accessibility API | ❓ Everything related to accessibility should be discussed |
| accessibilityControl | Identifies input methods that are sufficient to fully control the described resource | |
| accessibilityFeature | Content features of the resource, such as accessible media, alternatives and supported enhancements for accessibility | |
| accessibilityHazard | A characteristic of the described resource that is physiologically dangerous to some users | ❌ Not concerned |
| accessibilitySummary | Summary about accessibility | |
| accountablePerson | The person legally responsible | |
| acquireLicensePage | How to get/purchase a licence | |
| aggregateRating | The global rating | 🤷‍♂ Not sure if we should rate recipes; people like a recipe or not. I think, if anything, we could guess popular recipes from how much they are liked (ActivityPub-wise), shared and maybe forked |
| alternativeHeadline | a second title | |
| assesses | an educational property telling what the creative work assesses/evaluates | |
| associatedMedia | A media object that encodes this CreativeWork | |
| audience | An intended audience, i.e. a group for whom something was created | ❌ not concerned |
| audio | Obvious | ❌ Don't see why it would be used |
| author | of course ! | ✔️ |
| award | | |
| character | | ❌ too specific, unrelated to almost all recipes |
| citation | | ❌ useless |
| comment | | 🤷‍♂ Why not, but in ActivityPub comments are activities |
| commentCount | | Only if we use comments |
| conditionsOfAccess | | ❌ not concerned |
| contentLocation | Physical location | ❌ not concerned |
| contentRating | Official rating like PG-13 | ❌ not concerned |
| contentReferenceTime | related to an event | ❌ not concerned |
| contributor | another author or authors | ❌ could be useful but it's overkill |
| copyrightHolder | | ❌ recipes will be published in the public domain, not concerned |
| copyrightYear | | |
| correction | | ❌ could be useful to describe a fork maybe but overrated |
| creativeWorkStatus | Draft, Published, Obsolete | |
| creator | | ❌ we chose the author property |
| dateCreated | | ✔️ |
| dateModified | | ✔️ |
| datePublished | | ✔️ |
| discussionUrl | | ✔️ Could be used but not persisted |
| editEIDR | | ❌ not concerned |
| editor | | |
| educationalAlignment | | |
| educationalLevel | | |
| educationalUse | | |
| encoding | | |
| encodingFormat | | ❌ it's not a file |
| exampleOf | | ❓ could be used to tell what the work is forked from |
| expires | | |
| funder | | |
| genre | | |
| hasPart | | |
| headline | | |
| inLanguage | language in IETF BCP 47 Standard | ✔️ of course ! |
| interactionStatistic | | |
| interactivityType | | |
| isAccessibleForFree | boolean | ❓ let's use it in JSON-LD and always set it to true |
| isBasedOn | | ❓ could be used to tell what the work is forked from |
| isFamilyFriendly | boolean | :question: it's a recipe ! but what if… :thinking: |
| isPartOf | | ✔️ To indicate the recipe is part of a cookbook |
| keywords | | ❌ no one wants to add tags to a recipe, do they ? |
| learningResourceType | | |
| license | | |
| locationCreated | | |
| mainEntity | | |
| maintainer | | |
| material | | |
| materialExtent | | |
| mentions | | |
| offers | | |
| position | | |
| producer | | ❌ it's not a movie |
| provider | | |
| publication | | |
| publisher | | |
| publisherImprint | | |
| publishingPrinciples | | |
| recordedAt | | |
| releasedEvent | | |
| review | | ❌ same as comments |
| schemaVersion | | |
| sdDatePublished | | |
| sdLicense | | |
| sdPublisher | | |
| sourceOrganization | | |
| spatial | | |
| spatialCoverage | | |
| sponsor | | |
| teaches | | |
| temporal | | |
| temporalCoverage | | |
| text | | |
| thumbnailUrl | | ❓ could be used to hold a thumbnail URL for the image |
| timeRequired | | |
| translationOfWork | | |
| translator | | |
| typicalAgeRange | | |
| usageInfo | | |
| version | | |
| video | | ❓ not sure |
| workExample | | |
| workTranslation | | |
| Properties from Thing | Very generic properties that can apply to a recipe | |
| additionalType | | |
| alternateName | | |
| description | | ✔️ Could bring context information "my grandma's most popular recipe", "typical dish in Kenya" |
| disambiguatingDescription | | ❌ TMI |
| identifier | | |
| image | | ✔️ of course ! |
| mainEntityOfPage | | ❌ We'll use the url |
| name | | ✔️ obviously yes ! |
| potentialAction | | |
| sameAs | | |
| subjectOf | | |
| url | | ✔️ just for JSON-LD |

Example of a recipe in JSON-LD on marmiton.org

{
    "@context": "http://schema.org",
    "@type": "Recipe",
    "name": "Tourte aux cerises",
    "recipeCategory": null,
    "image": "https://assets.afcdn.com/recipe/20171113/74823_w1024h768c1cx2597cy1731cxt0cyt0cxb5195cyb3463.jpg",
    "datePublished": "2004-04-28T08:56:00+02:00",
    "prepTime": "PT40M",
    "cookTime": "PT30M",
    "totalTime": "PT70M",
    "recipeYield": "6 personnes",
    "recipeIngredient": [
        "300 g de farine",
        "200 g de beurre",
        "50 g de sucre semoule",
        "1 oeuf",
        "1 pinc\u00e9e de sel",
        "500 g de cerise",
        "150 g de sucre semoule",
        "2 cuill\u00e8res \u00e0 soupe de kirsch",
        "1 jaune d'oeuf pour dorer"
    ],
    "recipeInstructions": [
        {
            "@type": "HowToStep",
            "text": "Pr\u00e9parer la p\u00e2te la veille :"
        },
        {
            "@type": "HowToStep",
            "text": "Tamiser la farine puis la verser dans un saladier et creuser une fontaine. D\u00e9poser au centre les morceaux de beurre ramolli, l'oeuf entier, le sucre et le sel. Bien m\u00e9langer et p\u00e9trir du bout des doigts, puis rouler la p\u00e2te en boule et la laisser reposer 1 journ\u00e9e."
        },
        {
            "@type": "HowToStep",
            "text": "Le lendemain, \u00e9taler la p\u00e2te \u00e0 l'aide d'un rouleau sur un plan de travail propre et farin\u00e9."
        },
        {
            "@type": "HowToStep",
            "text": "Beurrer un moule \u00e0 tarte d'environ 23 cm de diam\u00e8tre puis y d\u00e9poser la moiti\u00e9 de la p\u00e2te et r\u00e9server l'autre moiti\u00e9."
        },
        {
            "@type": "HowToStep",
            "text": "Pr\u00e9chauffer votre four \u00e0 210\u00b0C (thermostat 7)."
        },
        {
            "@type": "HowToStep",
            "text": "Laver, \u00e9queuter et d\u00e9noyauter les cerises (on peut utiliser des cerises en bocaux). Les s\u00e9cher puis les disposer sur le fond de tarte. Saupoudrer les cerises de sucre, puis arroser de kirsch. Couvrir avec l'autre moiti\u00e9 de la p\u00e2te, bien souder les bords."
        },
        {
            "@type": "HowToStep",
            "text": "Badigeonner la tourte de jaune d'oeuf \u00e0 l'aide d'un pinceau pour faire dorer puis enfourner 30 min \u00e0 four chaud."
        },
        {
            "@type": "HowToStep",
            "text": "A d\u00e9guster plut\u00f4t ti\u00e8de."
        }
    ],
    "author": "Annick",
    "description": "farine, beurre, sucre semoule, oeuf, sel, cerise, sucre semoule, kirsch, jaune d'oeuf",
    "keywords": "Tourte aux cerises, , farine, beurre, sucre semoule, oeuf, sel,tr\u00e8s facile,bon march\u00e9",
    "recipeCuisine": "",
    "aggregateRating": {
        "@type": "AggregateRating",
        "reviewCount": 1,
        "ratingValue": 5,
        "worstRating": 0,
        "bestRating": 5
    }
}
Scttpr commented 4 years ago

It looks pretty cool :smile: !

Some questions :

Some data types from marmiton.org seem strange :

I have added a data dictionary in the documentation repository; it could be easier to handle a single file rather than editing here in the issue, across different posts. What do you think about discussing here and editing there ?

I deleted the obviously unwanted fields and formatted the JSON to match the table; if that's not wanted I can revert to marmiton's example. Please edit freely as well !

EDIT : Removed comments on the empty fields (which are not empty anymore :stuck_out_tongue: ). I agree with all other comments !

PS : what a table, you did great ! :muscle: :nerd_face:

Djyp commented 4 years ago

Phew, just finished editing the table :sweat:

Djyp commented 4 years ago

I don't understand suitableForDiet, is it data like : ['vegan', 'gluten free'] ?

Yes !

What is inside Properties from Recipe ?

Look at https://schema.org/Recipe: the Recipe type is like a class. Properties from Recipe are properties specific to a Recipe; then there are properties inherited from HowTo, CreativeWork and Thing. Of course that's just the schema.org specification; our Recipe class will hold all the chosen properties

I would include name, author, description, image from the long list at the bottom

Sure ! I wasn't done yet ;)

estimatedCost could be hard to set depending on regions, product origins, currency, etc. Is it compulsory ? I'm not sure.

I guess we could take that off indeed. It would always be subjective.

What about all the empty rows ? Did you not deal with them yet, or are they not wanted ? Most of them seem not really useful.

I wasn't done, that's why the first title contained «work in progress» 😉 The table was soooo long I had to complete it later.

Some data types from marmiton.org seem strange

recipeYield is either QuantitativeValue or Text. What you see here is just the JSON-LD they chose to have on their website. We can persist whatever we want. I guess text is better so it's possible to store «4 parts», «One loaf», «A 40 cm wide pizza», «about 20 cookies»

For recipeIngredient I was thinking about having all the ingredients in a single text; it would be way easier to add new recipes like that. Many people will copy-paste from other resources. Then we parse the text so that each line becomes a single ingredient. We will be able to build the JSON like that. If we want to parse it all to get a shopping list, we will find a way.
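
The "one line, one ingredient" idea can be sketched as follows. This is only a minimal illustration: `parseIngredientLines` is a hypothetical helper name, not something that exists in the project, and it assumes blank lines and surrounding whitespace can safely be dropped.

```javascript
// Hypothetical helper: turns pasted text into one ingredient string per line.
function parseIngredientLines(text) {
  return text
    .split(/\r?\n/) // one line, one ingredient (handles Windows line endings too)
    .map((line) => line.trim()) // copy-pasted text often carries stray spaces
    .filter((line) => line.length > 0); // drop blank lines
}

const pasted = "300 g de farine\n200 g de beurre\n\n  1 oeuf  ";
const recipeIngredient = parseIngredientLines(pasted);
console.log(recipeIngredient);
// → [ '300 g de farine', '200 g de beurre', '1 oeuf' ]
```

The resulting array has the same shape as the `recipeIngredient` list in the marmiton example above, so it could be dropped into the JSON-LD as is.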

We don't have keywords in the table but it exists in marmiton's data, is it needed ? I think it's for filter efficiency/complexity but we already have categories and cuisines and if we store ingredients as described above, it would be possible to filter by ingredient too.

I am 100% sure the keywords and description are calculated from the title and the ingredients to help with SEO. We don't need those, do we ?

I have added a data dictionary

:tada: yeah !!

Scttpr commented 4 years ago

Sorry did not get the WIP status (thought it was for the discussion :stuck_out_tongue: ) :pray:

OK for all those answers, it's pretty clear to me and it looks great ! What a piece of work ! I will update the data dictionary according to this. I will open a PR for a better workflow.

Djyp commented 4 years ago

Here are the properties we intend to keep

| Schema.org proposed property | Format |
|---|---|
| cookTime | Duration in ISO 8601 format |
| recipeCategory | entrée, dessert |
| recipeCuisine | type of cuisine (French, Mexican, …) |
| recipeIngredient | schema.org says it's a single ingredient but it's a list of ingredients |
| recipeInstructions | Text |
| recipeYield | quantity produced (number of servings, number of people) |
| suitableForDiet | Vegan, Lactose-free, Gluten free |
| prepTime | Duration to prepare the items needed for the instructions |
| tool | An object used (but not consumed) when performing instructions or a direction. Could be useful to tell you need an oven or a piping bag |
| totalTime | Total duration, calculated =prepTime+cookTime+waitTime |
| author | of course ! ✔️ |
| dateCreated | ✔️ |
| dateModified | ✔️ |
| datePublished | ✔️ |
| discussionUrl | ✔️ Could be used but not persisted |
| inLanguage | language in IETF BCP 47 Standard ✔️ of course ! |
| isFamilyFriendly | boolean :question: it's a recipe ! but what if… :thinking: |
| isPartOf | ✔️ To indicate the recipe is part of a cookbook |
| description | ✔️ Could bring context information "my grandma's most popular recipe", "typical dish in Kenya" and maybe comments from the author like "despite the instructions I always cook it 10 more minutes in my oven" |
| image | ✔️ of course ! |
| name | ✔️ obviously yes ! |
| url | ✔️ just for JSON-LD |
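
To make the list above more concrete, here is a rough sketch of how a stored recipe could be serialized to JSON-LD using only kept properties. Everything about the stored shape is an assumption made for illustration: the `toJsonLd` function, the `id` field, the `baseUrl` parameter and the `/recipes/` URL layout are hypothetical, and only a subset of the kept properties is shown.

```javascript
// Hypothetical serializer: maps a stored recipe to a schema.org Recipe in JSON-LD.
function toJsonLd(recipe, baseUrl) {
  return {
    "@context": "https://schema.org",
    "@type": "Recipe",
    name: recipe.name,
    description: recipe.description,
    image: recipe.image,
    author: { "@type": "Person", name: recipe.author },
    inLanguage: recipe.inLanguage,
    datePublished: recipe.datePublished,
    prepTime: recipe.prepTime,
    cookTime: recipe.cookTime,
    recipeYield: recipe.recipeYield,
    recipeCategory: recipe.recipeCategory,
    recipeCuisine: recipe.recipeCuisine,
    recipeIngredient: recipe.recipeIngredient,
    recipeInstructions: recipe.recipeInstructions,
    // "just for JSON-LD": built at serialization time, not persisted (assumed URL layout)
    url: `${baseUrl}/recipes/${recipe.id}`,
  };
}

// Sample values borrowed from the marmiton example; id, language and category are made up.
const stored = {
  id: 42,
  name: "Tourte aux cerises",
  description: "my grandma's most popular recipe",
  image: "https://example.org/tourte.jpg",
  author: "Annick",
  inLanguage: "fr-FR",
  datePublished: "2004-04-28T08:56:00+02:00",
  prepTime: "PT40M",
  cookTime: "PT30M",
  recipeYield: "6 personnes",
  recipeCategory: ["dessert"],
  recipeCuisine: "French",
  recipeIngredient: ["300 g de farine", "200 g de beurre"],
  recipeInstructions: ["Tamiser la farine.", "A déguster plutôt tiède."],
};

console.log(JSON.stringify(toJsonLd(stored, "https://example.org"), null, 2));
```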

Here are the properties we need to talk about

| Schema.org proposed property | Format |
|---|---|
| abstract | summarized description 🤷‍♂ maybe |
| estimatedCost | Obvious ✔️ |
| accessibilityAPI | Indicates that the resource is compatible with the referenced accessibility API ❓ Everything related to accessibility should be discussed |
| accessibilityControl | Identifies input methods that are sufficient to fully control the described resource |
| accessibilityFeature | Content features of the resource, such as accessible media, alternatives and supported enhancements for accessibility |
| accessibilitySummary | Summary about accessibility |
| aggregateRating | The global rating 🤷‍♂ Not sure if we should rate recipes; people like a recipe or not. I think, if anything, we could guess popular recipes from how much they are liked (ActivityPub-wise), shared and maybe forked |
| comment | 🤷‍♂ Why not, but in ActivityPub comments are activities |
| commentCount | Only if we use comments |
| creativeWorkStatus | Draft, Published, Obsolete |
| exampleOf | ❓ could be used to tell what the work is forked from |
| isAccessibleForFree | boolean ❓ let's use it in JSON-LD and always set it to true |
| isBasedOn | ❓ could be used to tell what the work is forked from |
| thumbnailUrl | ❓ could be used to hold a thumbnail URL for the image |
| video | ❓ not sure |

Djyp commented 4 years ago

I should have described all data types correctly.

It feels like all properties I wasn't sure of should just be put away

Djyp commented 4 years ago

Oh ! And I wanted to add a waitTime property, which doesn't exist in the Recipe schema. It has been discussed at https://github.com/schemaorg/schemaorg/issues/2164 but never really approved, despite a few positive reactions to the idea
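
As a rough sketch of the calculated totalTime (=prepTime+cookTime+waitTime), the snippet below adds ISO 8601 durations together. The function names are hypothetical, waitTime is the custom property proposed here (not part of schema.org), and the parser only understands days, hours and minutes, which is an assumption about what recipes need.

```javascript
// Converts a subset of ISO 8601 durations (days, hours, minutes) to minutes.
function durationToMinutes(iso) {
  const match = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?$/.exec(iso);
  if (!match || iso === "P" || iso.endsWith("T")) {
    throw new Error(`Unsupported duration: ${iso}`);
  }
  const [, days = "0", hours = "0", minutes = "0"] = match;
  return Number(days) * 24 * 60 + Number(hours) * 60 + Number(minutes);
}

// Formats a number of minutes back into an ISO 8601 duration.
function minutesToDuration(totalMinutes) {
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  if (hours === 0) return `PT${minutes}M`;
  return minutes === 0 ? `PT${hours}H` : `PT${hours}H${minutes}M`;
}

// totalTime is never stored: it is derived from whichever parts are present.
function computeTotalTime({ prepTime, cookTime, waitTime }) {
  const total = [prepTime, cookTime, waitTime]
    .filter(Boolean) // skip missing durations
    .reduce((sum, iso) => sum + durationToMinutes(iso), 0);
  return minutesToDuration(total);
}

console.log(computeTotalTime({ prepTime: "PT40M", cookTime: "PT30M" }));
// → PT1H10M
console.log(computeTotalTime({ prepTime: "PT40M", cookTime: "PT30M", waitTime: "P1D" }));
// → PT25H10M
```

"PT1H10M" denotes the same 70 minutes as the "PT70M" in the marmiton example; the formatter simply prefers hours when there are any.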

Scttpr commented 4 years ago

I will update the data types in the data dictionary directly and make a PR to let you validate it ! No need to do it here, I think.

Scttpr commented 4 years ago

Other models would be User, Cookbook. I don't see many different models.

Djyp commented 4 years ago

Other models would be User, Cookbook. I don't see many different models.

I don't see anything else either.

Regarding the PR :

Could a recipe have several categories ?

Sure ! Something could be a dessert or part of a breakfast (I deleted the question in my commit)

Could a recipe have several cuisines ?

I think not; if a dish has more than one origin it's probably a fusion dish, like Mexican-Japanese food. (I deleted the question in my commit)

Is it raw ingredient or is it computed one with qty and unit ?

I'm not sure how to store the data. For an easy way to add ingredients, I think the end-user should just have a textarea where they list the ingredients with the quantity. One line, one ingredient. Then, I think we should analyze the list to extract each ingredient so we can compute a shopping list or calculate the quantities to double a recipe, or make it yield 6 portions instead of 4. (question not deleted, tell me what you think)

Same for the recipeInstructions, I think we should have a single textarea that we then split into an array of strings.
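
A minimal sketch of that split, assuming each non-empty line of the textarea is one step. `toHowToSteps` is a hypothetical name; the `HowToStep` wrapping simply mirrors the shape used in the marmiton JSON-LD above and is optional if plain strings are preferred.

```javascript
// Hypothetical helper: one non-empty line of the textarea becomes one step.
function toHowToSteps(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean) // ignore blank lines between steps
    .map((line) => ({ "@type": "HowToStep", text: line }));
}

const textarea = "Préchauffer le four à 210°C.\n\nEnfourner 30 min.";
console.log(toHowToSteps(textarea));
// → [
//   { '@type': 'HowToStep', text: 'Préchauffer le four à 210°C.' },
//   { '@type': 'HowToStep', text: 'Enfourner 30 min.' }
// ]
```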

(recipeYield) Does it need a related variable unit ?

No, not a fixed one. Let's have something where we are free to type "1 loaf", "300g of salad", "12 servings" or "4 personnes". The data should be language blind. If at any moment we need only the number we can have a special getter to extract it from the string. (I deleted the question in my commit)
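
The "special getter" could look something like this. It is a naive sketch: `yieldAmount` is a hypothetical name, and it just returns the first number found in the free text, or `null` when there is none.

```javascript
// Hypothetical getter: extracts the first number from a free-text recipeYield.
function yieldAmount(recipeYield) {
  const match = /\d+(?:[.,]\d+)?/.exec(recipeYield);
  // Accept both "1.5" and the comma decimal separator used in French ("1,5")
  return match ? Number(match[0].replace(",", ".")) : null;
}

console.log(yieldAmount("12 servings")); // → 12
console.log(yieldAmount("4 personnes")); // → 4
console.log(yieldAmount("One loaf")); // → null
```

Taking the first number is a real limitation: "A 40 cm wide pizza" would return 40, which is a size and not a quantity, and spelled-out numbers like "One loaf" are not recognized.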

(language)

I changed the example from fr to fr-fr. We should always have the country, because not all French-speaking countries have the same way of specifying quantities even if they share the same language.
In Europe, for example, French and British people use weight for many ingredients (which is more accurate, but that's not the point here). In the US and Canada, people use volumes for those ingredients (like flour or sugar). So American recipes would say 2 cups flour and British recipes would say 440g flour.
On top of that, not all countries use the same word for the same ingredient. Zucchini is the word used in North America, in both French and English, for what the French call courgette.
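
One possible way to enforce "always have the country" is the standard `Intl.Locale` API, which parses BCP 47 tags and normalizes their casing (so "fr-fr" becomes "fr-FR"). The `parseLanguageTag` wrapper below is a hypothetical helper, and rejecting tags without a region is this sketch's reading of the rule above.

```javascript
// Hypothetical validator: accepts only BCP 47 tags that carry a region subtag.
function parseLanguageTag(tag) {
  const locale = new Intl.Locale(tag); // throws a RangeError on malformed tags
  if (!locale.region) {
    throw new Error(`Language tag "${tag}" has no country/region subtag`);
  }
  return {
    tag: locale.toString(), // canonical casing, e.g. "fr-FR"
    language: locale.language,
    region: locale.region,
  };
}

console.log(parseLanguageTag("fr-fr"));
// → { tag: 'fr-FR', language: 'fr', region: 'FR' }
```

With the region available, a recipe tagged "fr-CA" could later be treated differently from one tagged "fr-FR" when it comes to units or ingredient names.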

(isFamilyFriendly)

I added a description in the commit

(isPartOf)

Not sure why I put it in the list as if it were obvious. Indeed we need to talk about it. The goal of this property is to link recipes: a specific sauce could be part of a certain Indian recipe, for instance. I think this could be really useful. But maybe people will have difficulty understanding it :shrug:
Maybe it could be added later ?

The commit

Scttpr commented 4 years ago

Agreed with everything. isPartOf was indeed pretty obvious, my bad :yum:

I'm not sure how to store the data. For an easy way to add ingredients, I think the end-user should just have a textarea where they list the ingredients with the quantity. One line, one ingredient. Then, I think we should analyze the list to extract each ingredient so we can compute a shopping list or calculate the quantities to double a recipe, or make it yield 6 portions instead of 4. (question not deleted, tell me what you think)

Same for the recipeInstructions, I think we should have a single textarea that we then split into an array of strings.

It is indeed a good idea to keep it simple on the client side with the textarea. I think it is also a good idea to process the string to extract and classify the data. It leads to automatic shopping lists, sharper search possibilities, etc. I would say that we could compute and store it for one yield (no matter which unit is chosen for it) in order to scale easily from one to any yield. It could look like :

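// Raw text typed by the user; the trailing "..." stands for further ingredients
const contentFromTextarea = '1 cup of sugar, 200g of butter, ...';
// Number of yields the typed quantities were written for
const yields = 2;

// Default computation is for one yield: each typed quantity is divided by `yields`
const storedData = [
  {
    ingredient: 'sugar',
    quantity: 0.5,
    unit: 'cup',
  },
  {
    ingredient: 'butter',
    quantity: 100,
    unit: 'gram',
  },
];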

I think it could be awesome but it's quite a big computation to do because user input could be a huge mess :stuck_out_tongue:
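
To give an idea of the computation, here is a deliberately naive sketch of parsing, storing for one yield and scaling back up. All names (`UNITS`, `parseIngredient`, `toSingleYield`, `scaleTo`) are hypothetical, the unit table is a tiny assumed sample, and the parser only copes with lines shaped like "quantity, optional unit, optional 'of', name"; messy user input would need much more.

```javascript
// Assumed, deliberately tiny unit table; real input would need many more entries.
const UNITS = new Map([
  ["g", "gram"], ["gram", "gram"], ["grams", "gram"],
  ["cup", "cup"], ["cups", "cup"],
]);

// Parses one line such as "200g of butter" into { ingredient, quantity, unit }.
function parseIngredient(line) {
  const match = /^\s*(\d+(?:\.\d+)?)\s*(.*)$/.exec(line);
  // No leading number ("a pinch of salt"): keep the text, leave quantity unknown
  if (!match) return { ingredient: line.trim(), quantity: null, unit: null };
  const words = match[2].trim().split(/\s+/);
  const unit = UNITS.get(words[0].toLowerCase()) ?? null;
  if (unit) words.shift(); // first word was a known unit
  if (words[0] === "of") words.shift(); // "cup of sugar" → "sugar"
  return { ingredient: words.join(" "), quantity: Number(match[1]), unit };
}

// Stores quantities for one yield, whatever yield the user typed them for.
function toSingleYield(lines, yields) {
  return lines.map(parseIngredient).map((item) => ({
    ...item,
    quantity: item.quantity === null ? null : item.quantity / yields,
  }));
}

// Scales stored (single-yield) data to any number of yields.
function scaleTo(stored, yields) {
  return stored.map((item) => ({
    ...item,
    quantity: item.quantity === null ? null : item.quantity * yields,
  }));
}

const storedForOne = toSingleYield(["1 cup of sugar", "200g of butter"], 2);
console.log(storedForOne);
// → [
//   { ingredient: 'sugar', quantity: 0.5, unit: 'cup' },
//   { ingredient: 'butter', quantity: 100, unit: 'gram' }
// ]
console.log(scaleTo(storedForOne, 6).map((item) => item.quantity));
// → [ 3, 600 ]
```

Lines the parser cannot make sense of keep their original text with a `null` quantity, so nothing the user typed is lost even when scaling is not possible.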

I committed on the current PR just to fix the table and close the following questions :

The main remaining topic is on those instructions and ingredients and I guess we are pretty good.

I will add a commit for user and cookbook during lunch time if I manage to finish my work early this morning :stuck_out_tongue: and will add the discussion here or in another issue, I don't know what the best practice is in this kind of situation :smiley:

Scttpr commented 4 years ago

I committed a first proposal for Cookbook and User models.

Some points we need to discuss :