ferrisoxide opened 1 year ago
Checking unique keys in the `products.properties` field:

```sql
SELECT DISTINCT json_data.key
FROM products, jsonb_each(products.properties) AS json_data;
```

returns:
potassium
weight_g
fiber
carbohydrate
sugars
unit_count
size
author
monounsaturated_fat
volume_ml
ingredients
polyunsaturated_fat
fat_calories
alcohol_by_volume
calories
weight_ounce
servings_per_container
trans_fat
format
saturated_fat
pages
fat
volume_fluid_ounce
sodium
serving_size
protein
publisher
cholesterol
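The same key survey can be run application-side, e.g. to spot-check a sample of records without database access. This is a minimal Python sketch, assuming each product row has already been loaded as a dict whose `properties` entry holds the decoded JSON; the function name and the sample rows are made up for illustration.

```python
from typing import Any, Iterable, Mapping


def distinct_property_keys(rows: Iterable[Mapping[str, Any]]) -> set[str]:
    """Collect every top-level key used in the `properties` JSON of the given rows."""
    keys: set[str] = set()
    for row in rows:
        # Like jsonb_each(), only top-level keys are visited; a missing or
        # null properties value contributes nothing.
        keys.update((row.get("properties") or {}).keys())
    return keys


# Made-up sample rows, shaped like decoded `products` records.
rows = [
    {"properties": {"calories": 120, "fat": 3}},
    {"properties": {"author": "Jane Doe", "pages": 320}},
    {"properties": None},
]
print(sorted(distinct_property_keys(rows)))  # ['author', 'calories', 'fat', 'pages']
```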
There are only a small number of keys to worry about, and most will map to schema.org types. I think I'll go ahead and start introducing JSON-LD to the API.
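One way to make "most will map to schema.org types" concrete is a lookup table from the existing keys to schema.org terms. The sketch below is hypothetical: it only lists keys with a clear-cut counterpart (`NutritionInformation` and `Book` properties), and anything not listed (e.g. `potassium`, `alcohol_by_volume`) is treated as unmapped.

```python
# Hypothetical mapping: brocade.io property key -> (schema.org type, property).
# Only keys with an obvious schema.org counterpart are listed; the rest are
# candidates for additionalProperty.
SCHEMA_ORG_MAPPING: dict[str, tuple[str, str]] = {
    "calories": ("NutritionInformation", "calories"),
    "fat": ("NutritionInformation", "fatContent"),
    "saturated_fat": ("NutritionInformation", "saturatedFatContent"),
    "trans_fat": ("NutritionInformation", "transFatContent"),
    "cholesterol": ("NutritionInformation", "cholesterolContent"),
    "sodium": ("NutritionInformation", "sodiumContent"),
    "carbohydrate": ("NutritionInformation", "carbohydrateContent"),
    "fiber": ("NutritionInformation", "fiberContent"),
    "sugars": ("NutritionInformation", "sugarContent"),
    "protein": ("NutritionInformation", "proteinContent"),
    "serving_size": ("NutritionInformation", "servingSize"),
    "author": ("Book", "author"),
    "publisher": ("Book", "publisher"),
    "pages": ("Book", "numberOfPages"),
}


def split_keys(keys: list[str]) -> tuple[dict[str, tuple[str, str]], list[str]]:
    """Partition keys into those with a schema.org mapping and the (sorted) rest."""
    mapped = {k: SCHEMA_ORG_MAPPING[k] for k in keys if k in SCHEMA_ORG_MAPPING}
    unmapped = sorted(k for k in keys if k not in SCHEMA_ORG_MAPPING)
    return mapped, unmapped


mapped, unmapped = split_keys(["fat", "potassium", "pages", "alcohol_by_volume"])
print(unmapped)  # ['alcohol_by_volume', 'potassium']
```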
Per discussions on Schema.org, repeated values are fine (a property can hold an array of values). This is valid Product data:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "additionalProperty": [
    {
      "@type": "PropertyValue",
      "name": "myCustomProperty",
      "value": "my custom value"
    },
    {
      "@type": "PropertyValue",
      "name": "myOtherCustomProperty",
      "value": "my other custom value"
    }
  ],
  ...
```

So we can use an `additionalProperty` array to record anything that doesn't fit elsewhere.
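A minimal Python sketch of that fallback, assuming the set of keys already covered by dedicated schema.org properties is known up front (the function name and sample values are hypothetical). Each leftover key becomes one `PropertyValue` entry, in the same shape as the example above.

```python
import json
from typing import Any


def to_additional_properties(properties: dict[str, Any], mapped_keys: set[str]) -> list[dict[str, Any]]:
    """Wrap every key not covered by a dedicated schema.org property in a PropertyValue."""
    return [
        {"@type": "PropertyValue", "name": key, "value": value}
        for key, value in sorted(properties.items())
        if key not in mapped_keys
    ]


# Made-up values; `calories` is assumed to be handled elsewhere.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "additionalProperty": to_additional_properties(
        {"potassium": 95, "alcohol_by_volume": 4.5, "calories": 150},
        mapped_keys={"calories"},
    ),
}
print(json.dumps(product, indent=2))
```

Sorting by key keeps the output stable, which makes diffs and caching of the generated JSON-LD easier.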
NB: https://validator.schema.org/ can be used to validate the data.
It might also be worth getting my head around SHACL.
Also worth reading: JSON-LD Best Practices.
The lack of any formal structure in the data has always bothered me, as we basically store everything in one blob of JSON data. In the general case this is fine, but when products/items have very definite properties (e.g. a `Book` has an `author`, food items have `calories`, etc.) it gets harder to ensure that the data makes sense, or that it can be consumed in a repeatable way.

Proposal
Brocade.io should start using JSON-LD as the base model for all data presented via the API, using schemas published by third parties such as https://schema.org.
For instance, if we adopt the "Product" type from schema.org, we can structure product information that is both human-readable and easily processed by applications:
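A minimal Python sketch of such a `Product` node, assuming a record exposes a name, a GTIN and an optional brand name. The function and argument names are hypothetical (not the existing brocade.io API), and the example values are made up.

```python
import json
from typing import Any, Optional


def product_jsonld(name: str, gtin: str, brand: Optional[str] = None) -> dict[str, Any]:
    """Build a minimal schema.org Product node for a single item."""
    node: dict[str, Any] = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        # schema.org's generic `gtin` property covers GTIN-8/12/13/14 codes.
        "gtin": gtin,
    }
    if brand:
        # `brand` expects a Brand (or Organization) node rather than a bare string.
        node["brand"] = {"@type": "Brand", "name": brand}
    return node


# Made-up example values.
print(json.dumps(product_jsonld("Oat Milk 1L", "00012345678905", brand="Example Co"), indent=2))
```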
JSON-LD also enables us to add a graph for extended attributes, e.g. if nutritional information is available for a product we can use the NutritionInformation type to present these attributes in a structured manner:
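A minimal Python sketch of that idea, assuming nutrition values are stored as plain numbers in grams (milligrams for cholesterol and sodium); those units, the key table and the `@id` URLs are assumptions for illustration. It puts the `Product` and a `NutritionInformation` node side by side in one `@graph` but does not link them: as far as I can tell, schema.org documents its `nutrition` property on `Recipe` and `MenuItem` rather than `Product`, so the linking property is left open here.

```python
import json
from typing import Any, Optional

# Hypothetical table: brocade.io key -> (NutritionInformation property, unit).
# The units are an assumption about how the numbers are stored.
NUTRITION_FIELDS: dict[str, tuple[str, str]] = {
    "calories": ("calories", "calories"),
    "fat": ("fatContent", "g"),
    "saturated_fat": ("saturatedFatContent", "g"),
    "trans_fat": ("transFatContent", "g"),
    "cholesterol": ("cholesterolContent", "mg"),
    "sodium": ("sodiumContent", "mg"),
    "carbohydrate": ("carbohydrateContent", "g"),
    "fiber": ("fiberContent", "g"),
    "sugars": ("sugarContent", "g"),
    "protein": ("proteinContent", "g"),
}


def nutrition_node(properties: dict[str, Any], node_id: str) -> Optional[dict[str, Any]]:
    """Build a NutritionInformation node, or None if no nutrition keys are present."""
    node: dict[str, Any] = {"@id": node_id, "@type": "NutritionInformation"}
    for key, (prop, unit) in NUTRITION_FIELDS.items():
        if key in properties:
            # Values are written as text quantities with a unit, e.g. "3 g".
            node[prop] = f"{properties[key]} {unit}"
    if "serving_size" in properties:
        node["servingSize"] = str(properties["serving_size"])
    return node if len(node) > 2 else None


def product_graph(product_id: str, name: str, properties: dict[str, Any]) -> dict[str, Any]:
    """Return a JSON-LD document whose @graph holds the Product and, if any, its nutrition node."""
    graph: list[dict[str, Any]] = [{"@id": product_id, "@type": "Product", "name": name}]
    nutrition = nutrition_node(properties, f"{product_id}#nutrition")
    if nutrition is not None:
        graph.append(nutrition)
    return {"@context": "https://schema.org", "@graph": graph}


# Made-up identifier and values.
doc = product_graph("https://example.org/products/00012345678905", "Oat Milk 1L",
                    {"calories": 120, "fat": 3, "sodium": 100, "potassium": 95})
print(json.dumps(doc, indent=2))
```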
Other types from schema.org can be used as applicable (e.g. Book, Movie, etc.). We can also make use of schemas published by other third parties, or our own custom types, as required, and potentially "future-proof" the underlying data model.
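Because `@type` in JSON-LD may be an array, an item can stay a `Product` (for barcode/GTIN data) while also being described as a `Book`. A minimal Python sketch, assuming the presence of `author`, `publisher` or `pages` is enough to treat an item as a book and that authors are people rather than organizations; both are simplifying assumptions, and the function name is hypothetical.

```python
from typing import Any

# Hypothetical: keys whose presence suggests the item is a book.
BOOK_KEYS = {"author", "publisher", "pages"}


def typed_node(name: str, properties: dict[str, Any]) -> dict[str, Any]:
    """Return a Product node, additionally typed as Book when book-specific keys are present."""
    node: dict[str, Any] = {"@context": "https://schema.org", "@type": "Product", "name": name}
    if any(key in properties for key in BOOK_KEYS):
        # An array of types keeps the Product description and adds Book properties.
        node["@type"] = ["Product", "Book"]
        if "author" in properties:
            node["author"] = {"@type": "Person", "name": properties["author"]}
        if "publisher" in properties:
            node["publisher"] = {"@type": "Organization", "name": properties["publisher"]}
        if "pages" in properties:
            node["numberOfPages"] = int(properties["pages"])
    return node


print(typed_node("Dune", {"author": "Frank Herbert", "pages": "412"}))
```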
We can also use the type information in the frontend, using schema types to determine the best way to present data such as nutritional information in a table-like format (see #11). We can also use it to insert Microdata into the HTML, embedding metadata for search engines, web scrapers and the like to consume.
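A minimal Python sketch of both ideas: a dispatcher that picks a renderer from a node's `@type`, and a renderer that emits a nutrition table annotated with Microdata (`itemscope`, `itemtype`, `itemprop`). The labels, function names and the subset of properties shown are hypothetical choices for illustration; a real frontend would likely do this in templates rather than string building.

```python
from html import escape
from typing import Any

# Hypothetical display labels, keyed by NutritionInformation property.
LABELS = {
    "servingSize": "Serving size",
    "calories": "Calories",
    "fatContent": "Fat",
    "carbohydrateContent": "Carbohydrate",
    "proteinContent": "Protein",
}


def render_nutrition_table(node: dict[str, Any]) -> str:
    """Render a NutritionInformation node as an HTML table annotated with Microdata."""
    rows = [
        f'  <tr><th>{escape(label)}</th>'
        f'<td itemprop="{prop}">{escape(str(node[prop]))}</td></tr>'
        for prop, label in LABELS.items()
        if prop in node
    ]
    return (
        '<table itemscope itemtype="https://schema.org/NutritionInformation">\n'
        + "\n".join(rows)
        + "\n</table>"
    )


def render(node: dict[str, Any]) -> str:
    """Pick a renderer based on the node's @type (a string or a list of strings)."""
    types = node.get("@type")
    types = types if isinstance(types, list) else [types]
    if "NutritionInformation" in types:
        return render_nutrition_table(node)
    return ""  # no specialised presentation for other types in this sketch


print(render({"@type": "NutritionInformation", "calories": "120 calories", "fatContent": "3 g"}))
```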
Benefits
Risks / Possible Problems
We can mitigate the second problem by processing individual products on demand and progressively updating the data. The problem of parsing the data requires a bit more thought and investigation, but it looks solvable.
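On-demand processing amounts to a lazy migration: convert a product the first time it is requested and persist the result so later requests skip the work. A minimal Python sketch, where the store, loader and converter are hypothetical stand-ins (a dict and two callables) for whatever persistence layer and conversion code are actually used.

```python
from typing import Any, Callable


def jsonld_on_demand(
    product_id: str,
    store: dict[str, dict[str, Any]],
    load_legacy: Callable[[str], dict[str, Any]],
    convert: Callable[[dict[str, Any]], dict[str, Any]],
) -> dict[str, Any]:
    """Return a product's JSON-LD, converting and persisting it on first request."""
    cached = store.get(product_id)
    if cached is not None:
        return cached
    document = convert(load_legacy(product_id))
    store[product_id] = document  # later requests are served from the store
    return document


# Made-up legacy data and a trivial converter.
legacy = {"1": {"name": "Oat Milk 1L"}}
store: dict[str, dict[str, Any]] = {}
doc = jsonld_on_demand("1", store, legacy.__getitem__,
                       lambda rec: {"@type": "Product", "name": rec["name"]})
print(doc)
```

Products that are never requested are never converted, so a bulk migration can run later (or never) without blocking the switch to JSON-LD.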