Closed u8sand closed 6 years ago
This is surprisingly pretty easy thanks to https://github.com/scrapinghub/extruct and https://github.com/adriank/ObjectPath
Essentially we just obtain the json-ld and then search it with well-crafted objectpath. Ignoring the json-ld fetching which is just a line, here is how we can convert a metric "json-path" attribute into a validation mechanism:
def get_json_ld_attr(tree, attr):
attrs = attr.split('.')
return tree.execute(
'$..*[@.@context is "http://schema.org"]..*[@.@type is {attr_type}].{attr_val}'.format(
attr_type=attrs[0],
attr_val='.'.join(attrs[1:]),
)
)
e.g.
# Dataset can be downloaded
get_json_ld_attr(tree, 'Dataset.url')
# Website has license info
get_json_ld_attr(tree, 'WebSite.license')
# Dataset has citation info
get_json_ld_attr(tree, 'Dataset.citation')
harmonizome.cloud
(json-ld
was recently added)