Closed rufuspollock closed 3 months ago
I think the value of this is very high and I suspect the cost of doing it is very low: we just need to add some fields to the HTML <head>.
Enhancing dataset page indexing in Google Search is crucial for improving visibility and accessibility of our content.
Currently, our dataset pages lack structured data fields required for optimal indexing according to schema.org standards.
Implement structured data fields using JSON-LD to provide search engines with detailed metadata about our datasets.
Implementation of JSON-LD structured data should be completed within 2-3 days, including testing and adjustments.
Avoid implementing incomplete or incorrect JSON-LD structures that could potentially harm search engine indexing.
Provide an example JSON-LD script and test it on specific dataset pages such as the Air Pollution Collection. Monitor results regularly in Google Search Console to evaluate effectiveness.
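To guard against shipping incomplete JSON-LD, a small pre-flight check could run before the script is rendered. The sketch below is only an illustration: the `DatasetJsonLd` type and `findJsonLdProblems` function are hypothetical names, and the 50 to 5000 character limit on `description` is taken from Google's Dataset structured-data guidelines as I understand them, so it should be re-checked against the current docs.

```typescript
// Hypothetical minimal shape of the JSON-LD object we intend to emit.
interface DatasetJsonLd {
  "@context": string;
  "@type": string;
  name?: string;
  description?: string;
  [key: string]: unknown;
}

// Returns a list of human-readable problems.
// An empty list means the basic checks passed.
function findJsonLdProblems(jsonLd: DatasetJsonLd): string[] {
  const problems: string[] = [];

  if (jsonLd["@type"] !== "Dataset") {
    problems.push('"@type" should be "Dataset"');
  }

  if (!jsonLd.name || jsonLd.name.trim() === "") {
    problems.push('"name" is missing or empty');
  }

  // Assumption: Google asks for a description of 50 to 5000 characters.
  const description = jsonLd.description ?? "";
  if (description.length < 50 || description.length > 5000) {
    problems.push('"description" should be 50 to 5000 characters long');
  }

  return problems;
}
```

This only catches obviously missing fields; Google's Rich Results Test and Search Console remain the real check.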
@gradedSystem
Can you create a draft of the JSON-LD that specifies exactly which fields we'd include and which part of the Data Package each would come from? Something like:
{
...
name: datapackage.title,
description: datapackage.description,
license: datapackage.licenses[0],
...
}
This may also be helpful when it comes to implementation: https://nextjs.org/docs/app/building-your-application/optimizing/metadata#json-ld
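For reference, the serialization step could look roughly like the sketch below. The helper name `jsonLdScriptTag` is made up for illustration; in a Next.js page the same escaped string would more likely be passed to a `<script>` element via `dangerouslySetInnerHTML`, as the linked docs describe, rather than concatenated by hand.

```typescript
// Hypothetical helper: serialize a JSON-LD object into a script tag string.
function jsonLdScriptTag(jsonLd: Record<string, unknown>): string {
  // Escape "<" so a value containing "</script>" cannot close the tag early.
  // "\u003c" is still valid JSON and parses back to "<".
  const payload = JSON.stringify(jsonLd).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${payload}</script>`;
}
```

The escaping matters here because dataset titles and descriptions come from user-supplied metadata.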
Here is the JSON-LD format, in which I tried to incorporate everything from the metadata that is available here: https://specs.frictionlessdata.io/data-package/#metadata
<script type="application/ld+json">
{
"@context": "https://schema.org/",
"@type": "Dataset",
"description": "datapackage.description",
"name": "datapackage.title",
"alternateName": "datapackage.name",
"url": "datapackage.homepage",
"identifier": [
"datapackage.id[0]",
"datapackage.id[1]",
...
],
"isAccessibleForFree": true,
"license": [
{
"@type": "CreativeWork",
"name": "datapackage.licenses[0].title",
"identifier": "datapackage.licenses[0].name",
"url": "datapackage.licenses[0].path"
},
{
"@type": "CreativeWork",
"name": "datapackage.licenses[1].title",
"identifier": "datapackage.licenses[1].name",
"url": "datapackage.licenses[1].path"
},
...
],
"creator": [
{
"@type": "datapackage.contributors[0].organization",
"url": "datapackage.contributors[0].path",
"name": "datapackage.contributors[0].title",
"contactPoint": {
"@type": "ContactPoint",
"email": "datapackage.contributors[0].email"
}
},
{
"@type": "datapackage.contributors[1].organization",
"url": "datapackage.contributors[1].path",
"name": "datapackage.contributors[1].title",
"contactPoint": {
"@type": "ContactPoint",
"email": "datapackage.contributors[1].email"
}
},
...
],
"isPartOf": [
"datapackage.sources[0].path",
"datapackage.sources[1].path",
...
],
"dateCreated": "datapackage.created",
"dateModified": "datapackage.updated",
"citation": "datapackage.id",
"version": "datapackage.version"
}
</script>
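As a rough sketch of how this mapping could be generated in code rather than written by hand, something like the following might work. The `DataPackage` interface is a hypothetical subset of the descriptor, and `dataPackageToJsonLd` is an illustrative name, not existing code. Two assumptions are baked in: `@type` values are fixed schema.org type names (`CreativeWork`, `Person`) because `@type` expects a type rather than a metadata value, and every contributor is treated as a `Person`, which may be wrong for organizations.

```typescript
// Hypothetical subset of the Data Package descriptor used in this sketch.
interface DataPackage {
  name?: string;
  title?: string;
  description?: string;
  homepage?: string;
  version?: string;
  created?: string;
  licenses?: { name?: string; path?: string; title?: string }[];
  contributors?: { title: string; path?: string; email?: string }[];
}

function dataPackageToJsonLd(dp: DataPackage): Record<string, unknown> {
  const jsonLd: Record<string, unknown> = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    // Prefer the human-readable title; fall back to the machine name.
    name: dp.title ?? dp.name,
    description: dp.description,
    url: dp.homepage,
    version: dp.version,
    dateCreated: dp.created,
    license: dp.licenses?.map((l) => ({
      "@type": "CreativeWork",
      name: l.title ?? l.name,
      url: l.path,
    })),
    creator: dp.contributors?.map((c) => ({
      "@type": "Person", // assumption: contributors are people
      name: c.title,
      url: c.path,
    })),
  };

  // Remove top-level keys with no value so absent metadata is simply omitted.
  for (const key of Object.keys(jsonLd)) {
    if (jsonLd[key] === undefined) {
      delete jsonLd[key];
    }
  }
  return jsonLd;
}
```

Nested `undefined` values (for example a license without a `path`) are left in place because `JSON.stringify` drops them during serialization anyway.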
cc @olayway
The only question I have is whether we can also use other fields listed in schema.org. I'll try to find out. But I think we're good to go.
@gradedSystem what's the status of this?
The script is being successfully added to the HTML:
But when testing any of our core site URLs, it seems they can't even be accessed:
This is because our dataset pages still return a 500 initially. It's an old issue that we thought was fixed (or rather, for which we found a workaround): https://github.com/datopian/datahub-next/issues/275
FIXED. I will open a new issue for the 500 errors.
Add some special fields to DataHub dataset pages so they get indexed better by Google.
See here for instructions: https://developers.google.com/search/docs/appearance/structured-data/dataset
Should be pretty simple to do from the metadata we already have for datasets ...