GoogleChrome / lighthouse

Automated auditing, performance metrics, and best practices for the web.
https://developer.chrome.com/docs/lighthouse/overview/
Apache License 2.0

[Seo Audits] Structured Data should fetch "all-layers" of schema #8151

Closed (opened by mmocny; closed 2 years ago)

mmocny commented 5 years ago

Feature request summary

PR #6750 added support for simple `schema.org` validation. One step required before validation is downloading (and processing) the schema definition. Currently, the definition downloaded is the [schema.org core schema](https://schema.org/version/latest/schema.jsonld), per [this line](https://github.com/GoogleChrome/lighthouse/blob/master/lighthouse-core/lib/sd-validation/scripts/generate-schema-tree.js#L16).

In discussion with @danbri, he suggested it may be more appropriate to download the "all-layers" schema definition. According to the schema.org [developer guide](https://schema.org/docs/developers.html), the right URL would be `https://schema.org/version/latest/all-layers.jsonld`.

@danbri: you also mentioned that the core schema is moving toward simply including all layers, so perhaps this is moot at this point?

(I am willing to make this simple fix, if it is appropriate.)

cc: @patrickhulce @brendankenny @rviscomi
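To make the proposed change concrete, here is a minimal sketch of the "download and process" step in TypeScript. It is not the code in `generate-schema-tree.js`: it assumes a simplified shape for the schema.org JSON-LD file (a top-level `@graph` of nodes with `@id`, `@type`, and optional `rdfs:subClassOf` references), and the names `schemaUrl` and `buildClassTree` are hypothetical. The network fetch is left out so the sketch runs offline; only the choice of URL differs between the core and all-layers variants.

```typescript
// URLs quoted in this issue; schema.org may have renamed its files since.
const CORE_SCHEMA_URL = "https://schema.org/version/latest/schema.jsonld";
const ALL_LAYERS_URL = "https://schema.org/version/latest/all-layers.jsonld";

// Assumed, simplified shape of a node in the schema.org JSON-LD "@graph".
type IdRef = { "@id": string };
interface SchemaNode {
  "@id": string;
  "@type": string | string[];
  "rdfs:subClassOf"?: IdRef | IdRef[];
}
interface SchemaDoc {
  "@graph": SchemaNode[];
}

// Hypothetical helper: pick which definition to download.
function schemaUrl(allLayers: boolean): string {
  return allLayers ? ALL_LAYERS_URL : CORE_SCHEMA_URL;
}

// Normalize a JSON-LD value that may be a single item or an array.
function asArray<T>(value: T | T[] | undefined): T[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Map each rdfs:Class id to the ids of its direct parent classes.
// Properties and other node types are skipped.
function buildClassTree(doc: SchemaDoc): Map<string, string[]> {
  const tree = new Map<string, string[]>();
  for (const node of doc["@graph"]) {
    const isClass = asArray(node["@type"]).some((t) => t === "rdfs:Class");
    if (!isClass) continue;
    const parents = asArray(node["rdfs:subClassOf"]).map((ref) => ref["@id"]);
    tree.set(node["@id"], parents);
  }
  return tree;
}
```

Because the processing only depends on the graph shape, switching to all-layers would, under these assumptions, just add more nodes to the same tree rather than require new parsing logic.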
patrickhulce commented 5 years ago

Looking at the file, it's quite a bit larger (~1.2 MB, almost the size of all of Lighthouse at the moment) and not something I think we could bundle into the default distribution. We've already surpassed our limit of what we can bundle with the current setup, and we'll likely have to trim some of the lesser-used categories by default before we can include its results.

Do you think structured data might be a broad enough category to deserve its own Lighthouse plugin, @mmocny? The flexibility to add what you need would be much greater in a plugin setting, and we can still support these efforts by making sure the artifacts provide what you need (like #8133).

Would a tag-team approach, with the default audit providing ~90% usage-based coverage and a plugin providing complete coverage, be a reasonable compromise for this sort of thing?
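One possible reading of "~90% usage-based coverage" is: ship only the most-used schema types by default, and leave the long tail to the plugin. The sketch below illustrates that split under the assumption that per-type usage counts are available from some dataset; the thread does not say where such counts would come from, and `selectByUsage` and `pluginOnlyTypes` are hypothetical names. Sorting by descending count and taking a prefix until the threshold is reached yields the smallest set of types that meets it.

```typescript
// Hypothetical: choose the smallest set of most-used types whose combined
// usage reaches `threshold` (e.g. 0.9) of all observed usages.
function selectByUsage(counts: Map<string, number>, threshold = 0.9): string[] {
  const entries = Array.from(counts.entries());
  const total = entries.reduce((sum, [, n]) => sum + n, 0);
  if (total === 0) return [];
  // Most-used first, so each pick adds as much coverage as possible.
  entries.sort((a, b) => b[1] - a[1]);
  const selected: string[] = [];
  let covered = 0;
  for (const [type, n] of entries) {
    if (covered / total >= threshold) break;
    selected.push(type);
    covered += n;
  }
  return selected;
}

// Everything outside the default set would be left to the plugin.
function pluginOnlyTypes(counts: Map<string, number>, defaults: string[]): string[] {
  const inDefault = new Set(defaults);
  return Array.from(counts.keys()).filter((t) => !inDefault.has(t));
}
```

The default bundle then only has to carry the definitions returned by `selectByUsage`, while the plugin can download the full vocabulary.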

mmocny commented 5 years ago

That certainly sounds very reasonable to me.

My personal desire for this feature is specifically to support auditing an entirely optional, externally hosted schema extension -- so I already plan to need to write a plugin/custom audit (I don't yet understand the nuanced difference between those) -- even if Lighthouse were willing to fetch the larger schema file by default. I expect this will be common for other folks as well, as there are several other popular vocabularies.

That said, I'd like to wait to hear from @danbri on the nuances between the core and all-layers schemas.

brendankenny commented 5 years ago

> so I already plan to need to write a plugin/custom audit (I don't yet understand the nuanced difference between those

For this side conversation: custom audits do the interesting work and can do anything that core audits can do, while a plugin is just a Node module containing the audit(s) and a config file that tells Lighthouse to run them.
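The split between "audit does the work" and "plugin is packaging" can be sketched as follows. This is a simplified, self-contained illustration: the `PluginConfig` type only mirrors the `audits` / `category.auditRefs` shape of the linked plugin example, a real custom audit would extend Lighthouse's `Audit` class and read gathered artifacts (omitted here so the snippet runs without the `lighthouse` package), and the module path, audit id, and `audit(foundTypes, knownTypes)` signature are all hypothetical.

```typescript
// Simplified local types mirroring the plugin config shape in the linked
// example (assumption: the real Lighthouse types carry more fields).
interface AuditRef { id: string; weight: number }
interface PluginConfig {
  audits: Array<{ path: string }>;
  category: { title: string; description?: string; auditRefs: AuditRef[] };
}

// Stand-in for a custom audit: this is where the interesting work happens.
// Hypothetical input: the JSON-LD @type values already found on the page.
const schemaTypesAudit = {
  meta: { id: "schema-types", title: "Structured data uses known schema types" },
  audit(foundTypes: string[], knownTypes: Set<string>): { score: number; unknown: string[] } {
    const unknown = foundTypes.filter((t) => !knownTypes.has(t));
    return { score: unknown.length === 0 ? 1 : 0, unknown };
  },
};

// The plugin is just packaging: which audit modules to load and how the
// report category weights them. Module path and id are hypothetical.
const structuredDataPlugin: PluginConfig = {
  audits: [{ path: "lighthouse-plugin-structured-data/audits/schema-types.js" }],
  category: {
    title: "Structured Data",
    description: "Checks JSON-LD against the chosen vocabularies.",
    auditRefs: [{ id: schemaTypesAudit.meta.id, weight: 1 }],
  },
};
```

Tying `auditRefs` to `schemaTypesAudit.meta.id` keeps the category and the audit from drifting apart if the id is ever renamed.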

We're still working on docs, which we hope to land in the next few weeks, but we have an example here: https://github.com/GoogleChrome/lighthouse/tree/master/docs/recipes/lighthouse-plugin-example

If anything about it could be clearer, please let us know and we can update :)

mmocny commented 5 years ago

Oh, that's really neat -- so the hope is that there could be fewer but more flexible audits, and by tweaking config params, plugins can still adjust behaviour subtly?

For structured data, perhaps you would just expose a "list of URLs to fetch schema contexts", and then all I need is a plugin to provide my contexts? The actual validation code could certainly be shared.
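A minimal sketch of that idea, assuming a hypothetical option (Lighthouse does not ship a "schema context URLs" setting): the core supplies the default context, a plugin appends its own vocabulary URLs, and one shared validator checks types against the union of all loaded vocabularies. `resolveContextUrls` and `findUnknownTypes` are illustrative names only.

```typescript
// Hypothetical default; a plugin would append its own vocabulary URLs.
const DEFAULT_CONTEXT_URLS: readonly string[] = [
  "https://schema.org/version/latest/schema.jsonld",
];

// Defaults first, then plugin-provided URLs, dropping duplicates
// (a Set preserves first-insertion order).
function resolveContextUrls(pluginUrls: readonly string[] = []): string[] {
  return Array.from(new Set([...DEFAULT_CONTEXT_URLS, ...pluginUrls]));
}

// Shared validation: a type is known if any loaded vocabulary defines it.
function findUnknownTypes(
  foundTypes: string[],
  vocabularies: ReadonlyArray<ReadonlySet<string>>,
): string[] {
  return foundTypes.filter((t) => !vocabularies.some((vocab) => vocab.has(t)));
}
```

A plugin for an external vocabulary would then only pass its own URL, e.g. `resolveContextUrls(["https://example.com/my-vocab.jsonld"])` (placeholder URL), and reuse `findUnknownTypes` unchanged.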

(Indeed, those plugin docs are still sparse, but I'll follow along with the SD audit work and use the plugin sample to try to build a plugin for my needs. Thanks for the help!)

brendankenny commented 2 years ago

I'm going to close this as a bit stale :)

Our structured data approach has changed quite a bit since this issue, deferring more of the processing to plugins/custom audits rather than keeping it in Lighthouse core. If anyone is following along and is still waiting on this functionality, please comment or open a new issue so we can discuss!

danbri commented 2 years ago

We have some work in this area brewing, building on https://github.com/google/schemarama and efforts to express much of Google's SD validation using W3C SHACL and ShEx.

I will open a fresh issue for it.

> On Tue, 11 Jan 2022 at 20:46, Brendan Kenny wrote:
>
> Closed #8151 https://github.com/GoogleChrome/lighthouse/issues/8151.
