Establishing Transparency and Fairness Guidelines for Feature Visibility

jcscottiii commented 6 months ago

Description

As webstatus.dev grows to encompass a wide range of browser feature implementations, situations may arise where we need to temporarily or permanently hide feature scores or features altogether. This could be due to a variety of reasons, such as:

Data anomalies: Potential errors or inconsistencies in the gathered data, such as features that are implemented but have no Web Platform Tests coverage, or coverage that implies a feature is not implemented when in reality the test suite itself needs to be improved.
Ongoing development: Features in an early stage where scores aren't representative (such as Web Platform Test scores showing 100% for a feature that has not been fully implemented yet).

While these scenarios are understandable, it's crucial to handle them in a way that maintains trust, transparency, and fairness amongst browser vendors while also describing the actual state of the world to end-users. We want to invite open discussion on how to best achieve this.

Desired Goals

Equitable Process: Establish clear, unbiased criteria for when and how to hide information.
Transparency: Document all decisions regarding feature visibility, along with the rationale.
Accountability: Create a mechanism for the community to raise concerns and suggest changes.
User Information: Provide clear explanations for end-users when information is hidden, including links back to the relevant discussion.

Possible Solutions (Not Exhaustive)

Test Suite Review Process: Review the test suite to ensure coverage is reasonable and failures are explainable.
Public Comment Period: Allow a timeframe for feedback before any information is hidden.
"Hidden Score" Label: Add a visual indicator to hidden items, with a link to the rationale.
GitHub Discussions: Utilize Discussions to host conversations around feature visibility concerns.

These are just starting points. We encourage everyone to share their ideas, concerns, and suggestions to ensure we create a process that upholds the values of this project.

Call to Action

Please feel free to comment on this issue with your thoughts. Your input is invaluable in shaping the future of this project and ensuring it is a trusted resource for everyone. Let's work together to build a truly transparent and equitable process!

Please voice your concerns as well, while adhering to the project's Code of Conduct.

This process can evolve over time as well, as we try things out.

foolip commented 6 months ago

Feedback from @meyerweb on Mastodon: https://mastodon.social/@Meyerweb/112457440224134542

foolip commented 6 months ago

On "data anomalies", I think we'll need clear criteria for hiding scores, a few different common rationales, and perhaps links out to the issues that track the fix.

Common rationales are insufficient coverage and widespread failures for reasons unrelated to the implementation quality.

meyerweb commented 6 months ago

As a followup on @foolip’s link to my toot (thanks, @foolip!) I think as long as rationales for absent scores are clear and consistent, you’ll be a lot further along.

I also believe there should be a lot more transparency on why a thing is listed at all when the supporting data doesn’t seem to be there. Example: https://webstatus.dev/features/canvas-text-baselines is listed as a newly-available baseline even though one of the tracked browsers is passing 0% of tests. (A whole three tests, it is true.) How can this be considered baseline when it’s apparently not supported at all by a tracked browser? I mean, I can think of at least one scenario where that sort of thing might be defensible, but I have no idea if this is such a scenario, nor does anyone else.

Even beyond that, https://webstatus.dev/features/hyphens is listed as baseline when its scores are mostly in the 50s, and the highest score is just short of 75%. It also has 55 tests, of which only 20 are passed by all four tracked browsers, which is a 36.4% Interop score. Does that qualify as baseline? I personally wouldn’t think so, but if there were a list of the ways things can get on the list, that would help a lot.
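To make the arithmetic above concrete, here is a minimal sketch that assumes an "Interop score" is the share of tests passed by every tracked browser, as the comment describes. The function name, the data shape, and the test data are hypothetical; the data is synthetic and only reproduces the 20-of-55 count, not the real hyphens results.

```python
from typing import Mapping

# Assumption: four tracked browsers, named here only for illustration.
BROWSERS = ("chrome", "edge", "firefox", "safari")


def interop_score(results: Mapping[str, Mapping[str, bool]]) -> float:
    """Return the fraction of tests that pass in every tracked browser.

    `results` maps a test name to a {browser: passed} mapping.
    A browser with no recorded result counts as a failure.
    """
    if not results:
        raise ValueError("no test results")
    passed_everywhere = sum(
        1
        for by_browser in results.values()
        if all(by_browser.get(browser, False) for browser in BROWSERS)
    )
    return passed_everywhere / len(results)


# Synthetic data: 55 tests, the first 20 pass everywhere, the other 35
# fail in one browser (which browser fails is arbitrary here).
results = {
    f"test-{i}": {browser: (i < 20 or browser != "safari") for browser in BROWSERS}
    for i in range(55)
}

print(f"{interop_score(results):.1%}")  # prints 36.4%
```

Note that this number can be far lower than any single browser's pass rate, since a test counts only if all browsers pass it.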

And then, I found https://webstatus.dev/features/conic-gradients, which is “Widely available” baseline, with one browser passing 18% of tests? And then https://webstatus.dev/features/webvtt, which ranges from 37-56% in terms of passing tests, and would have an Interop score of 9.1%? These also seem strange to include.

(I know that scores aren’t always the basis of something being considered baseline, but because the scores are so prominent, the questions seem inevitable. This is especially the case since “Insufficient test coverage” is given as a reason to not list scores, even if inconsistently.)

foolip commented 6 months ago

As a start, I've added source comments explaining each case in https://github.com/GoogleChrome/webstatus.dev/pull/301. We used the same reason for all of these for expedience, but we should make the distinction between a few different reasons:

Obviously insufficient coverage, like for AVIF
Widespread failures that we know to be for some reason other than the feature's implementation quality, like for device orientation events
Failures that aren't understood, but seem unlikely to be a reflection of the implementation quality based on some out-of-band knowledge. For example, I'm fairly confident that preservesPitch works well enough in the majority of use cases web developers care about, so 22.2% for Firefox and 0% for Safari would be unreasonable.

@jcscottiii what do you think about always showing the ⓘ when we don't have a percentage to show, and having more reasons? The existing "---" should be "no tests found" with an invitation to contribute to the mapping.
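As a rough illustration of what "more reasons" could look like, here is a sketch that models the three reasons listed above plus "no tests found" as an enum, and always renders the ⓘ with a reason when there is no percentage. All names and message strings are hypothetical and are not taken from the webstatus.dev code.

```python
from enum import Enum
from typing import Optional


class HiddenReason(Enum):
    """Hypothetical reasons for not showing a score."""

    NO_TESTS_FOUND = "No tests found. Contributions to the test mapping are welcome."
    INSUFFICIENT_COVERAGE = "Test coverage is insufficient."
    UNRELATED_FAILURES = "Failures are known to be unrelated to implementation quality."
    UNEXPLAINED_FAILURES = "Failures are not understood and are under review."


def render_score(score: Optional[float], reason: Optional[HiddenReason] = None) -> str:
    """Render a score cell: a percentage, or the info marker plus a reason."""
    if score is not None:
        return f"{score:.1%}"
    # Assumption: a missing score with no recorded reason means no tests
    # are mapped to the feature, replacing the bare "---".
    reason = reason or HiddenReason.NO_TESTS_FOUND
    return f"ⓘ {reason.value}"


print(render_score(0.222))
print(render_score(None, HiddenReason.INSUFFICIENT_COVERAGE))
print(render_score(None))
```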

Reviewing the specific features @meyerweb mentioned:

https://webstatus.dev/features/canvas-text-baselines: I reviewed the Safari failures and guessed that, since it was implemented in Safari so long before other browsers, the spec probably changed in some way and Safari's implementation doesn't match the current spec. But this needs to be verified, so I've filed https://github.com/web-platform-dx/web-features/issues/1120.

https://webstatus.dev/features/hyphens: We'll need a subject matter expert to review this test suite. It's hard to tell if the failures are for cases that will affect web developers or not. My guess is that basic usage of the feature is fine, but that interoperability in the details isn't very good.

https://webstatus.dev/features/conic-gradients: This was an oversight on my part. The failures mostly look like minor pixel value differences. If we can't fix the tests we should hide this score for Safari specifically.

https://webstatus.dev/features/webvtt: I think WebVTT interop is somewhat bad, but you can use the basic feature. However, I see that I need to update this mapping to split WebVTT from WebVTT regions, since that contributes to the low score in at least Chrome and Edge.

foolip commented 6 months ago

For WebVTT I've filed https://github.com/web-platform-tests/wpt/issues/46453 and sent https://github.com/GoogleChrome/webstatus.dev/pull/314 to hide the scores on webstatus.dev. I think this fits the "Widespread failures that we know to be for some reason other than the feature's implementation quality" reason.

foolip commented 6 months ago

Thinking about some guardrails for support status vs. test results:

Show no scores when a feature isn't supported (already the case)
For supported features, automatically hide scores <50% until reviewed, because most such cases will be a problem with the test suite or infrastructure more than the implementation quality
If a score is changed more than 10% by an issue other than implementation quality, hide the score for that specific browser

This would be the general approach, but exceptions could still be made based on other documented principles.
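A minimal sketch of how these guardrails might be expressed, under some assumptions: scores are fractions between 0 and 1, "more than 10%" is read as more than 10 percentage points, and the field names (`reviewed`, `distortion`) are hypothetical. Documented exceptions are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BrowserResult:
    """Hypothetical per-browser record for one feature."""

    supported: bool
    score: Optional[float]    # pass rate as a fraction, None if no tests
    reviewed: bool = False    # has a person reviewed a low score?
    distortion: float = 0.0   # score change caused by non-implementation issues


def visible_score(result: BrowserResult) -> Optional[float]:
    """Return the score to display, or None if it should be hidden."""
    # Guardrail 1: no score for unsupported features.
    if not result.supported or result.score is None:
        return None
    # Guardrail 3: hide if a known non-implementation issue shifts the
    # score by more than 10 points (assumed reading of "10%").
    if abs(result.distortion) > 0.10:
        return None
    # Guardrail 2: hide scores under 50% until someone has reviewed them.
    if result.score < 0.50 and not result.reviewed:
        return None
    return result.score


print(visible_score(BrowserResult(supported=True, score=0.40)))
print(visible_score(BrowserResult(supported=True, score=0.40, reviewed=True)))
```

The checks run per browser, so one browser's score can be hidden while the others stay visible.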

dmitriid commented 3 months ago

For the sake of transparency, features whose status is "not on any standards track" should be shown as such instead of "limited availability".

jcscottiii commented 3 months ago

For the sake of transparency, features whose status is "not on any standards track" should be shown as such instead of "limited availability".

@dmitriid

That's a great idea. And it would provide better insights than the current solution.

Looking at your comment on the related issue, we can leverage the status field from caniuse and check if it is unoff.
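A rough sketch of that check, assuming each feature has a caniuse-style record with a `status` key. The record shape, the function name, and the sample entries are hypothetical; only the `unoff` value comes from the comment above.

```python
from typing import Mapping, Optional


def standards_track_note(caniuse_entry: Mapping[str, str]) -> Optional[str]:
    """Return a note for features that are not on any standards track.

    Assumption: `caniuse_entry` carries a "status" key, and the value
    "unoff" marks a feature as unofficial.
    """
    if caniuse_entry.get("status") == "unoff":
        return "Not on any standards track"
    return None


# Hypothetical entries, for illustration only.
print(standards_track_note({"title": "Example feature A", "status": "unoff"}))
print(standards_track_note({"title": "Example feature B", "status": "ls"}))
```

Returning a separate note, rather than replacing the availability label, keeps the standards-track information independent of "limited availability".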

dmitriid commented 3 months ago

Yeah, I didn't realize there was a related issue, so ended up commenting (rather tersely 😬) on both.

I don't know how complete/up-to-date the data is, but it's probably okay if Can I Use ended up using it :)

past commented 3 months ago

I don't see why we would conflate "not on any standards track" and "limited availability" given they are orthogonal issues. I agree that the first part should be captured somehow though, which we should explore in #486.
