fediverse-devnet / feditest-tests-fediverse

The tests for the fediverse testsuite
MIT License

Hard vs Soft Failures #99

Closed steve-bate closed 15 hours ago

steve-bate commented 2 weeks ago

Related to #59

It's not clear to me how we are categorizing "hard" vs "soft" test failures.

One possible approach:

| RFC Conformant | Significant Interop Risk | Hard/Soft |
| --- | --- | --- |
| Y | N | N/A |
| N | N | soft |
| N | Y | hard |
| Y | Y | (maybe doesn't exist) |
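
The table above could be written down as a small decision function. This is only a sketch of the proposal, not how feditest classifies failures today; `Outcome` and `classify` are hypothetical names made up for illustration, and the conformant-but-risky case is left as "undecided" because the table itself doubts that it exists.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical outcome labels mirroring the last column of the table."""
    NOT_APPLICABLE = "N/A"   # conformant and no interop risk: nothing to report
    SOFT = "soft"            # nonconformant, but no significant interop risk
    HARD = "hard"            # nonconformant and a significant interop risk
    UNDECIDED = "undecided"  # conformant but risky: the table doubts this exists


def classify(rfc_conformant: bool, interop_risk: bool) -> Outcome:
    """Map the two yes/no columns of the table to a hard/soft outcome."""
    if rfc_conformant:
        return Outcome.UNDECIDED if interop_risk else Outcome.NOT_APPLICABLE
    return Outcome.HARD if interop_risk else Outcome.SOFT
```

With this shape, the open question for each test becomes "is `interop_risk` true here?", which is the claim that would need supporting evidence.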

Because of the Postel philosophy (be liberal in what you accept), being nonconformant with several WebFinger RFC requirements (even some MUST requirements) does not represent a significant interop risk. These include the content type requirement (clients accept application/json) and the HTTP status code requirements (clients accept any non-retryable 4xx status code for client query errors).
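
To make the two examples concrete, here is a rough sketch of how a lenient check might grade them. The function names, the "pass"/"soft"/"hard" strings, and the set of retryable 4xx codes are all assumptions made for illustration and are not taken from feditest. Treating everything else as "hard" is also just a placeholder choice. The only fixed point is that RFC 7033 names `application/jrd+json` as the WebFinger media type.

```python
# Assumption: these 4xx codes invite a retry, so they are not treated as
# equivalent "client query error" responses in this sketch.
RETRYABLE_4XX = {408, 425, 429}


def content_type_severity(header_value: str) -> str:
    """Grade a Content-Type header value from a WebFinger response."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type == "application/jrd+json":
        return "pass"  # what RFC 7033 asks for
    if media_type == "application/json":
        return "soft"  # nonconformant, but still parseable JSON
    return "hard"      # placeholder: anything else may not be JSON at all


def error_status_severity(expected: int, actual: int) -> str:
    """Grade the status code returned for a bad client query."""
    if actual == expected:
        return "pass"
    if 400 <= actual <= 499 and actual not in RETRYABLE_4XX:
        return "soft"  # a different non-retryable 4xx still signals a client error
    return "hard"      # placeholder for retryable codes and non-4xx responses
```

For example, `error_status_severity(400, 404)` grades as "soft" here, while `error_status_severity(400, 200)` grades as "hard".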

In general, my opinion is that we should default to soft failures unless we can support a claim that any RFC nonconformance is a significant fediverse interop risk.


EDIT: It's sometimes unclear whether to put these types of issues in the base feditest repo or this one. I'm not sure if the issue overlaps with both repos. There's a similar issue in feditest (https://github.com/fediverse-devnet/feditest/issues/160).

jernst commented 1 week ago

Can you make a list of things that you'd like to reclassify? Working from the sequential test report might be easiest.

steve-bate commented 1 week ago

In general, I think any failure currently classified as "hard" that doesn't break interop should be reclassified as "soft".

This includes most (or all) HTTP status code mismatches.

The application/json content type failure (mentioned above) is another one that I believe should be classified as "soft".

Note that for either of these, the current tests may classify them as hard fails in some tests and soft fails in other tests. I'd like to see a consistent classification if that's the case (or a clear rationale for the apparent inconsistency).
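
One way to spot that kind of inconsistency would be to group failures by kind and flag any kind that shows up with more than one severity. The sketch below assumes the results can be flattened into `(test name, failure kind, severity)` tuples; that record shape, the function name, and the sample rows are all invented for illustration and do not reflect feditest's actual report format or real test results.

```python
from collections import defaultdict


def find_inconsistent(results):
    """Return failure kinds that were reported with more than one severity."""
    severities = defaultdict(set)
    for _test_name, failure_kind, severity in results:
        severities[failure_kind].add(severity)
    return {
        kind: sorted(found)
        for kind, found in severities.items()
        if len(found) > 1
    }


# Made-up sample rows, not real test output.
sample = [
    ("test_a", "not 400", "hard"),
    ("test_b", "not 400", "soft"),
    ("test_c", "wrong content type", "soft"),
]
print(find_inconsistent(sample))  # {'not 400': ['hard', 'soft']}
```

Anything this reports would either need a consistent classification or a documented rationale for the difference.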

steve-bate commented 1 week ago

Here's an example of what seems to be an inconsistency. One "not 400" failure is a hard failure and the other is a soft failure. It's possible that in some cases this is accurate, but it doesn't seem to be here. (Note that you may not have "bonfire" in your node set, but you can easily find other examples of this in the test results.)

[image]