fhir-crucible / crucible

🔥 Crucible web application for testing FHIR servers

Legacy Crucible Tests prevent 100% conformance and success #189

Open Interopguy opened 8 years ago

Interopguy commented 8 years ago

As a long-standing Crucible user, our FHIR implementation (WildFHIR) has undergone a lot of development and retesting. During this time, the Crucible project has likewise undergone changes, updates, and revisions.

When the Crucible team deprecates legacy test cases, for whatever reason, the project's starburst, conformance history, and reported conformance percentage continue to reference those now non-existent test cases. Simple navigation still lists those tests, but when we attempt to navigate to the tests themselves, there is no longer any test definition.

The result is that many long-supported test systems are prevented from ever reaching 100% success or 100% conformance, while new systems, or systems with extremely limited conformance capability, can quickly reach 100%. This could mislead the viewer into believing the system at 100% is a higher-quality or more capable implementation than a longer-supported test system sitting at 95%.

Request: If the Crucible project team deprecates any test cases (including any historically deprecated test cases), future conformance testing should immediately stop counting those deprecated tests toward the conformance percentage.
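The requested behavior could be sketched roughly as follows. This is a minimal illustration, not Crucible's actual code: the `TestResult` struct and its fields are hypothetical stand-ins for however Crucible models results.

```ruby
# Hypothetical result record: name, whether it passed, and whether the
# test definition has since been deprecated/removed.
TestResult = Struct.new(:name, :passed, :deprecated)

# Compute a conformance percentage that excludes deprecated tests
# entirely, so removed test definitions can no longer drag a score down.
def conformance_percentage(results)
  active = results.reject(&:deprecated)   # drop deprecated tests first
  return 100.0 if active.empty?           # no active tests to fail against
  passed = active.count(&:passed)
  (100.0 * passed / active.size).round(1)
end

results = [
  TestResult.new('read-patient',   true,  false),
  TestResult.new('search-patient', true,  false),
  TestResult.new('legacy-case',    false, true)   # deprecated: ignored
]
puts conformance_percentage(results)  # => 100.0
```

With the deprecated test excluded, the system above reports 100.0 instead of 66.7, which matches the request without destroying any history.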

Current work-around (not a favored solution): delete all test history for the test system and start fresh. This is a destructive operation and should be a means of last resort, as test history is valuable information to the standards community and to test system owners.

jawalonoski commented 8 years ago

Thanks for the issue report... we've been thinking about this... it also relates to systems showing 100% when they've only run a few tests, where if they ran all the tests, they might end up at 23% (for example). One thought we had was to move away from a "percentage" to a "score" so we could weigh all these factors.
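One way to read the "score" idea is to weight the pass rate by coverage, so a perfect result on a handful of tests scores low. A rough sketch, assuming simple counts (this formula is purely illustrative, not a decided design):

```ruby
# Hypothetical combined score: pass rate weighted by coverage.
#   passed - tests that passed
#   run    - tests actually executed
#   total  - all applicable (non-deprecated) tests
def conformance_score(passed, run, total)
  return 0.0 if run.zero? || total.zero?
  pass_rate = passed.to_f / run     # quality of what was run
  coverage  = run.to_f / total      # how much of the suite was run
  (100.0 * pass_rate * coverage).round(1)
end

conformance_score(3, 3, 300)     # 100% pass on a tiny subset => 1.0
conformance_score(250, 300, 300) # broad coverage, ~83% pass   => 83.3
```

Under a weighting like this, the 100%-on-three-tests system no longer outranks a broadly tested 83% system.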

Whatever the solution, we do agree we need to do something about this.... more soon.

lawley commented 7 years ago

When you're reviewing this, please take into account specialised systems, such as a terminology-only server that would only ever claim to implement a small number of resource types and associated operations.

Perhaps it would be worth defining several common profiles and reporting scores relative to these profiles. Thus you could say something like system X implements x% of the Terminology Services profile and passes y% of the tests that match that profile.
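That profile-relative report could be sketched like this. Everything here is hypothetical: the profile contents, the method, and the data shapes are illustrations, not Crucible's actual model.

```ruby
# Hypothetical profile: the resource types a terminology server would claim.
TERMINOLOGY_PROFILE = %w[CodeSystem ValueSet ConceptMap].freeze

# Report two numbers relative to a profile:
#   implements - % of the profile's resource types the system supports
#   passes     - % of tests passed among tests matching supported types
def profile_report(supported_types, results_by_type, profile)
  implemented = profile & supported_types
  matching = results_by_type.values_at(*implemented).flatten.compact
  {
    implements: (100.0 * implemented.size / profile.size).round,
    passes: matching.empty? ? 0 : (100.0 * matching.count { |r| r } / matching.size).round
  }
end

report = profile_report(
  %w[CodeSystem ValueSet],
  { 'CodeSystem' => [true, true], 'ValueSet' => [true, false] },
  TERMINOLOGY_PROFILE
)
# report => { implements: 67, passes: 75 }
```

So a terminology-only server is judged against the Terminology Services profile it claims, rather than against the entire FHIR specification.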

jawalonoski commented 7 years ago

Well, today you can see whether or not a server passes all the terminology-related portions of the spec, but it does take some navigating. We have also considered moving towards a "badge" system... have a badge for terminology services, another for provider directory, etc. Not there yet....