FirelyTeam / firely-validator-api

Firely's official FHIR validator API for validating HL7 FHIR resources against profiles.
Other
7 stars 2 forks source link

API Proposal: Add Async validation #329

Closed almostchristian closed 1 week ago

almostchristian commented 2 months ago

The new validator should include an async API to avoid calls to TaskHelper.Await. During my performance profiling sessions, I found using TaskHelper.Await, or even Task.GetAwater().GetResult() makes the traces chaotic. Also, I feel it has a significant scalability impact when used with network bound resource resolvers.

Proposed new API

The three interfaces use the default interface method implementation for the new API. If the validator does not need to be async (i.e., it doesn't call any async api such as other other validators), it doesn't need to implement the new async method.

public class Validator
{
   Task<OperationOutcome> ValidateAsync(ElementNode instance, string? profile, CancellationToken cancellationToken);

   Task<OperationOutcome> ValidateAsync(Resource instance, string? profile, CancellationToken cancellationToken);
}

public interface IElementSchemaResolver
{
   ValueTask<ElementSchema?> GetSchemaAsync(Canonical schemaUri)
      => new(GetSchema(schemaUri));
}

public interface IValidatable
{
   ValueTask<ResultReport?> ValidateAsync(IScopedNode input, ValidationSettings vc, ValidationState state, CancellationToken cancellationToken)
      => new(Validate(input, vc, state));
}

public interface IGroupValidatable
{
   ValueTask<ResultReport?> ValidateAsync(IEnumerable<IScopedNode> input, ValidationSettings vc, ValidationState state, CancellationToken cancellationToken)
      => new(Validate(input, vc, state));
}

Performance

The async validate API also appears to be faster according to benchmarks:

Method Resource Mean Error StdDev Ratio
LegacyValidator Hippocrates.practitioner 3,438.0 μs 67.57 μs 109.11 μs 1.00
NewValidator Hippocrates.practitioner 464.2 μs 8.45 μs 16.48 μs 0.14
NewValidatorAsync Hippocrates.practitioner 170.4 μs 3.34 μs 5.58 μs 0.05
LegacyValidator Levin.patient 47,261.2 μs 931.72 μs 1,035.60 μs 1.00
NewValidator Levin.patient 1,009.2 μs 20.03 μs 47.60 μs 0.02
NewValidatorAsync Levin.patient 733.7 μs 13.67 μs 24.99 μs 0.02
LegacyValidator MainBundle.bundle 71,836.7 μs 1,367.22 μs 1,342.79 μs 1.00
NewValidator MainBundle.bundle 5,487.6 μs 116.04 μs 340.33 μs 0.08
NewValidatorAsync MainBundle.bundle 5,212.4 μs 122.50 μs 361.20 μs 0.07
LegacyValidator patient-clinicalTrial.xml 37,239.2 μs 673.64 μs 597.17 μs 1.00
NewValidator patient-clinicalTrial.xml 3,630.5 μs 71.53 μs 147.71 μs 0.10
NewValidatorAsync patient-clinicalTrial.xml 2,726.8 μs 54.42 μs 113.60 μs 0.07
LegacyValidator Weight.observation 55,374.9 μs 1,086.31 μs 1,958.84 μs 1.00
NewValidator Weight.observation 2,121.3 μs 71.24 μs 210.06 μs 0.04
NewValidatorAsync Weight.observation 1,707.1 μs 33.73 μs 85.87 μs 0.03
ewoutkramer commented 2 months ago

This is interesting, as we initially had an async interface, but due to the fact that many validations (cardinality, string length, patterns, etc) were merely very short computes, the overhead of Task/await on all those nested validates became noticable, so we removed it. I guess it depends on the ratio of I/O bound validations and simple, compute validations. And as we know, async tends to spread everywhere, so we cannot have non-async validates for simple computes, and async validates for i/o bound ones. At least I think so, I'd love to be proven wrong ;-) As it stands, I don't feel like re-introducing async anytime soon after our previous experiences.

almostchristian commented 2 months ago

In your previous testing, did you use ValueTask or Task for the validators? I agree that superfluous use of async can hurt performance and it may take testing many use cases to definitively say which one is better for performance (and in the end the answer may be 'it depends'). I think the validator should support both synchronous and asynchronous validation, and it will be up to the end users to validate which is the best one to use for their use case.

For now, the main advantage to using async is that it preserves the traces in flame graphs which makes it easier to diagnose performance issues. In the synchronous version, the occasional calls to TaskHeper.Await changes the stack traces

Flame graph (synchronous) image

Flame graph (async) Screenshot 2024-07-02 125520

brianpos commented 2 months ago

My QR validator has a hybrid of both approaches. https://github.com/brianpos/fhir-net-web-api/blob/66e90f72c6adcdf36f436a313e8e886fa61abf9e/src/Hl7.Fhir.StructuredDataCapture/QuestionnaireResponseValidator.cs#L742 It has an async top level, with a List to track any async requestst that are made, which it then waits on before returning. So anything that has an async result is added to the list and everything else just continues synchronously. All issues are added into the OutcomeIssues list too, which is what is returned at the end.

almostchristian commented 1 month ago

One benefit to an async validtion API is that potentially, validation could take a long time and it is possible that the client has cancelled the validation request. Respecting the cancellation token can help with scalability.

mmsmits commented 1 week ago

We think async will lead to too much overhead current, since most validation are so fast that starting up a task will take more time than actually validating the validation blocks.