FirelyTeam / firely-net-sdk

The official Firely .NET SDK for HL7 FHIR
Other
824 stars 342 forks source link

Very slow performance for Deserialization of a large Bundle #2800

Closed KLazarov closed 3 months ago

KLazarov commented 3 months ago

Describe the bug When I deserialize a large (1-4MB) Bundle resource the performance goes down 10-20x times. A 4MB Bundle can take up to 400ms to deserialize.

If I instead used dynamic object instead of Bundle, the performance is closer to 11ms for the same 4MB Bundle.

To Reproduce Steps to reproduce the behavior:

  1. Load a 1+ MB json file, for example the json definitions as they are pretty big Fhir R4 definitions. I used the valuesets.json which is around 9.3 million characters.
  2. Deserialize it multiple times using Bundle -> get 400ms or more per object
  3. Deserialize it multiple times using object -> get around 11ms per object

Expected behavior The deserializer should not lose 95%+ of it's performance just because I have set the type.

Code

The code also prewarms the Bundle deserialization as otherwise it can go up to 1.5 seconds. All tests were done on a AMD 5900x CPU, 64GB 3600MHz RAM, and Windows 11. The file is loaded only once in memory.

using System.Text.Json;
using Hl7.Fhir.Model;
using Hl7.Fhir.Serialization;

class Program
{
    public static void FhirR4SerializerTest(int runs, string filename)
    {
        Console.WriteLine("Running C# FHIR SDK benchmark!");

        string source = File.ReadAllText(filename);
        var parser = new FhirJsonParser();
        for (int i = 0; i < 20; i++)
        {
            Bundle bundle = parser.Parse<Bundle>(source);
        }

        DateTime dateTimeStart = DateTime.Now;

        for (int i = 0; i < runs; i++)
        {
            Bundle bundle = parser.Parse<Bundle>(source);
        }
        DateTime dateTimeFinish = DateTime.Now;
        var totalTimeTake = dateTimeFinish - dateTimeStart;
        Console.WriteLine($"Fhir SDK miliseconds taken: {totalTimeTake.TotalMilliseconds}");
    }

    public static void FhirR4SerializerTestv2(int runs, string filename)
    {
        Console.WriteLine("Running C# FHIR SDK system.text benchmark!");

        string source = File.ReadAllText(filename);
        var parser = new FhirJsonParser();
        var options = new JsonSerializerOptions().ForFhir(ModelInfo.ModelInspector);
        for (int i = 0; i < 20; i++)
        {
            Bundle bundle = JsonSerializer.Deserialize<Bundle>(source, options);
        }

        DateTime dateTimeStart = DateTime.Now;

        for (int i = 0; i < runs; i++)
        {
            Bundle bundle = JsonSerializer.Deserialize<Bundle>(source, options);
        }
        DateTime dateTimeFinish = DateTime.Now;
        var totalTimeTake = dateTimeFinish - dateTimeStart;
        Console.WriteLine($"Fhir SDK system.text miliseconds taken: {totalTimeTake.TotalMilliseconds}");
    }

    public static void FhirObjectSerializerTest(int runs, string filename)
    {
        Console.WriteLine("Running C# FHIR object benchmark!");

        string source = File.ReadAllText(filename);
        for (int i = 0; i < 20; i++)
        {
            object bundle = JsonSerializer.Deserialize<object>(source);
        }

        DateTime dateTimeStart = DateTime.Now;

        for (int i = 0; i < runs; i++)
        {
            object bundle = JsonSerializer.Deserialize<object>(source);
        }
        DateTime dateTimeFinish = DateTime.Now;
        var totalTimeTake = dateTimeFinish - dateTimeStart;
        Console.WriteLine($"Fhir object miliseconds taken: {totalTimeTake.TotalMilliseconds}");
    }
    static void Main(string[] args)
    {
        Console.WriteLine("Running C# benchmark!");

        int runs = 100;
        string filename = "bundle_file.json"; // Replace with the name of your file

        FhirR4SerializerTest(runs, filename);
        FhirObjectSerializerTest(runs, filename);
        FhirR4SerializerTestv2(runs, filename);
    }
}

Results

With custom Bundle that we use at work:

PS C:\Users\user\net-test> dotnet run -c Release
Running C# benchmark!
Running C# FHIR SDK benchmark!
Fhir SDK miliseconds taken: 45401,7309
Running C# FHIR object benchmark!
Fhir object miliseconds taken: 1280,7618
Running C# FHIR SDK system.text benchmark!
Fhir SDK system.text miliseconds taken: 28314,5444

With valuesets.json:

PS C:\Users\user\net-test> dotnet run -c Release
Running C# benchmark!
Running C# FHIR SDK benchmark!
Fhir SDK miliseconds taken: 38795,9362
Running C# FHIR object benchmark!
Fhir object miliseconds taken: 2475,0446
Running C# FHIR SDK system.text benchmark!
Fhir SDK system.text miliseconds taken: 29868,7799

Version used:

mmsmits commented 3 months ago

@KLazarov

Thanks for reaching out. The tests you've attached show indeed a big difference in performance, however FhirObjectSerializerTest doesn't only differ in the fact that you've set a type compared to FhirR4SerializerTestv2.

In the FhirObjectSerializerTest you do not use options (new JsonSerializerOptions().ForFhir(ModelInfo.ModelInspector)), which contains all the FHIR logic of the Deserializer (all the parsing from string to DateTime, Int, Bool, etc. But also all the FHIR extension logic, validation, and other checks. This takes time. So it's not a fair comparison.

The way you test now, you just get a dictionary returned that corresponds to the Json file, it doesn't parse to our FHIR Poco models.

You can still improve the time of deserialization by turning off the validator in the settings (set it to null) if you want, but it will never come close to parsing json to a dictionary without any FHIR logic.

If you have concrete ideas on how to speed up serializer further, please let us know.

mmsmits commented 3 months ago

Maybe we can utilize System.Text.Json more in serializers to improve the speed. However, this currently doesn't have our top priority.

If you would like to discuss this, re-open the issue. We can do some design discussion and you can create a PR afterwards.