dotnet / Open-XML-SDK

Open XML SDK by Microsoft
https://www.nuget.org/packages/DocumentFormat.OpenXml/
MIT License
4.02k stars 546 forks source link

I am using the OpenXML SDK to read the PPT template for content replacement. After stress testing, I found that the memory usage is too high and the number of Gen0, Gen1, and Gen2 object collections is too frequent. #1765

Closed snake-L closed 1 month ago

snake-L commented 2 months ago

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3880/23H2/2023Update/SunValley3) 11th Gen Intel Core i5-11400H 2.70GHz, 1 CPU, 12 logical and 6 physical cores .NET SDK 9.0.100-preview.3.24204.13 [Host] : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI .NET 8.0 : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Job=.NET 8.0 Runtime=.NET 8.0

Method Mean Error StdDev Gen0 Gen1 Allocated
xxx 2.293 s 0.0110 s 0.0103 s 3000.0000 1000.0000 22.34 MB

code Read specific tags and modify content as shown in the following figure. image image image

twsouthwick commented 2 months ago

Can you provide a runable sample instead of an image? For example, I have no idea what PropertyAccessor<TContext> is.

twsouthwick commented 2 months ago

A couple of notes, though on what I see:

snake-L commented 2 months ago

A couple of notes, though on what I see:

  • [some IEnumerable<>].ToArray().AsSpan() is going to create a lot of GC pressure if this is in a loop. Use the IEnumerable<> - no need to coerce it into a span especially since you're just iterating over it
    • I'm assuming the PropertyAccessor<TContext> is probably doing some reflection - that is an easy place to use too much memory

Hello, after optimizing the usage of AsSpan, the execution speed has improved, but the GC still hasn't changed, as shown in the following figure image Optimized code:

private static IEnumerable<A.Run> GetRunsWithAttributes(OpenXmlElement slide)
 {
     return slide.Descendants().OfType<A.Run>().Where(run => run.HasAttributes);
 }
 private static IEnumerable<A.Run> GetRuns(Slide slide)
 {
     foreach (var run in GetRunsWithAttributes(slide))
     {
         yield return run;
     }
 }

Here is the source code of my Property Accessor

public class PropertyAccessor<T>
    {
      private static readonly ConcurrentDictionary<string, Func<T, string>> GetterCache
          = new ConcurrentDictionary<string, Func<T, string>>();

      public static string GetValue(T target, string propertyName)
      {
          if (target == null)
              throw new ArgumentNullException(nameof(target));
          if (string.IsNullOrEmpty(propertyName))
              throw new ArgumentException("Property name cannot be null or empty", nameof(propertyName));

          var getter = GetterCache.GetOrAdd(propertyName, CreateGetter);
          return getter(target);
      }

      private static Func<T, string> CreateGetter(string propertyName)
      {
          var propertyInfo = typeof(T).GetProperty(propertyName,
          BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance);

          if (propertyInfo == null)
              throw new ArgumentException($"Property '{propertyName}' not found on type {typeof(T).Name}", nameof(propertyName));

          var parameter = Expression.Parameter(typeof(T), "x");
          var property = Expression.Property(parameter, propertyInfo);
          var nullCheck = Expression.Equal(property, Expression.Constant(null));
          var conditionalExpression = Expression.Condition(
              nullCheck,
              Expression.Constant(string.Empty),
              Expression.Call(property, typeof(object).GetMethod("ToString"))
          );

          return Expression.Lambda<Func<T, string>>(conditionalExpression, parameter).Compile();
      }
  }
twsouthwick commented 2 months ago

Can you make the repro something I can copy/paste? Not sure which parts of the original post vs the comments I should be testing

snake-L commented 2 months ago

Can you make the repro something I can copy/paste? Not sure which parts of the original post vs the comments I should be testing

Hello, I have created a new example where container injection and some business code have been removed PPTBenchmark.zip Here are the results of the new example pressure test image