andywilsonuk / StringTokenFormatter

Provides string extension methods for replacing tokens within strings (using the format '{name}') with their specified lookup value.
Apache License 2.0
37 stars 1 forks source link

ObjectPropertiesTokenValueContainer is greedy #11

Closed TonyValenti closed 5 years ago

TonyValenti commented 5 years ago

Hi @andywilsonuk , I've been troubleshooting a performance issue and I noticed that ObjectPropertiesTokenValueContainer's constructor immediately calls ConvertObjectToDictionary which results in every property value being retrieved on the source object.

I would like to propose that this be moved into the "TryMap" function such that:

If I were to write a PR that accomplished that, would it be accepted?

andywilsonuk commented 5 years ago

Hi,

Yes it makes sense that the values are received only when needed and I'm happy for you to create a pull request for it. If you're concerned about performance it might be worth caching the values (using Lazy as the dictionary value?) so that the reflection part is only done once per property.

Andy

TonyValenti commented 5 years ago

@andywilsonuk I've just created PR #12 which contains the discussed enhancements. I performance tested a number of implementations and the one selected is the one that has the most benefit.

TonyValenti commented 5 years ago

@andywilsonuk I just added more code into the PR. After digging into performance some more, there were a small handful of things that resulted in big gains:

  1. The default token matcher's regex was not compiled. Changing this to a compiled regex and reusing it on multiple calls to SplitSegments results in a 10% speedup.

  2. The ITokenMatcher.SplitSegments method which parses the string is an expensive function. In our use case, we use StringTokenFormatter in a tight loop, calling it over and over with the same format, just with different arguments. I introduced a new type, SegmentedString, which contains the results of SplitSegments. By using StringTokenFormatter.SegmentedString.Create(...) we can now cache the results of SplitSegments and reuse them over and over. This results in a 50% speedup.

andywilsonuk commented 5 years ago

Hi @TonyValenti, looks like you've been busy! I want to pick into this but I don't have access to my dev machine at the moment.

The lazy loading of the property values look good as does the compiled regex; I like the idea of the performance tests. SegmentedString looks like a big change and it's certainly a breaking change so I want to chew it over when I can actually open the solution!

Could I ask that you update the PR with the curly brackets consistent with the original? for one it keeps the code consistent with elsewhere and for two it makes the diff related to code changes rather than stylistic changes.

There are a few other changes I want to make to this library over the next few weeks, how urgently do you need the changes integrating in? do you have a workaround until the rest of the updates are ready?

TonyValenti commented 5 years ago

@andywilsonuk - Yes! It has been a busy weekend! This is high priority but we are able to wait for now. We would really like to have a publish in the next 3 weeks though if at all possible.

RE: Curly Braces - Done! I think I got them all.

RE: SegmentedString - The implementation route I chose actually made it not that big of a change although it did require modifying the interface. The "Biggest" change was that DefaultTokenMatcher's SplitSegments went from:

public IEnumerable<IMatchingSegment> SplitSegments(string input)
{
   //Original Implementation
}

to:

public SegmentedString SplitSegments(string Input)
{
  return new SegmentedString(SplitSegmentsInternal(Input));
}

private IEnumerable<IMatchingSegment> SplitSegmentsInternal(string input)
{
  //Original Implementation
}

Perhaps with these changes and the ones you want to wrap in, is time for a v2.5 or 3.0?

Also, one other thing I'll mention: When I was writing the code, I didn't want to change a bunch of your tests from:

TokenReplacer.FormatX(string Input...)

to:

TokenReplacer.FormatX(SegmentedString Input...)

because I felt like that would be a more sizable change and would likely be rejected. However, I would point out that with a little bit of refactoring, the string variants can be removed from the code (they all call into the SegmentedString variants I introduced anyways).

Also, something else that just jumped out at me. I notice that in a handful of places we call things like:

new FormatProviderValueFormatter(possiblynull)

or:

new TokenReplacer()....

If you're going to be doing more refactoring, you might consider introducing a static: TokenReplacer.Default FormatProviderValueFormatter.Default

As that would save on the allocations (important in tight-loop scenarios like ours)

TonyValenti commented 5 years ago

@andywilsonuk Any eta on a release? We're getting close to being ready for our release but we don't want to release until we're able to take advantage of these performance enhancements.

andywilsonuk commented 5 years ago

I think v3 should be ready by middle of next week. I've started making some commits into the v3 branch and I'll merge your changes into there tomorrow. After that there are a few other changes I want to make and then add some more tests.

TonyValenti commented 5 years ago

Excellent! Middle of next week is perfect!

Tony Valenti

Schedule a meeting with me https://www.fasterlaw.com/book-online/tony-valenti

On Thu, Feb 28, 2019 at 3:09 PM Andy Wilson notifications@github.com wrote:

I think v3 should be ready by middle of next week. I've started making some commits into the v3 branch and I'll merge your changes into there tomorrow. After that there are a few other changes I want to make and then add some more tests.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/andywilsonuk/StringTokenFormatter/issues/11#issuecomment-468439265, or mute the thread https://github.com/notifications/unsubscribe-auth/AM-qVh00TyM2tkPcp6Av37jjr3GVsRrAks5vSEWDgaJpZM4a9nDT .

TonyValenti commented 5 years ago

Never mind! Just had a chance to review the code. Can't wait for a nuget release!

andywilsonuk commented 5 years ago

Hi Tony,

You'd be looking at something like this:

string input = "string with {token} in";
var tokenReplacer = new TokenReplacer();
var tokenMatcher = new DefaultTokenMatcher();
var segments = tokenMatcher.SplitSegments(input);

object o = new { Token = "mapped" }; // object containing properties to use as mappings
var container = new ObjectPropertiesTokenValueContainer(o, tokenMatcher);
string output = tokenReplacer.FormatFromContainer(segments, container);

Whereby the last two lines you would repeat for each object (o in my example) you want to map. Also in v3 you have the CascadingTokenValueContainer where you can pass multiple containers in a single hit and it will iterate through them until it finds a matching mapping. You won't get a performance improvement for that but it might help you write cleaner code.

v3 should be available on nuget soon!

TonyValenti commented 5 years ago

Hi @andywilsonuk , In my original commit, I had an extensions class that essentially added the same "FormatToken" extensions onto SegmentedString that regular strings have. This would allow me to accomplish the 7 lines above with the following:

var SubjectTemplate = StringTokenFormatter.SegmentedString.Create(DataModel.Message_Subject);
...
var Subject = SubjectTemplate.FormatToken(args);

I feel like that is a lot more concise than the above and makes it a lot easier to do the right thing. Any chance you'd consider adding it back in?

andywilsonuk commented 5 years ago

I didn't want SegmentedString to be a driving class, I felt that it formed part of the data which TokenReplacer uses. For your project you could early create an extension method to simply the usage.

DefaultTokenMatcher has an overhead so I'd either use a long life instance or use TokenReplacer.DefaultMatcher.

If you are using args to with multiple replacements then you'll get a performance boast from reusing the ObjectPropertiesTokenValueContainer because of the Lazy.