dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.43k stars 4.76k forks source link

Analyzer: Introduce an analyzer to recommend the more efficient string splitting features in .NET 8 #85487

Open geeknoid opened 1 year ago

geeknoid commented 1 year ago

Introduce a performance-centric analyzer that detects uses of String.Split and recommends the new more efficient string splitting functionality from MemoryExtensions. I'm not sure a fixer can be provided here, it might be too complicated for that. But having an analyzer to point out an obvious place that can get a perf boost is still useful.

Category : Performance Severity : Info

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-runtime See info in area-owners.md if you want to be subscribed.

Issue Details
Introduce a performance-centric analyzer that detects uses of String.Split and recommends the new more efficient string splitting functionality from MemoryExtensions. I'm not sure a fixer can be provided here, it might be too complicated for that. But having an analyzer to point out an obvious place that can get a perf boost is still useful.
Author: geeknoid
Assignees: -
Labels: `area-System.Runtime`, `untriaged`, `code-analyzer`
Milestone: -
buyaa-n commented 1 year ago

[Triage note]: We shouldn't flag for all string.Split cases, need to look for patterns where the new API could be used. Based on the pattern a fixer will be offered. When https://github.com/dotnet/runtime/issues/934 implemented we could replace more string.Split cases

Category : Performance Severity : Info

Question:

bartonjs commented 1 year ago

Video

Out of a concern of false positives, it seems like the initial version of this analyzer should be limited to those callsites where a count was passed into string.Split, as that pattern has a perfect analogue in the span-based split.

Using uncounted string.Split in a foreach would be better handled by an iterator-based split (which we don't have yet). Other uses of uncounted split might be analyzable, but they're harder to describe, so they should probably be a different diagnostic ID (once the pattern can be analyzed) just to keep the docs/explanation sane.

The suggestion regarding Trim/TrimEnd/TrimStart may have merit, but seems like it might need some more thought (as it depends heavily on what happens after the call to Trim)... and should be addressed in a separate issue, with more details/examples.

Category: Performance Severity: Info