dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Consider propagating regex comments to source-generated code #69616

Open stephentoub opened 2 years ago

stephentoub commented 2 years ago

If a developer specifies `RegexOptions.IgnorePatternWhitespace` or uses the `(?x)` inline option, they can embed `# comments` inside their regex pattern. Today the parser simply throws these away. When in the context of the source generator, we could instead store them and try to propagate them to the generated C# code as C# comments at an appropriate location in the source.
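For illustration, here is a minimal sketch of the kind of pattern this is about. The date pattern and the `DateSample` and `MonthOf` names are made up for this example and are not taken from any real code; the `#` comments are the text the parser currently discards.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample type, used only to illustrate '#' comments in a pattern.
static class DateSample
{
    // With RegexOptions.IgnorePatternWhitespace, unescaped whitespace is ignored
    // and an unescaped '#' starts a comment that runs to the end of the line.
    public const string Pattern = @"
        (?<year>\d{4})   # four-digit year
        -
        (?<month>\d{2})  # two-digit month
        -
        (?<day>\d{2})    # two-digit day
    ";

    // Returns the captured month, or null when the input does not match.
    public static string MonthOf(string input)
    {
        Match m = Regex.Match(input, Pattern, RegexOptions.IgnorePatternWhitespace);
        return m.Success ? m.Groups["month"].Value : null;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(DateSample.MonthOf("2022-05-20")); // prints "05"
    }
}
```

The idea would be for text like "two-digit month" to show up as a `//` comment next to whatever generated code matches that part of the pattern.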

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: stephentoub
Assignees: -
Labels: `area-System.Text.RegularExpressions`
Milestone: Future

GSPP commented 2 years ago

It is amazing to see all this regex work happening. I can see the amount of sophistication that goes into building this new engine. This is going to be state of the art. 👍

joperezr commented 2 years ago

I suppose the parser would only create the comment nodes when being called from the source generator? I assume we wouldn't want to create these extra nodes for any of the other engines, even when people enable the option for ignoring whitespace.

stephentoub commented 2 years ago

> I suppose the parser would only create the comment nodes when being called from the source generator?

That was my thinking.

The hard part here, I think, is figuring out which node each comment actually applies to. We also have a lot of assumptions in the tree about how many children each kind of node can have, and we probably don't want to disrupt that for this, so we'd likely need some side-channel.
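As a rough sketch of what such a side-channel could look like: the tree stays untouched and comments live in a separate table keyed by node, filled in only when the caller asks for it. Everything here is hypothetical. `ToyNode` and `ToyParser` are stand-ins rather than the real `RegexNode` and `RegexParser`, the toy grammar only knows literal characters, whitespace, and `#` comments, and attaching each comment to the most recently parsed node is just an assumed heuristic, not an answer to the "which node" question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a parse-tree node; not the real RegexNode.
sealed class ToyNode
{
    public ToyNode(char ch) { Ch = ch; }
    public char Ch { get; }
}

static class ToyParser
{
    // Parses literal characters, skipping whitespace and '#' comments.
    // When 'comments' is non-null (the source generator case), each comment is
    // recorded against the most recently parsed node instead of being added to
    // the tree. When it is null (the other engines), comments are discarded.
    public static List<ToyNode> Parse(string pattern, Dictionary<ToyNode, string> comments)
    {
        var nodes = new List<ToyNode>();
        int i = 0;
        while (i < pattern.Length)
        {
            char c = pattern[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }
            if (c == '#')
            {
                int end = pattern.IndexOf('\n', i);
                if (end < 0) end = pattern.Length;
                if (comments != null && nodes.Count > 0)
                {
                    // Assumed heuristic: the comment describes the preceding node.
                    string text = pattern.Substring(i + 1, end - i - 1).Trim();
                    comments[nodes[nodes.Count - 1]] = text;
                }
                i = end;
                continue;
            }
            nodes.Add(new ToyNode(c));
            i++;
        }
        return nodes;
    }
}

class Program
{
    static void Main()
    {
        const string pattern = "a # first letter\nb # second letter\n";
        var comments = new Dictionary<ToyNode, string>();
        List<ToyNode> nodes = ToyParser.Parse(pattern, comments);

        // Stand-in for code emission: write the comment, then the matching step.
        foreach (ToyNode node in nodes)
        {
            string text;
            if (comments.TryGetValue(node, out text))
                Console.WriteLine("// " + text);
            Console.WriteLine("match '" + node.Ch + "'");
        }
    }
}
```

Because the dictionary is keyed by node reference, none of the assumptions about child counts are affected, and passing `null` keeps the non-generator path free of extra allocations.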