dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License

SourceGenerator debugging generated source #49260

Closed MarkPflug closed 3 years ago

MarkPflug commented 3 years ago

I've been working on an approach to unit testing/debugging a source generator within Visual Studio. My approach uses the Roslyn APIs directly to compile source and invoke the CSharpGeneratorDriver inside the unit test project. I then Emit the compilation to in-memory streams (dll and pdb), dynamically load the assembly/symbols into the unit test process, and invoke the entry point method (compiled as a console app). Doing this, I am able to debug the code that implements my source generator, and then also debug the result of the compilation. Unfortunately, I'm only able to step through the code files that were statically included in the compilation (what would be the "user code"), because I can point directly to the source files when I construct the CSharpSyntaxTree.

Unfortunately, I can't figure out a way to also allow stepping through the code that the SourceGenerator emits. When constructing a SourceText to add to the SourceGeneratorContext there is a canBeEmbedded parameter, which I interpreted to cause the generated source to be included in the pdb so it could be stepped through. Unfortunately, it doesn't seem to work. If I try to step into the generated code, Visual Studio asks me to navigate to the source file, which, of course, doesn't exist anywhere. The name of the source file VS is requesting appears to be generated "[GUID][GeneratorTypeName][hintName].cs". I was hoping I could save the generated code to a temp file with this name, and that the debugger would then be able to discover it. Unfortunately, I don't know where the [GUID] segment is coming from, so I can't even create such a file.

I also tried including the generated source by passing EmbeddedText to the compilation.Emit method. That didn't appear to make any difference; VS still prompted for source files. I'm not sure if I need the EmbeddedText filename to match exactly, which I can't do because I don't know where the GUID comes from.

Do you know of a way to make this work? Is the "canBeEmbedded" parameter expected to achieve anything? If I was able to step through the generated code as well as the static code it would make the experience of authoring a source generator pretty seamless.

I'd appreciate any guidance you can offer.

Thanks!

MarkPflug commented 3 years ago

I saw this tweet just moments ago: https://twitter.com/xoofx/status/1326250754705920001. It makes me wonder if there is a simpler approach that I'm not aware of.

jasonmalinowski commented 3 years ago

@MarkPflug: first off, upgrade to the latest 16.8 (non-Preview) that shipped a few hours ago. This makes one fundamental change: the file name that the debugger will look for is much shorter -- it's just the hint name -- so if you are trying to write out the files to disk you won't have other issues to worry about there.

Otherwise, can you share your code snippet that's calling Emit? @chsienki can help from here.

jasonmalinowski commented 3 years ago

(moving to the compilers since this is actually about how to use the compiler API to correctly get the PDB to emit...)

MarkPflug commented 3 years ago

Already on the latest version as of this morning, 16.8. I'm still being prompted to locate a file that starts with a GUID.

Is there documentation somewhere with guidance on debugging source generators? At first people were suggesting using Debugger.Launch(), which, while it worked, was horribly unpleasant. It was tedious to attach the process each time, and it would require restarting VS every debug run.

jasonmalinowski commented 3 years ago

Which NuGet packages are you referencing to get the compiler API?

MarkPflug commented 3 years ago

    <PackageReference Include="Microsoft.CodeAnalysis.CSharp.Workspaces" Version="3.7.0" PrivateAssets="all" />
    <PackageReference Include="Microsoft.CodeAnalysis.Analyzers" Version="3.3.1" PrivateAssets="all" />

jasonmalinowski commented 3 years ago

You'll want to bump that first one to 3.8 which corresponds to the 16.8 release.

MarkPflug commented 3 years ago

Is there a released version for 3.8? I'm only seeing prerelease 3.8.0-5.final. Do I need to point to a third party nuget feed?

jasonmalinowski commented 3 years ago

Ah, there's supposed to be. @dotnet/roslyn-infrastructure what's the ETA for getting the 3.8 final packages up?

@MarkPflug in the meantime, using the prerelease ones would be a good stopgap.

MarkPflug commented 3 years ago

Checking now. Needs a few tweaks to adapt to the new api.

MarkPflug commented 3 years ago

OMG, it works! That is so rad! What a pleasant experience this is going to be.

Thank you so much!

RikkiGibson commented 3 years ago

I think we are working on unblocking release of the final packages in #49265

MarkPflug commented 3 years ago

My celebration was a little premature. It actually doesn't appear to work.

The reason it appeared to work was that there happened to be a physical representation of the file in my project. There is a subset of the "generated source" that is just static code that will be compiled into the target project. I handle these files by carrying the source text in embedded resources in the source generator assembly. Since the source file in the project matches exactly the contents that I include, it seems that the VS debugger identifies it as the correct source file. It seems that the filename also needs to match the hint name used by the source generator.

Unfortunately, when I started also including dynamically generated code, I wasn't able to step into the source when debugging.

This code sits in a repository that isn't currently public, but I'll try to make it public or at least get a minimal repro.

In the meantime, if this sounds like something that should "just work", perhaps I'm doing something wrong? Is there complete documentation around source generators yet? I think all I've seen so far are a couple of blog posts with trivial examples.

MarkPflug commented 3 years ago

Okay, I now have a real-world repro example in a public repository: https://github.com/MarkPflug/scriban/tree/sourcegen

This implements a source generator using Alexandre Mutel's Scriban library to provide functionality similar to T4 templates. It uses AdditionalFiles to pass .stt files (Scriban templates) to the source generator, which then uses the Scriban library to do all the code-gen work.

Repro: Clone the repo (sourcegen branch) and open src/scriban.sln. Set a breakpoint in Scriban.SourceGenerator.Tests/TestFiles/BasicTemplate/Program.cs at A.Run(), then debug the "BasicTemplate" unit test. The breakpoint should be hit, because this file is statically included in the compilation. Try to "Step into" the A method. You should be prompted for the source location of the Template.stt.cs file. I expected to be able to step into the A method, which is dynamically generated by the source generator.

Scriban.SourceGenerator/ScribanSourceGenerator.cs, Line 45:

var source = SourceText.From(ms, Encoding.UTF8, canBeEmbedded: true);

I thought the canBeEmbedded would cause the generated source to be included in the pdb, so that the debugger would have access to it and I wouldn't be prompted. I think I must be misunderstanding the meaning of this argument. Or perhaps the "can" part of it is implying that I need to do more to cause it to be embedded. Being able to step into that method is the missing piece of the puzzle. If I could do that, then developing a source generator would be absolutely delightful.

chsienki commented 3 years ago

Hmm, I believe this scenario should just work, but I also know we had some IDE bugs around it. @jasonmalinowski is this one of the bugs you fixed recently?

MarkPflug commented 3 years ago

Another bit of information, that might be related: The BasicTemplate/Program.cs file that I set a breakpoint in; I had to change the file encoding to UTF-8 with BOM to be able to hit the breakpoint. By default the file was created without a BOM, so the VS "save as" dialog was identifying it as windows-1252 (presumably the machine default). The file only contains ASCII characters, so any of ASCII, UTF-8 nobom, and windows-1252 would have the same representation. I had to force it to utf-8 with bom, and then make sure the encoding matched when I created the syntax tree, code here: https://github.com/MarkPflug/scriban/blob/78efb3855ebe883ffa660ce1f8754731641ad83a/src/Scriban.SourceGenerator.Tests/SourceGeneratorTest.cs#L76-L80

That allowed me to step into the code file that I was statically adding to the compilation, outside the source generator.

I tried doing the same thing within the source generator: https://github.com/MarkPflug/scriban/blob/78efb3855ebe883ffa660ce1f8754731641ad83a/src/Scriban.SourceGenerator/ScribanSourceGenerator.cs#L85-L93, and set the canBeEmbedded option, but I'm still being prompted to find the file.

Two things I need to do: 1) Verify that the BOM bytes are actually showing up in the memory stream buffer. I was pretty sure that StreamWriter did that, but I didn't verify. 2) Verify that I can see the source text in the .pdb. Currently the pdb only exists in memory. I'll report back my findings around those two items. Might get lucky and fix the problem myself.
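The first check can be done in isolation. The following is a minimal sketch, assuming the generator writes its output through a StreamWriter created with Encoding.UTF8 over a MemoryStream; the class name and sample text are hypothetical and not taken from the repository.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration; not taken from the repository above.
static class BomCheck
{
    static void Main()
    {
        using var ms = new MemoryStream();

        // leaveOpen: true keeps the MemoryStream usable after the writer is disposed.
        using (var writer = new StreamWriter(ms, Encoding.UTF8, bufferSize: 1024, leaveOpen: true))
        {
            writer.Write("class A { }");
        }

        byte[] preamble = Encoding.UTF8.GetPreamble(); // the UTF-8 BOM: EF BB BF
        byte[] written = ms.ToArray();

        // Compare the first bytes of the stream against the encoding's preamble.
        bool hasBom = written.Length >= preamble.Length
            && written.Take(preamble.Length).SequenceEqual(preamble);

        Console.WriteLine($"BOM present: {hasBom}");
    }
}
```

If the writer is instead created with new UTF8Encoding(false), GetPreamble() on that encoding returns an empty array and no BOM is written, which is one way the two code paths could end up disagreeing.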

MarkPflug commented 3 years ago

1) yes, bom bytes are present. 2) Not sure what to expect here...

Should the dynamically generated source be carried in the pdb if I inspect it? I'm just looking at it with a binary viewer, but I don't see anything that looks like source code, only file paths. I see a relative path of the generated source file: Scriban.SourceGenerator\Scriban.SourceGenerator.ScribanSourceGenerator\Template.stt.cs. However, the "hintName" that I'm providing is only the filename part Template.stt.cs, so I don't know where the relative path segments are coming from. The other file, that I can step into, has the full path of the source file: which is where it physically lives on disk, which might be why I'm able to step into it: it physically exists on disk.

MarkPflug commented 3 years ago

Okay, new discovery. Compilation.Emit accepts an embeddedTexts argument, which I wasn't passing. I've modified my code to provide this based on the source that the SourceGenerator emits. I'm still not able to step into the generated code though. I think the body of the embedded text gets compressed, so it is hard to verify if it is actually landing in the pdb.

chsienki commented 3 years ago

I haven't had a chance to look at the repro repo yet, but it sounds like you're running the generator driver yourself and having trouble getting the sources embedded?

MarkPflug commented 3 years ago

Correct. I'm running the generator driver to add sources to the compilation, emitting the compilation to in-memory buffers, loading those buffers (assembly and symbols) with Assembly.Load, and then invoking the entry point method of the assembly (compiled as a console app). Everything runs inside the unit test process. It all "works", meaning the tests pass. I just want to be able to step through all the bits of code, which seems like it should be possible; I just need to figure out the magic incantation to get the source files in the right place, which I thought was embedded in the PDB.

chsienki commented 3 years ago

Yeah so you need to make sure you add the source as an EmbeddedText, that should ensure it adds it to the PDB, which it sounds like you've done. The PDB format isn't particularly easy to inspect raw, but if you dump it to disk you should be able to use an inspector to see that they're getting added to the document table:

[Screenshot: the PDB's document table viewed in a PDB inspector]

I suspect that VS isn't loading the PDB because it all exists in memory only? I'm not an expert on the VS debugger, but I'm guessing it won't be able to correctly look up the symbols for an in-memory assembly. What do you see in the Modules view while debugging? You should see the module, and whether it has symbols or not. If it's loading and thinks it has symbols, and those symbols contain the source, then I would expect it to work.

chsienki commented 3 years ago

Oh, also, PDBs with embedded source only work with portable PDBs. I can't remember what the default is, but you might want to make sure you're creating portable PDBs rather than windows ones.
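A rough sketch of both points, requesting a portable PDB explicitly and then checking that the source really landed in it, might look like the following. It assumes the Microsoft.CodeAnalysis.CSharp package (System.Reflection.Metadata comes with it); the class name, assembly name, and sample source are hypothetical, and the GUID is the embedded-source custom debug information kind from the Portable PDB format specification.

```csharp
using System;
using System.IO;
using System.Reflection.Metadata;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Emit;
using Microsoft.CodeAnalysis.Text;

// Hypothetical check for illustration only.
static class PortablePdbCheck
{
    // Custom debug information kind used for embedded source in portable PDBs.
    static readonly Guid EmbeddedSourceKind = new Guid("0E8A571B-6926-466E-B4AD-8AB04611F5FE");

    static void Main()
    {
        // A non-null encoding is what makes a string-based SourceText embeddable.
        var text = SourceText.From("class P { static void Main() { } }", Encoding.UTF8);
        var tree = CSharpSyntaxTree.ParseText(text, path: "Program.cs");

        var compilation = CSharpCompilation.Create(
            "PdbDemo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.ConsoleApplication));

        using var pe = new MemoryStream();
        using var pdb = new MemoryStream();
        var result = compilation.Emit(
            pe, pdb,
            options: new EmitOptions(debugInformationFormat: DebugInformationFormat.PortablePdb),
            embeddedTexts: new[] { EmbeddedText.FromSource(tree.FilePath, text) });
        if (!result.Success) throw new InvalidOperationException("Emit failed.");

        // Read the PDB back and report which documents carry embedded source.
        pdb.Position = 0;
        using var provider = MetadataReaderProvider.FromPortablePdbStream(pdb);
        MetadataReader reader = provider.GetMetadataReader();
        foreach (DocumentHandle handle in reader.Documents)
        {
            bool embedded = false;
            foreach (var cdiHandle in reader.GetCustomDebugInformation(handle))
            {
                Guid kind = reader.GetGuid(reader.GetCustomDebugInformation(cdiHandle).Kind);
                embedded |= kind == EmbeddedSourceKind;
            }
            string name = reader.GetString(reader.GetDocument(handle).Name);
            Console.WriteLine($"{name}: embedded source = {embedded}");
        }
    }
}
```

Reading the PDB this way avoids having to eyeball the compressed source blob in a binary viewer: each document either has an embedded-source entry attached or it does not.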

MarkPflug commented 3 years ago

I tried setting the emit option to PortablePdb. No change, still asking for source location. I wonder if I'm running into an encoding problem, which is causing the hash of the embedded source to not match the SourceText that I hand to the compiler. Everything should be ASCII only at this point, so I wouldn't think it should matter, but I'm worried that a BOM is being included in one path and not the other, and that's what's making things disagree. This is all wild speculation though.

MarkPflug commented 3 years ago

What are you using to inspect the pdb?

chsienki commented 3 years ago

That screenshot is DotPeek, but I think it can only view PDBs when they're embedded in the assembly. FWIW I just got this working in a Roslyn test, so it's definitely possible. I can post the test here if it's useful, although it uses a bunch of internal Roslyn helpers so won't be immediately copy + paste-able.

MarkPflug commented 3 years ago

If I could take a look it might make it clear what I'm doing wrong. I'm sure it is probably some minor tweak that is preventing things from lining up for me. Appreciate the help.

chsienki commented 3 years ago

        [Fact]
        public void Single_File_Is_Added()
        {
            var source = @"
public static class Program { 
    public static void Main(string[] args) { 
        GeneratedClass.DoThing();
    } 
}

";

            var generatorSource = @"
public class GeneratedClass { 
    public static void DoThing(){
        System.Console.WriteLine(""thing"");
    }
}
";

            var parseOptions = TestOptions.Regular;
            Compilation compilation = CreateCompilation(source, options: TestOptions.DebugExe, parseOptions: parseOptions, sourceFileName: "program.cs");

            SingleFileTestGenerator testGenerator = new SingleFileTestGenerator(generatorSource);

            GeneratorDriver driver = CSharpGeneratorDriver.Create(new[] { testGenerator }, parseOptions: parseOptions);
            driver = driver.RunGeneratorsAndUpdateCompilation(compilation, out var outputCompilation, out _);
            outputCompilation.VerifyDiagnostics();

            var generatorTree = driver.GetRunResult().GeneratedTrees[0];
            var programTree = compilation.SyntaxTrees.First();
            EmbeddedText et = EmbeddedText.FromSource(generatorTree.FilePath, generatorTree.GetText());
            EmbeddedText et2 = EmbeddedText.FromSource(programTree.FilePath, programTree.GetText());

            var assemblyStream = new MemoryStream();
            var symbolsStream = new MemoryStream();
            var result = outputCompilation.Emit(assemblyStream, symbolsStream, embeddedTexts: new[] { et, et2 });

            // ToArray copies only the bytes written; GetBuffer can include unused trailing capacity.
            var assemblyBytes = assemblyStream.ToArray();
            var symbolsBytes = symbolsStream.ToArray();
            var asm = System.Reflection.Assembly.Load(assemblyBytes, symbolsBytes);

            asm.EntryPoint.Invoke(null, new object[] { new string[0] });
        }

chsienki commented 3 years ago

Stepping into asm.EntryPoint.Invoke(... lets me step through the generated code. I'll try and get into a simpler repro.

Also, just a thought: I'm on an internal preview of VS, so I'll double check this works in the public version ;)

chsienki commented 3 years ago

Ok, here's a skeleton console app that this works in, and I tested in the latest public preview of VS https://gist.github.com/chsienki/334ec398079988fab97bfd7029cf6736

Let me know if it's still not working on your example, and we can try and dig into the specifics of your tests a bit more.

MarkPflug commented 3 years ago

Chris, you are a golden god!

What I was missing was how to get the generated sources back out of the generator driver via GetRunResult().GeneratedTrees. I was ferrying them back to the compiler through a hacky back channel, which is probably why the source wasn't matching what the debugger wanted to see.
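For readers without access to the internal Roslyn test helpers, the same flow can be sketched with public APIs only. This is an illustration under stated assumptions, not the code from the gist or the repository: HelloGenerator, Demo, and the assembly name are hypothetical, it assumes the Microsoft.CodeAnalysis.CSharp 3.8 or later package, and the TRUSTED_PLATFORM_ASSEMBLIES lookup assumes a .NET Core or .NET 5+ host.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Emit;
using Microsoft.CodeAnalysis.Text;

// Hypothetical generator for illustration only.
class HelloGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context) { }

    public void Execute(GeneratorExecutionContext context)
    {
        const string code = @"public static class GeneratedClass
{
    public static void DoThing()
    {
        System.Console.WriteLine(""thing"");
    }
}";
        // The non-null encoding lets the generated text be embedded later.
        context.AddSource("Generated.cs", SourceText.From(code, Encoding.UTF8));
    }
}

static class Demo
{
    static void Main()
    {
        var parseOptions = CSharpParseOptions.Default;
        var userText = SourceText.From(
            "public static class Program { public static void Main(string[] args) { GeneratedClass.DoThing(); } }",
            Encoding.UTF8);
        SyntaxTree userTree = CSharpSyntaxTree.ParseText(userText, parseOptions, path: "Program.cs");

        // Assumption: a .NET Core / .NET 5+ host, where this value lists the runtime's assemblies.
        var references = ((string)AppContext.GetData("TRUSTED_PLATFORM_ASSEMBLIES"))
            .Split(Path.PathSeparator)
            .Select(path => MetadataReference.CreateFromFile(path));

        Compilation compilation = CSharpCompilation.Create(
            "GeneratedDemo", new[] { userTree }, references,
            new CSharpCompilationOptions(OutputKind.ConsoleApplication));

        GeneratorDriver driver = CSharpGeneratorDriver.Create(
            new ISourceGenerator[] { new HelloGenerator() }, parseOptions: parseOptions);
        driver = driver.RunGeneratorsAndUpdateCompilation(compilation, out var output, out _);

        // Take the generated trees from the driver itself, so paths and text match what was compiled.
        var trees = new List<SyntaxTree> { userTree };
        trees.AddRange(driver.GetRunResult().GeneratedTrees);
        var embedded = trees.Select(t => EmbeddedText.FromSource(t.FilePath, t.GetText())).ToArray();

        using var pe = new MemoryStream();
        using var pdb = new MemoryStream();
        var result = output.Emit(
            pe, pdb,
            options: new EmitOptions(debugInformationFormat: DebugInformationFormat.PortablePdb),
            embeddedTexts: embedded);
        if (!result.Success)
            throw new InvalidOperationException(string.Join(Environment.NewLine, result.Diagnostics));

        // Load the assembly together with its symbols, then run the compiled entry point.
        var asm = System.Reflection.Assembly.Load(pe.ToArray(), pdb.ToArray());
        asm.EntryPoint.Invoke(null, new object[] { new string[0] });
    }
}
```

The design point is the one identified above: the EmbeddedText entries are built from the trees the driver reports, so the file path and text recorded in the PDB are the same ones the compiler used. Note that EmbeddedText.FromSource throws if a tree's text cannot be embedded, which is why the generator creates its SourceText with an explicit encoding.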

Thank you SOOOO much! This is really rad when it works.