Closed hazama-yuinyan closed 4 years ago
@jnm2 I think there is no way to make `Reflection.Emit` emit the debug directory, so I've chosen post-processing for now. If there is any problem, I'll consider other approaches.
So could you tell me how to emit the debug directory with `SR.Metadata`? Which class should I use?
> I don't know much about it besides mimicking Roslyn (which doesn't use Reflection.Emit)
You said this, so even you don't know how to do that? Then I'll try to figure out...
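For readers following along: a minimal sketch of the debug directory piece, assuming the `DebugDirectoryBuilder` class from `System.Reflection.PortableExecutable` is the relevant one. The helper name `DebugDirectorySketch.Create` is hypothetical, and the version value `0x0100` (Portable PDB 1.0) is an assumption, not something confirmed in this thread.

```csharp
using System;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;

public static class DebugDirectorySketch
{
    // Hypothetical helper: builds a debug directory with one CodeView entry
    // that points at a Portable PDB file.
    public static DebugDirectoryBuilder Create(string pdbPath, BlobContentId pdbId)
    {
        var builder = new DebugDirectoryBuilder();
        // 0x0100 is assumed here to mean Portable PDB format version 1.0.
        builder.AddCodeViewEntry(pdbPath, pdbId, portablePdbVersion: 0x0100);
        return builder;
    }
}
```

The resulting builder would then presumably be passed as the `debugDirectoryBuilder` argument of the `ManagedPEBuilder` constructor, with `pdbId` being the content id of the PDB that was actually written.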
I think I've successfully rewritten the compiler, and the executable now has the debug directory section. But when I execute the resulting program, it throws an exception saying "BadImageFormatException: Index not found". I suspect it results from the target platform being x86 while the program runs on an x64 machine, but I don't understand why that would happen, because I haven't changed the target platform from the one that `AssemblyBuilder` emitted.
How do you think I can fix it?
OK, I got over this wall, but have run into another one. Somehow I can't get a parameter handle set correctly. I think I did what the spec says: set the first parameter handle if the method declares parameters, and set the next method's first parameter handle otherwise. Maybe I'm misinterpreting "the next method's first parameter"? What does it mean in the first place? Isn't it enough to set the default handle if the next method doesn't have parameters?
Sorry I've been tied up with other stuff.
I can't find the document, but IIRC, each row in the method table points to a range in the parameters table. If a method has parameters, the method row points to the first parameter in the parameters table. If it does not have parameters, it points to where its parameters would have started, but since it has none, it is in fact pointing to the parameter of the next method that has parameters.
It makes sense because in order to find out how many parameters a method has, you look at the next method row and see which ending row it's pointing at. It still needs to be pointing at the correct place or else it's not possible to determine how many parameters the previous method has (without potentially walking through the entire rest of the method table looking for a method that actually has parameters).
> Sorry I've been tied up with other stuff.
Never mind! I also can do other stuff.
> If it does not have parameters, it points to where its parameters would have started, but since it has none, it is in fact pointing to the parameter of the next method that has parameters.
I still don't get it. What if none of the remaining methods have parameters? In other words, let's say we have the following methods.
method A: no parameters
method B: has parameters
method C: no parameters
method D: no parameters
What should the row for method C point at? I thought it should point at the same row that method B does, but that didn't work. I'm confused.
@hazama-yuinyan I think it should point past the end of the list then.
Using this pseudocode to show the general rule for decoding the table:
methodParamCount[i] = methods[i + 1].StartParameterIndex - methods[i].StartParameterIndex;
Therefore:
methods[i + 1].StartParameterIndex = methods[i].StartParameterIndex + methodParamCount[i];
Method table:
// Rule: (next row's StartParameterIndex) - (this row's StartParameterIndex) = (this row's parameter count)
M0: Name=A, StartParameterIndex=0 // 0 - 0 = 0, therefore M0 has 0 parameters
M1: Name=B, StartParameterIndex=0 // 1 - 0 = 1, therefore M1 has 1 parameter
M2: Name=C, StartParameterIndex=1 // 1 - 1 = 0, therefore M2 has 0 parameters
M3: Name=D, StartParameterIndex=1 // (parameter table count) - 1 = 0, therefore M3 has 0 parameters
Parameter table:
P1: Parameter for method B
Or think of maintaining a `nextParameterIndex` pointer which starts at 0. More pseudocode:
var nextParameterIndex = 0;
foreach (var method in methods)
{
    AddRowToMethodsTable(name: method.Name, startParameterIndex: nextParameterIndex);
    foreach (var parameter in method.Parameters)
    {
        AddRowToParametersTable(parameter);
        nextParameterIndex++;
    }
}
Hmm, I tried that approach, but mdv.exe reports that the last method has the parameter, not method B. I also tried getting the first parameter handle each time I hit a method definition, and that failed too. I'm completely lost...
My understanding may be wrong. Here are the XML docs:
/// <param name="parameterList">
/// If the method declares parameters in Params table the handle of the first one, otherwise the handle of the first parameter declared by the next method definition.
/// If no parameters are declared in the module, <see cref="MetadataTokens.ParameterHandle(int)"/>(1).
/// </param>
Does that help at all?
I'm not finding any documentation on this table. https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf is huge but maybe contains what we need.
@tmat You can probably spot what's wrong off the top of your head?
> Does that help at all?
Unfortunately, no :( I read that and understood it that way. I'll take a look at the PDF.
@hazama-yuinyan Have you played with the indexes a bit to see if you can move around that last parameter?
For example, are the indexes 0- or 1-based? The sequence `0, 0, 1, 1` doesn't work, but what about `1, 1, 2, 2`? Or other permutations of 0, 1, and 2? You have to hit it eventually.
OK, I'll try that!
Got it! I made it work! But I've run into another problem. This time mdv.exe says "<bad metadata>" on the first method, while the other methods seem fine. What's wrong with the first method?
What's the sequence you came up with?
What's mdv.exe and what does it mean when mdv.exe says "" on a method?
You can see the IDs using MDV tool on both the DLL and the PDB: https://dotnet.myget.org/feed/metadata-tools/package/nuget/mdv
This is mdv.exe. It's a viewer of the metadata. And GitHub accidentally stripped off what it says.
Oh, got it. So what was the parameter list pointer value for each method in that scenario?
Actually the previous example is taken from the real program, so it should be `1, 1, 2, 2, 2, 2` (there are 6 methods in the real one).
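The sequence reported above is consistent with 1-based row ids in the parameter table. A minimal sketch of that bookkeeping, assuming six methods of which only the second declares one parameter; the class and method names here are hypothetical, and this is plain arithmetic rather than a call into `System.Reflection.Metadata`:

```csharp
using System;
using System.Collections.Generic;

public static class ParamListSketch
{
    // Given each method's parameter count (in method table order), returns the
    // 1-based parameter table row that each method row should point at.
    public static int[] ComputeParamListStarts(IReadOnlyList<int> paramCounts)
    {
        var starts = new int[paramCounts.Count];
        int nextParamRow = 1; // row ids start at 1, not 0
        for (int i = 0; i < paramCounts.Count; i++)
        {
            starts[i] = nextParamRow;       // a method without parameters still gets the "next" row
            nextParamRow += paramCounts[i]; // may end up one past the last row
        }
        return starts;
    }

    // Inverse direction: recovers the counts from the start rows and the
    // total number of rows in the parameter table.
    public static int[] DecodeParamCounts(IReadOnlyList<int> starts, int totalParamRows)
    {
        var counts = new int[starts.Count];
        for (int i = 0; i < starts.Count; i++)
        {
            int end = i + 1 < starts.Count ? starts[i + 1] : totalParamRows + 1;
            counts[i] = end - starts[i];
        }
        return counts;
    }
}
```

Calling `ParamListSketch.ComputeParamListStarts(new[] { 0, 1, 0, 0, 0, 0 })` yields `1, 1, 2, 2, 2, 2`. In real emitting code each value would presumably be wrapped with `MetadataTokens.ParameterHandle(int)`, as the XML docs quoted earlier suggest.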
What happens if you generate only the first method by itself and use `1`? What about the first two methods and `1, 1`? Still the bad metadata error?
Well, I eliminated all the parameters, because I can't eliminate all the necessary methods, but it still says `<bad metadata>`. I took a look at mdv.exe's source code and realized that a `BadImageFormatException` causes it.
The exception says "Index not found", but I think it results from not having the same number of MethodDebugInformation rows as MethodDefinition rows. I've already emitted enough rows for the MethodDebugInformation table, but I still get the same error. Why is it happening? Am I still missing something?
I have no idea. I've never used mdv.exe. I guess it's possible it has a bug; either way, debugging the source code of mdv might be illuminating.
I investigated mdv.exe and found out that the exception says "Invalid relative virtual address" on ".ctor". I'm setting the RVAs to the values the reader reads, so I wonder why it says so, but I will try to modify that. Do you know why?
Looking at https://github.com/dotnet/metadata-tools/blob/master/src/mdv/Mdv.cs, `RelativeVirtualAddress` comes up twice. It looks like 0 means the method definition has no method body, and any other number is a relative virtual address of the method body (IL). https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf has a lot of info on this.
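To see which method has the suspicious RVA without going through mdv.exe, something like the following could be tried. It is a sketch only, assuming an IL-only assembly; `RvaSketch.ListMethodBodies` is a hypothetical helper, while `PEReader`, `GetMetadataReader` and `GetMethodBody` come from `System.Reflection.Metadata`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;

public static class RvaSketch
{
    // Lists every method definition in the PE file with its RVA and IL size.
    // An RVA of 0 is treated as "no method body", as described above.
    public static List<(string Name, int Rva, int IlSize)> ListMethodBodies(string path)
    {
        var result = new List<(string Name, int Rva, int IlSize)>();
        using var stream = File.OpenRead(path);
        using var peReader = new PEReader(stream);
        MetadataReader md = peReader.GetMetadataReader();
        foreach (MethodDefinitionHandle handle in md.MethodDefinitions)
        {
            MethodDefinition method = md.GetMethodDefinition(handle);
            int rva = method.RelativeVirtualAddress;
            int ilSize = 0;
            if (rva != 0)
            {
                // Throws BadImageFormatException if the RVA or the header is invalid.
                MethodBodyBlock body = peReader.GetMethodBody(rva);
                ilSize = body.GetILReader().Length;
            }
            result.Add((md.GetString(method.Name), rva, ilSize));
        }
        return result;
    }
}
```

Running it on both the `AssemblyBuilder` output and the rebuilt file and comparing the two lists might show where the RVAs start to diverge.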
Oh, I guess I should have written which method throws the exception. It's `generation.PEReaderOpt.GetMethodBody` at line 316 in mdv.cs. And as I edited in the previous post, I'm setting the RVAs as the reader reads them (so they are set as `AssemblyBuilder` emitted them), but this exception is happening. Because the assembly that `AssemblyBuilder` emitted executes without any problems, I wonder why it doesn't work when it's created by `ManagedPEBuilder`.
I guess I'm not setting the IL stream correctly. I mean, it's not placed at the relative address. Could you find out how to set it correctly somewhere?
I'm sorry, I don't know. What's the minimal Ref.Emit code you can come up with that produces a binary with which mdv.exe has this issue?
Er, I got over this problem (sorry for not reporting it) but have run into yet another one.
Actually, the problem isn't the assembly emitted with `Ref.Emit` but the one emitted with `SR.Metadata`. So I assume I'm just missing some steps when I emit the assembly again with `SR.Metadata`.
As for the new problem, I'm investigating it. This time mdv.exe says "<bad metadata>" in the IL stream, and the `BadImageFormatException` says "Invalid method header: 0xAB 0x04".
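As a reading aid for that message: a small sketch that classifies the first two bytes of a method body, assuming the tiny/fat header layout described in ECMA-335 (format in the low two bits of the first byte; a fat header stores its size in 4-byte units in the top four bits of the first 16-bit word, and that size is expected to be 3). The class name is hypothetical.

```csharp
using System;

public static class MethodHeaderSketch
{
    // Classifies the first two bytes found at a method body's RVA.
    public static string Classify(byte b0, byte b1)
    {
        int format = b0 & 0x03;
        if (format == 0x02)
        {
            // Tiny header: one byte, code size in the upper six bits.
            int codeSize = b0 >> 2;
            return $"tiny, code size {codeSize}";
        }
        if (format == 0x03)
        {
            // Fat header: 12 bits of flags, then 4 bits of header size (in dwords).
            int word = b0 | (b1 << 8);
            int headerSizeInDwords = word >> 12;
            return headerSizeInDwords == 3
                ? "fat"
                : $"invalid fat header (size {headerSizeInDwords} dwords, expected 3)";
        }
        return "invalid (unknown format bits)";
    }
}
```

Under these assumptions `Classify(0xAB, 0x04)` reports an invalid fat header with size 0, which would suggest that the bytes at that RVA are not the start of a method body at all, for example because the IL stream ended up at a different offset than the RVAs expect.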
Ah, can you tell why?
Fyi with Markdown you have three options to keep it from interpreting `<` as an HTML tag.
Idiomatic markdown:
\<bad metadata>
→ <bad metadata>
Code (when appropriate):
`<bad metadata>`
→ <bad metadata>
HTML:
&lt;bad metadata>
→ <bad metadata>
> Ah, can you tell why?
I don't have experience here either. I'm probably not going to be much help at this point unless there's repro code I can poke at.
> Fyi with Markdown you have three options to keep it from interpreting < as an HTML tag.
OK, I forgot to escape it.
> I'm probably not going to be much help at this point unless there's repro code I can poke at.
Then I can push my repository to GitHub with the current code, but it's a little bit complex to clone it and get it to work. Would you mind that? If not, I'll consider it.
That might help, though I was thinking more along the lines of: how small can you strip down the code and still reproduce the problem?
Let me see...
Only my `PEBuilder` wrapper class might reproduce the problem, but that still needs some other classes.
So I thought it would be easier for you to clone the whole repository and test it than for me and you to figure out which classes are needed and include it in the reproduction code.
Well sure, that's easier for you 😆 but I've always seen it to be useful for one's own sake to create a reproduction with as little code as possible, too. Using a bare PEBuilder if possible. Half the time you find the issue while doing so.
Oops, it's rather a driver class, not a wrapper. The class drives `ManagedPEBuilder`, `MetadataBuilder` and similar classes and emits a PE with the debug directory section.
OK, I'll figure out which classes are needed to reproduce the problem ;-)
I dug into it and found that nested classes appear twice in the resulting assembly (one is nested and the other lives in the global namespace). I also found this XML comment:
/// <remarks>
/// Entries must be added in the same order as the corresponding nested type definitions.
/// </remarks>
So I tried to emit nested classes first but it failed because of the relative virtual address. Because mdv.exe doesn't say "\
Plus, I noticed that the constructor of the nested class is trimmed. Its method body is empty after it rebuilds the assembly.
Same for another global class. The constructor becomes empty after it rebuilds it. Do you need the source code? Then I can copy and paste the (possibly) minimal reproduction code.
OK, I've resolved the constructor problem. And because it's not happening in the real compiler, I could ignore the nested class problem, too. Sorry to disturb you.
You are not disturbing me, don't worry. I'm replying when I have time and knowledge.
Sorry to disturb you again. Although the assembly works now, ASCII strings become empty after it rebuilds the assembly and it segfaults because of UTF-8 strings. After debugging it, I found that correct strings and empty strings are added as user strings. Everything else seems to work fine. Do you have any suggestions?
> You are not disturbing me, don't worry. I'm replying when I have time and knowledge.
Thanks!
Did Reflection.Emit generate the UTF-8 strings? I didn't realize they were considered legal by the CLR.
I guess I made a mistake. It's encoded in UTF-16, though I didn't make sure I was right.
I found out that the offsets of the user strings are somehow shifted, so the correspondence between the ids and the actual strings is broken. It also looks like user strings are added automatically. I don't understand how that happens.
Maybe I was wrong. User strings aren't automatically added. Should I inspect the method bodies and add the user strings?
I don't know. Are you translating method bodies or just treating them as blobs?
Just treating them as blobs. I found Roslyn does that, so maybe I should too.
I think the method bodies contain handles to table rows, so they need to be translated unless you can guarantee that the row indexes in those tables don't change.
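To make the point about handles inside method bodies concrete: a sketch of how a metadata token operand (such as the operand of `ldstr`) splits into a table byte and a row id or heap offset, assuming the usual layout of a high byte plus a 24-bit value, with `0x70` marking a user string. `TokenSketch` and its members are hypothetical names for illustration.

```csharp
using System;

public static class TokenSketch
{
    // Splits a 32-bit metadata token into its table byte and 24-bit value.
    public static (byte Table, int RowOrOffset) Split(int token) =>
        ((byte)((uint)token >> 24), token & 0x00FFFFFF);

    // 0x70 is assumed to identify an offset into the user string heap.
    public static bool IsUserString(int token) => ((uint)token >> 24) == 0x70;

    // Keeps the table byte and swaps in a new row id or heap offset, which is
    // what translating a method body amounts to for each token operand.
    public static int Retarget(int token, int newRowOrOffset) =>
        unchecked((int)(((uint)token & 0xFF000000u) | ((uint)newRowOrOffset & 0x00FFFFFFu)));
}
```

If the user string heap is rebuilt and an offset moves, every `ldstr` operand that still carries the old offset would resolve to the wrong string or to an empty one, which seems to match the symptom described above.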
OK, because I can't guarantee that (the row ids somehow change), I have to translate them.
I've implemented the Roslyn solution, but Windows refuses to execute the resulting assembly, with a `BadImageFormatException` saying "Index not found". I found that trying to load a 64-bit DLL into an x86 process or vice versa causes it, so I tried switching the target platform to x86 or x64, but it failed either way.
Do I have to configure `AssemblyBuilder` to target a specific platform? How can I do that?
mdv.exe is now satisfied. I've lost my sense of direction. What should I do? On Mac and Mono, it runs without any problems. How can I satisfy you, Mr. Windows...
I tried with my minimal reproduction code, and Windows denied the access.
@hazama-yuinyan How are you constructing your `PEHeader`?
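For context on why the header matters here: a sketch of how the target platform could be expressed when building the PE header, assuming `PEHeaderBuilder` and `CorFlags` from `System.Reflection.PortableExecutable` are in use. The helper names and the particular flag combinations are assumptions for illustration, not the settings used in the project discussed in this thread.

```csharp
using System;
using System.Reflection.PortableExecutable;

public static class PeHeaderSketch
{
    // Builds a header for an executable targeting either x86 or x64.
    public static PEHeaderBuilder CreateHeader(bool x86)
    {
        return new PEHeaderBuilder(
            machine: x86 ? Machine.I386 : Machine.Amd64,
            imageCharacteristics: x86
                ? Characteristics.ExecutableImage | Characteristics.Bit32Machine
                : Characteristics.ExecutableImage | Characteristics.LargeAddressAware);
    }

    // The CLI header flags carry the 32-bit requirement separately.
    public static CorFlags CreateCorFlags(bool x86) =>
        x86 ? CorFlags.ILOnly | CorFlags.Requires32Bit : CorFlags.ILOnly;
}
```

The header would go to the `ManagedPEBuilder` constructor and the flags to its `flags` argument. A mismatch between the two, or between them and what the original assembly declared, is one plausible cause of Windows rejecting the image while Mono accepts it.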
Hi! I'm developing a programming language of my own using the expression tree API and want to debug programs written in Expresso (which is the name of the programming language) in VS Code. To do that, I first need to generate PDB/MDB files, and I've found and heard that one can generate Portable PDB files by using APIs from the `System.Reflection.Metadata` namespace. But I just can't find any documentation about it, nor any source code that apparently uses them. So, is this the right place to ask? If so, would you mind if I ask how to use the APIs?