Rewrite PDBs? - Githubissues

ohz10 commented 1 year ago

Hi, I'm new to the PDB file format. I found your project as well as Microsoft's. I found your project interesting b/c it actually compiles. I am poking around the example code now.

I'm looking for some advice about rewriting PDB files. I posted a question on Stack Overflow, hoping to hear from someone with more experience. Do you think it's possible to read a PDB file and write a new PDB with the source file paths changed? If so, how would you recommend going about this?

Any advice you can offer would be great, thanks.

MolecularMatters commented 1 year ago

RawPDB is intended to be a consumer of PDB files, but not a producer, so I would not recommend using RawPDB for this task. If you want to rewrite PDBs, your best bet might be to use the PDB support found in LLVM, since their lld-link is able to produce conforming PDB files.

However, there are still two drawbacks with this approach:

LLVM does not understand every little part that makes up a PDB file. If your source PDB file uses anything that LLVM does not understand, it will probably not be able to reproduce this data in the destination PDB.
LLVM is a large dependency.

I read your question on Stack Overflow, and my recommendation would be to fix the underlying issue (why does editing the source using the junction path corrupt the file?) rather than treat the symptoms by patching the paths in the PDB.

ohz10 commented 1 year ago

Thank you for your input, I greatly appreciate it.

> I read your question on Stackoverflow and my recommendation would be to fix the underlying issue (why does editing the source using the junction path corrupt the file?) and not try to fix the symptoms by patching the paths in the PDB.

Files get corrupted because we use Perforce for revision control. It gets very unhappy if you edit files before checking them out, and that can cause corruption. Perforce doesn't understand junctions.

I 100% agree with the sentiment of fixing the actual problem, but that's not my decision in this case.

MolecularMatters commented 1 year ago

Thanks for clearing that up. I'm familiar with P4 and use it myself, but didn't know that it doesn't support junctions. Thinking about this some more, I wonder if you could change anything in your (distributed) build system to make this work out of the box, without having to mess with paths.

E.g. if you look at FASTBuild, it supports distributed builds and produces PDBs that will contain paths to local files - what matters is how everything is linked together, producing the final PDB. How is this currently done on your end? How does your distributed build system roughly work?

ohz10 commented 1 year ago

My understanding of the issue is that when debugging in Visual Studio, if you edit a file using the junction path, the Perforce plug-in doesn’t recognize it as a path under Perforce control and doesn’t automatically check out the file, so corruption ensues.

Most distributed build systems recreate a user’s local directory structure one way or another, and they don’t share build artifacts between users. So if you build a file and I need to build the same file, it will get recompiled for me, and vice versa. That’s about all I can say about that.

MolecularMatters commented 1 year ago

Regarding distributed build systems, Incredibuild definitely does caching and sharing of build artefacts. Maybe SN-DBS does that too, but I'd have to check myself.

Back to the topic, I had another idea: Instead of rewriting the (potentially huge) PDB after it has been produced by the linker, why not try changing the paths before the PDB is built?

More specifically, you could build everything distributed as usual, but with the /Z7 compiler option (https://learn.microsoft.com/en-us/cpp/build/reference/z7-zi-zi-debug-information-format?view=msvc-170) in case you don't already do that. With /Z7, the debug information is stored in the .obj files themselves, so before linking you could "massage" the paths stored in the debug sections of the .obj files to be linked. This might be easier than rewriting the PDB, since you can probably leave the rest of the debug information in the .obj alone.
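
As a rough illustration of the "massaging" step, here is a minimal sketch of the path remapping itself, assuming the paths have already been located in the debug data. `RemapPathPrefix` is a hypothetical helper, not part of RawPDB, LLVM, or the MSVC toolchain, and it does not parse .obj files. It only replaces a junction prefix with a local prefix, compares case-insensitively (ASCII only), and refuses to match in the middle of a directory name.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: ASCII case-insensitive comparison, since Windows
// paths are normally treated as case-insensitive.
static bool EqualsIgnoreCase(std::string_view a, std::string_view b)
{
    if (a.size() != b.size())
        return false;

    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb)
            return false;
    }
    return true;
}

// Hypothetical helper: returns 'path' with 'oldPrefix' replaced by 'newPrefix',
// or std::nullopt if 'path' does not start with 'oldPrefix' on a directory boundary.
std::optional<std::string> RemapPathPrefix(std::string_view path,
                                           std::string_view oldPrefix,
                                           std::string_view newPrefix)
{
    if (oldPrefix.empty() || path.size() < oldPrefix.size())
        return std::nullopt;

    if (!EqualsIgnoreCase(path.substr(0, oldPrefix.size()), oldPrefix))
        return std::nullopt;

    // Avoid matching "D:\Junction" against "D:\JunctionOther\...".
    const bool prefixEndsWithSeparator =
        oldPrefix.back() == '\\' || oldPrefix.back() == '/';
    if (!prefixEndsWithSeparator && path.size() > oldPrefix.size())
    {
        const char next = path[oldPrefix.size()];
        if (next != '\\' && next != '/')
            return std::nullopt;
    }

    std::string result(newPrefix);
    result.append(path.substr(oldPrefix.size()));
    return result;
}
```

Note that the remapped path can be longer or shorter than the original, so writing it back into an .obj would also mean fixing up the sizes of the surrounding records and sections. That part is not shown here.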

ohz10 commented 1 year ago

The Incredibuild FAQ suggests they reproduce the user’s directory structure on remote build nodes, so it’s unclear to me how they’d share PDBs in that scenario (though I could see them still sharing OBJs). That said, my understanding is Incredibuild doesn’t use cl to create OBJ files or PDBs any more, they have their own toolchain, and thus have more flexibility.

I couldn’t find much information on SN-DBS, but the one architectural diagram I found looked very similar to Incredibuild.

Changing the PDB source file paths on the build side is an option we’ve discussed, however, rewriting PDBs is currently the favored approach.

MolecularMatters commented 1 year ago

> The Incredibuild FAQ suggests they reproduce the user’s directory structure on remote build nodes, so it’s unclear to me how they’d share PDBs in that scenario (though I could see them still sharing OBJs).

Yes, they do so through virtualization AFAIK. Though I don't understand why that should be at odds with sharing build artefacts? Caching and building are two orthogonal things IMO.

> That said, my understanding is Incredibuild doesn’t use cl to create OBJ files or PDBs any more, they have their own toolchain, and thus have more flexibility.

I don't think so. They might be using MSBuild underneath and trick it into being better at parallel builds (which isn't that hard), but I doubt they have their own toolchain(s) for Windows, Xbox, PlayStation, etc. Pretty sure they use the platform's native toolchain for compiling & linking.

ohz10 commented 1 year ago

> Though I don't understand why that should be at odds with sharing build artefacts?

Other than source file paths being incorrect in PDBs? No.

> Pretty sure they use the platform's native toolchain for compiling & linking.

I may have misinterpreted what they said in their FAQ about why OBJ and EXE files look different than those from native builds.

MolecularMatters commented 1 year ago

Here's an idea you could try: RawPDB works with memory-mapped files directly, and is able to read the compiland info of all translation units, which stores the source paths, object paths, etc.

If you can guarantee that your junction paths are always going to be as long as or longer than the actual local paths (e.g. by making a junction to D:\SomeReallyLongDevPathHereOrSomethingLikeThis), you could grab the const char* from RawPDB and simply overwrite the strings with whatever you want.

This way, you don't have to touch the rest of the PDB and can directly patch the PDB without having to read all the other unrelated data.
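
A minimal sketch of that in-place overwrite, under a few assumptions: the PDB is mapped writable (or you patch a copy), `dst` is the NUL-terminated path string obtained from the mapping, and readers of the PDB stop at the first NUL so the leftover bytes can be zero-filled. `OverwritePathInPlace` is a hypothetical helper, not a RawPDB function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical helper: overwrites the NUL-terminated string at 'dst' with
// 'replacement' without changing the number of bytes the string occupies.
// Returns false, leaving the data untouched, if 'replacement' does not fit.
bool OverwritePathInPlace(char* dst, std::string_view replacement)
{
    assert(dst != nullptr);

    const std::size_t oldLength = std::strlen(dst);
    if (replacement.size() > oldLength)
        return false; // writing this would run into whatever follows the string

    // Copy the new path, then zero the rest of the old string so the record
    // keeps its size and the string stays NUL-terminated.
    std::memcpy(dst, replacement.data(), replacement.size());
    std::memset(dst + replacement.size(), 0, oldLength - replacement.size());
    return true;
}
```

The length check is the reason for the "equal or longer" requirement: a replacement that does not fit is rejected instead of being truncated or overwriting neighbouring data.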

ohz10 commented 1 year ago

> RawPDB works with memory-mapped files directly, and is able to read the compiland info of all translation units, which stores the source paths, object paths, etc.
>
> If you can guarantee that your junction paths are always going to be equal or longer than actual local paths (e.g. by making a junction to D:\SomeReallyLongDevPathHereOrSomethingLikeThis), you could grab the const char* from RawPDB and simply overwrite them with whatever you want.

Actually, this was my plan for the first cut =) This will probably work for us as a stop-gap solution, but ultimately, we won't be able to guarantee the junction paths are longer.

ohz10 commented 1 year ago

I think we covered everything.

MolecularMatters / raw_pdb

Rewrite PDBs? #42