gibbed / Gibbed.CrystalDynamics

Tools & code for use with Crystal Dynamics developed games.
zlib License

Question about .drm's internal structure #1

Open KillerBeer01 opened 3 years ago

KillerBeer01 commented 3 years ago

Hi,

I'd like to know more about the layout of data stored in .drm files from DXHR (more specifically, in DTPData sections). I managed to adapt your code to produce more meaningful (as I hoped) dumps of their content, but I still can't figure out how the data works. I suppose the key lies in the Resolvers tables that each section has, but when I try to analyze them... it just doesn't make any sense to me. I can't see any data type or length markers, any consistent scheme in how the various sections refer to each other, or any "starting point" from which a chain to the data I need could be traced.

The only rock-solid fact I've established is that every PointerOffset address refers to an EB BE EB BE byte sequence, but that's about it. There are data segments not addressed by any DataOffset, there are DataOffset addresses pointing beyond the end of the MemoryStream data field, there are EB BE EB BE sequences that are not addressed by any PointerOffset or that are addressed by a DataOffset, and lots of other things I can't make sense of. Whatever shadow of a pattern I seem to establish by looking at the data in several .drm's is promptly debunked by a different .drm where the same pattern fails against all expectations.

I'm specifically interested in dialogue-oriented information: data fields containing audio file names and line numbers in the locals.bin file. I believe that DTPData sections with flag 0x54 are hubs for the sections that contain the real data, and in the .drm's I explored I could "visually" trace the connectivity... just not strictly enough to build a parser on it.

If you have any clues (or know somebody who does) about ways to extract meaningful data from .drm's, I'd be extremely grateful.

Thank you!

dtpdata localsbindata con02haasdump con02haasgraph

gibbed commented 3 years ago

Sections in DRMs are flat arbitrary structures loaded into memory, where pointers are enough space for actual pointers that get resolved (overwritten) when the section is loaded at runtime. That is why the bytes are EB BE EB BE by default (in some cases). Each section has a table of resolvers, and each resolver points either to a section in the current file (local) or to a section in another file (remote).

Unfortunately the DRM format doesn't really have any indication of what type of structure any given section is. The game knows this based on what it's loading.
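To make the "resolved (overwritten) at load time" idea concrete, here is a minimal sketch in C. It is not taken from the repository or from the game: the `fixup` struct, its three fields, and the `section_base` table of simulated 32-bit load addresses are all assumptions made for illustration, and the real resolver entries are packed bitflags with a different layout. It also treats local and remote targets the same way, as an index into a table of already loaded sections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixup entry; NOT the real on-disk resolver layout. */
struct fixup {
    uint32_t pointer_offset; /* where the placeholder sits in the section being patched */
    uint32_t target_section; /* index of the section the pointer should refer to */
    uint32_t data_offset;    /* offset inside that target section */
};

/* Overwrites each placeholder with a simulated 32-bit address:
 * load address of the target section plus the data offset.
 * Returns 0 on success, -1 as soon as an entry is out of range. */
int apply_fixups(uint8_t *data, size_t size,
                 const struct fixup *fixups, size_t fixup_count,
                 const uint32_t *section_base, size_t section_count)
{
    assert(data != NULL && fixups != NULL && section_base != NULL);
    for (size_t i = 0; i < fixup_count; i++) {
        const struct fixup *f = &fixups[i];
        /* The 4-byte slot must lie entirely inside the section data. */
        if (size < 4 || f->pointer_offset > size - 4) return -1;
        if (f->target_section >= section_count) return -1;
        uint32_t address = section_base[f->target_section] + f->data_offset;
        /* memcpy stores the value in the host's byte order. */
        memcpy(data + f->pointer_offset, &address, sizeof address);
    }
    return 0;
}
```

After such a pass the placeholder bytes are gone and the slots hold usable addresses, which is why the placeholder value itself carries no information about what the pointer refers to.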

KillerBeer01 commented 3 years ago

> pointers are enough space for actual pointers that get resolved (overwritten) when the section is loaded at runtime

You mean that the same data segments addressed by DataOffsets are each resolved into runtime memory as PointerOffsets (so that these pointers' existence does not provide any additional info at all, analyze-wise), or do those pointers link to data segments from other resolvers/sections?

> The game knows this based on what it's loading.

This much I figured. But while this logic holds for simple structures like email_database.drm, for something complex like dialogues it's not enough to know what to load, but also where from, and that information must be stored somewhere. It would be logical to store it in .drm's themselves and not in the program code, and that's why I'm trying to look for "cornerstones" from which paths to necessary info could be navigated, possibly using custom rules once they are understood.

Have there been any insights at all into the fields marked "Unknown" since the release of your code, BTW?

Thanks again for your work. I imagine that analyzing it all from scratch must have been a hell of an effort.

gibbed commented 3 years ago

Consider something like this:

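// Offsets assume 4-byte ints and 4-byte pointers.
struct foo
{
  int bar;      // offset 0
  baz* qux;     // offset 4: pointer slot, stored as EB BE EB BE
  int quux;     // offset 8
  quux* corge;  // offset 12: pointer slot, stored as EB BE EB BE
};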

This would be stored flat in the .drm as a segment. The pointers would have nonsense values (EB BE EB BE) in the actual data, and there would be a list giving the offset of each pointer in the struct and how to resolve it. So for this struct there would be entries for offsets 4 and 12 into the data, each with local or remote resolver information.

00 00 00 00 EB BE EB BE 00 00 00 00 EB BE EB BE
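As a sanity check when dumping sections, one can verify that every listed pointer offset actually lands on the placeholder bytes. The following is a small sketch in C under stated assumptions: the function names are made up for this example, and the offsets are passed in as a plain array rather than being read from real resolver entries. Since the placeholder is only present "in some cases", a mismatch is a hint to look closer, not proof of a parsing error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The placeholder bytes quoted in this thread. */
static const uint8_t PLACEHOLDER[4] = {0xEB, 0xBE, 0xEB, 0xBE};

/* Returns 1 if the 4 bytes at `offset` are the placeholder, 0 otherwise
 * (including when the offset does not leave room for 4 bytes). */
int is_placeholder(const uint8_t *data, size_t size, size_t offset)
{
    assert(data != NULL);
    if (size < 4 || offset > size - 4) return 0;
    return memcmp(data + offset, PLACEHOLDER, 4) == 0;
}

/* Counts how many of the listed pointer offsets do NOT hold the placeholder. */
size_t count_unexpected(const uint8_t *data, size_t size,
                        const size_t *offsets, size_t count)
{
    size_t bad = 0;
    for (size_t i = 0; i < count; i++) {
        if (!is_placeholder(data, size, offsets[i])) bad++;
    }
    return bad;
}
```

For the 16-byte dump above, offsets 4 and 12 pass the check, while offsets 0 and 8 (the two int fields) do not.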

There are some comments in the code that reads the resolvers.

Local resolver: https://github.com/gibbed/Gibbed.CrystalDynamics/blob/6b2dd22bf448723223c7d99dc5ba69846d9dd88c/projects/Gibbed.DeusEx3.FileFormats/DRM/Resolver.cs#L99-L101

Remote resolver: https://github.com/gibbed/Gibbed.CrystalDynamics/blob/6b2dd22bf448723223c7d99dc5ba69846d9dd88c/projects/Gibbed.DeusEx3.FileFormats/DRM/Resolver.cs#L123-L126

The resolver types in my code extract the necessary parts of the resolver bitflags, so that's handled for you.