IsoFrieze / DiztinGUIsh

A Super NES ROM Disassembler
GNU General Public License v3.0

Request for comment: new data model proposal #46

Closed: binary1230 closed this issue 3 years ago

binary1230 commented 3 years ago

I'm looking for comments from folks who are more familiar with the memory mapping of retro systems (specifically the SNES for now, but with an eye toward other systems like the NES, Genesis, and whatever else we want to throw at it).

I have been thinking a lot about the data model in Diz and how we could better support the following use cases:

  1. Decoupling of the UI
  2. Reference data from multiple source ROM files, multiple Diz projects, etc
  3. Record squishier metadata about the archeology/history of the disassembly work: who performed it, how certain they are about each section, etc.
  4. Make it easier to ship data in and out of Diz via plugins, sockets, file formats, etc
  5. Make the Diz project file a useful reference database
  6. Support multi-user collaboration (pipe dream: turn the backend into a REST API and have a web UI where people can mark up parts of a ROM, save to the cloud, etc.)
  7. Support heavy decoupling of the UI from the underlying data format (so we can port other people's tools over, or add arbitrary 'views' of the underlying raw bytes, like a hex editor vs. text assembly output vs. a grid view for disassembly, etc.)
  8. Make it possible for other people to include our UIs easily in their projects (e.g. drop Diz UIs into your emulator core)
  9. Thread safety (so CPU-heavy operations like capturing real-time trace data are zippy)
  10. Slice and dice your data into multiple regions (e.g. data vs. code vs. compressed data), nest and collapse them
  11. Ubiquitous data change notifications on all classes (so all views update when the underlying data changes)
  12. Make all this run at reasonable performance
  13. Deal with mirrored memory
  14. Still support Diz's main disassembly workflow (the datagrid main screen) really well as its primary operation

My work on heavily decoupling the UI in Diz is nearing completion, which lays the groundwork for this next phase.

As an exercise, I drew up a pseudocode class diagram of what this might end up looking like. No one needs to read it carefully; I'm more interested in whether any of it looks like a landmine to anyone, or whether code like this already exists out there that we could integrate.


// the main thing that gets serialized as a .diz project file.
// Diz should support projects referencing each other, and editing multiple projects at once
Project:
- ByteSources[]  // places we can get bytes from (disk, images, ROMs, text, or generated as decompressed or processed parts of other already-loaded data)
- RootRegion     // arbitrary tree of "regions", which are subsets of specific ByteSources with specific mappings.
                 // holds per-byte annotations, which mark things like code, data, graphics, tracelog info, and arbitrary metadata
- Builds[]       // how to turn regions into output (like generated assembly, .bin files for graphics, etc.)

// ------------
// ByteSource: Immutable data sources.
// ------------

abstract ByteSource:
- Bytes[] (read-only)

// system-agnostic, just represents a bunch of bytes read from disk somewhere. could be rom, text, images, whatever
ByteSourceFile : ByteSource:
- SourceFilename // examples: romfile.smc romfile_bank_C0.bin graphics_pack.bin dialog.txt file.png
- StartingFileOffset = 0
- ByteCountToReadFromFile = -1
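For illustration, here is a minimal Python sketch of `ByteSourceFile`. The snake_case names are hypothetical stand-ins for the pseudocode fields. It assumes `ByteCountToReadFromFile = -1` means "read to the end of the file". That reading is inferred from the default value above and is not confirmed Diz behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteSourceFile:
    """Immutable, system-agnostic slice of a file's bytes (hypothetical sketch)."""
    source_filename: str
    starting_file_offset: int = 0
    byte_count_to_read: int = -1  # assumption: -1 means "read to end of file"

    def read_bytes(self) -> bytes:
        with open(self.source_filename, "rb") as f:
            f.seek(self.starting_file_offset)
            # read(-1) returns everything that remains, matching the -1 convention;
            # a count past EOF simply returns fewer bytes.
            return f.read(self.byte_count_to_read)
```

Freezing the dataclass keeps the source immutable, as the "Immutable data sources" banner suggests.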

// snes-specific stuff
SNESRomSourceFile : ByteSourceFile:
- skipSMCHeader = true
- RomMapping (e.g. HiROM, LoROM, etc.)
- Speed
- other stuff like that

GenesisRomSourceFile : // ... whatever ... //
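One concrete piece of `SNESRomSourceFile` is the copier (SMC) header. The following is a sketch under one assumption: it uses the common heuristic that a dump whose size leaves 512 bytes over after dividing by 1 KiB carries a 512-byte copier header. The function names are hypothetical.

```python
SMC_HEADER_SIZE = 512  # copier headers are 512 bytes


def has_smc_header(file_size: int) -> bool:
    """Common heuristic: ROM data is a multiple of 1 KiB, so 512 leftover
    bytes suggest a copier header in front of the ROM."""
    return file_size % 1024 == SMC_HEADER_SIZE


def strip_smc_header(data: bytes, skip_smc_header: bool = True) -> bytes:
    """Return the ROM bytes, dropping a detected copier header if requested."""
    if skip_smc_header and has_smc_header(len(data)):
        return data[SMC_HEADER_SIZE:]
    return data
```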

// --------------------------------------------------------------------------------------------
// Regions define arbitrary subsets of byte sources, and hold data related to the window offset
// and how to generate their Byte data from arbitrary sequences of bytes
// 
// Regions can overlap, be overlaid on top of each other, have priorities, etc.
// e.g. a "patch" can be visualized as a couple of regions overlaid on the main ROM
//
// some workflow ideas:
// 1. dump WRAM or SPCRAM and save as a .bin file, map it as an example of data in a Region,
// annotate, and export the annotations onto the section of the ROM containing the original code
// that was copied into WRAM/spc/etc.
// 2. dump VRAM data, mark it up
// --------------------------------------------------------------------------------------------

Region : ByteSource  // a Region is also a ByteSource
- Mapping                              // options: 1:1, or using compression algorithm
- Collection<RegionOffset, Annotation>

- SubRegions[] // regions whose ByteSource is set to 'this' region

// searches our subregions first, returns anything matching there as our override. 
// if nothing found, use our own mapping.
// good for stuff like patches, where patch modifications are a sub-region we want to override whatever comes from our mapping.
- byte GetByteAt(offset)                    
- Annotations[] GetAnnotationsAt(offset)    // aggregates all annotations associated with this offset from both us and our sub-regions
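To make the override rule concrete, here is a minimal Python sketch of `GetByteAt` / `GetAnnotationsAt`. Three assumptions are not from the proposal. Each sub-region carries a `base_offset` into its parent. The most recently added sub-region wins when several overlap. Annotations are plain strings.

```python
from typing import Dict, List, Optional


class Region:
    """Hypothetical sketch of a Region whose sub-regions override its own bytes."""

    def __init__(self, name: str, data: bytes, base_offset: int = 0):
        self.name = name
        self.data = data                # bytes produced by this region's own mapping
        self.base_offset = base_offset  # where this region sits inside its parent
        self.sub_regions: List["Region"] = []
        self.annotations: Dict[int, List[str]] = {}

    def _contains(self, parent_offset: int) -> bool:
        return self.base_offset <= parent_offset < self.base_offset + len(self.data)

    def get_byte_at(self, offset: int) -> Optional[int]:
        # Search sub-regions first (last added wins, an assumed priority rule).
        for sub in reversed(self.sub_regions):
            if sub._contains(offset):
                return sub.get_byte_at(offset - sub.base_offset)
        # Nothing overrides this offset: fall back to our own mapping.
        return self.data[offset] if 0 <= offset < len(self.data) else None

    def get_annotations_at(self, offset: int) -> List[str]:
        # Aggregate our own annotations with those of every covering sub-region.
        result = list(self.annotations.get(offset, []))
        for sub in self.sub_regions:
            if sub._contains(offset):
                result.extend(sub.get_annotations_at(offset - sub.base_offset))
        return result
```

Usage: a two-byte NOP patch placed at offset 1 of a four-byte ROM makes `get_byte_at(1)` return `0xEA`, while offsets 0 and 3 still come from the ROM.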

// this handles mapping in both a SNES sense (like HiROM, LoROM, etc.)
// and in any arbitrary sense
MappingType:
- ByteSource SourceData
- StartingOffset    // "window" into the byte source, e.g. StartingOffset = 0x0 and ByteCount = 0x10000 for bank C0 of a HiROM image
- ByteCount

ArbitraryMapping:
- ByteProviderStartOffset, OutputOffset
- ByteProviderByteCount, OutputByteCount

// maps byte offsets into an arbitrary address space: HiROM, LoROM, ExHiROM, etc.
MappingTypeSNES:
- MapType
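As an example of what `MappingTypeSNES` would compute, here is a sketch of the standard SNES-address-to-file-offset translation for plain HiROM and LoROM layouts, assuming no copier header. ExHiROM, SA-1, SRAM, and other enhancement-chip layouts are not covered, and the function names are hypothetical.

```python
from typing import Optional


def hirom_to_pc(snes_addr: int) -> Optional[int]:
    """Map a 24-bit SNES address to a file offset for a plain HiROM cart (sketch)."""
    bank, addr = snes_addr >> 16, snes_addr & 0xFFFF
    if bank in (0x7E, 0x7F):
        return None  # WRAM, not ROM
    if (bank & 0x7F) < 0x40 and addr < 0x8000:
        return None  # system area (WRAM mirror, I/O, etc.) in banks $00-$3F / $80-$BF
    return ((bank & 0x3F) << 16) | addr


def lorom_to_pc(snes_addr: int) -> Optional[int]:
    """Map a 24-bit SNES address to a file offset for a plain LoROM cart (sketch)."""
    bank, addr = snes_addr >> 16, snes_addr & 0xFFFF
    if bank in (0x7E, 0x7F) or addr < 0x8000:
        return None  # WRAM, or the lower half of a bank (not ROM in this simple model)
    return ((bank & 0x7F) << 15) | (addr & 0x7FFF)
```

Returning `None` for non-ROM addresses lets a caller fall through to other regions, such as WRAM captures.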

// how about a byte source that reads compressed data from a region, decompresses it, and shows you the data in any of our viewers
// (e.g. hex editor, graphics viewer, etc.)
ByteSourceCompressed : ByteSource:
- CompressAlgorithm // e.g. standard (.gz etc.) vs. some game-specific algorithm
- SourceRegion
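Here is a sketch of a lazily decompressing `ByteSourceCompressed`. It assumes a registry of decompressors keyed by algorithm name. Only `"gzip"` is registered, using Python's standard `gzip.decompress`, as a stand-in for the "standard" case. Game-specific algorithms would need their own entries, and all names here are hypothetical.

```python
import gzip
from functools import cached_property
from typing import Callable, Dict

# Decompressors keyed by algorithm name; game-specific ones would be added here.
DECOMPRESSORS: Dict[str, Callable[[bytes], bytes]] = {
    "gzip": gzip.decompress,
}


class ByteSourceCompressed:
    """Derived, read-only byte source that decompresses a slice of another source."""

    def __init__(self, algorithm: str, source_bytes: bytes, start: int, count: int):
        self.algorithm = algorithm
        self.source_bytes = source_bytes
        self.start = start
        self.count = count

    @cached_property
    def data(self) -> bytes:
        # Decompress once, the first time a viewer asks for the bytes.
        compressed = self.source_bytes[self.start:self.start + self.count]
        return DECOMPRESSORS[self.algorithm](compressed)
```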

// ---------------
// So here's an example of a SNES-specific mapping config
// ---------------

// up until this point, regions aren't mapped into anything address-space specific. here's an example of a SNES rom
// lower levels of the system shouldn't know anything about 'banks' etc
var SnesHiRom = new Mapping {
    Name="HiROM", 
    DestOffset=0xC00000, Count=0x40[#banks] x 0x10000[banksize]
}

var SnesWRAMHiRom = new Mapping {
    Name="WRAM",
    DestOffset=0x7E0000, Count=XX[#banks] x 0x10000[banksize],
    Mirrors = {0x00, ...} // define that this memory is mirrored to other places.
}
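For the `Mirrors` idea, one option is to fold every mirrored address back to a single canonical address before looking up bytes or annotations. A label placed through a mirror then lands on the same byte. Here is a sketch for the best-known SNES case, the low 8 KiB of WRAM; the function name is hypothetical.

```python
def canonical_wram_address(snes_addr: int) -> int:
    """Fold low-WRAM mirrors back onto $7E:0000-$7E:1FFF (sketch).

    The first 8 KiB of WRAM ($7E:0000-$7E:1FFF) also appears at $0000-$1FFF
    in banks $00-$3F and $80-$BF. Any other address is returned unchanged.
    """
    bank, addr = snes_addr >> 16, snes_addr & 0xFFFF
    if (bank & 0x7F) < 0x40 and addr < 0x2000:
        return 0x7E0000 | addr
    return snes_addr
```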

var DizProject {
  ByteSources[] = {
    SNESRomSourceFile {"somegame.smc", skipSMCHeader = true}
  }
  Regions[] = {
    { Name = "ROM", ByteSource = ByteSources["somegame.smc"] }
  }
}

class SNES {
    Regions[] = {
        new Region {
            Name = "Main CPU",
            SubRegions[] = { 
              { Name = "Rom", MappingType = SnesHiRom, Source=DizProject.Regions["ROM"] },
              { Name = "WRamCapture-BattleMode", MappingType = SnesWRAMHiRom, Source=DizProject.Regions["ramdump1"] },
              { Name = "WRamCapture-OverworldMap", MappingType = SnesWRAMHiRom, Source=DizProject.Regions["ramdump2"] },
              { Name = "CompressedData", Algorithm=Games.NintendoZip2, ..src/dst offsets... }
            }
        }
    }
}

// ---------------
// Annotations: attach arbitrary metadata to ALL THE THINGS. Each annotation attaches to an offset in a particular region
// goals:
// 1. mark a single byte or a block of bytes with whatever metadata we want
// 2. be able to attach multiple of the same type of annotations to an offset, and pick one as "the real one" or "the example"
//    i.e. for tracelog data, it might be useful to keep all the previous tracelog import data, and mark one as "the real one", the rest are
//    "examples"
// 3. Store all this in a platform-agnostic format i.e. regions/annotations/etc shouldn't have to "know" they are SNES vs Genesis vs etc.
// 4. Keep or collapse as much as you like.
// ---------------

Annotation:
- metadata // optional rando metadata, dunno, like....
  - source origin (e.g. was this marked by hand, gotten from a CPU tracelog, CDL trace, etc.)
  - author
  - date changed
  - data reference source // (e.g. https://romhacking.net/{some_page}, etc.)
  - certainty // (100%, or not sure, or wrong disassembly, or guess)
  - tags, maybe? // "overworld", "battlesystem", "boss AI system"
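Goal 2 above asks for several annotations of the same type at one offset, with one chosen as "the real one". That could be modeled roughly as follows. This is a Python sketch; `AnnotationStore`, the `kind` strings, and the `is_primary` flag are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Annotation:
    kind: str                  # e.g. "label", "cpu_flags" (hypothetical type tags)
    value: object
    source: str = "manual"     # e.g. "manual", "tracelog", "cdl"
    certainty: str = "unknown"
    is_primary: bool = False   # "the real one" among same-kind annotations


@dataclass
class AnnotationStore:
    by_offset: Dict[int, List[Annotation]] = field(default_factory=dict)

    def add(self, offset: int, annotation: Annotation) -> None:
        self.by_offset.setdefault(offset, []).append(annotation)

    def mark_primary(self, offset: int, chosen: Annotation) -> None:
        # Exactly one annotation per (offset, kind) is primary; others stay as examples.
        for a in self.by_offset.get(offset, []):
            if a.kind == chosen.kind:
                a.is_primary = a is chosen

    def primary(self, offset: int, kind: str) -> Optional[Annotation]:
        return next((a for a in self.by_offset.get(offset, [])
                     if a.kind == kind and a.is_primary), None)

    def examples(self, offset: int, kind: str) -> List[Annotation]:
        return [a for a in self.by_offset.get(offset, [])
                if a.kind == kind and not a.is_primary]
```

Keeping non-primary entries instead of overwriting them preserves earlier tracelog imports as "examples".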

AnnotationDataBlock : Annotation
- StartingRegionOffset
- Count
- Type // (graphics, music, table, etc)

// labels a specific line, literally the "label" on the left hand side of the grid
AnnotationLabel : Annotation
- Text

AnnotationComment : Annotation
- Text

AnnotationFreeSpace : AnnotationDataBlock

// placed here either by hand or by the tracelogger, possibly several per byte if it finds new flag combinations;
// only one of them is marked as the "real" one
Annotation65XCpuFlags : Annotation
- dataBank
- directPage
- xFlag
- mFlag

Annotation65XInstructionByte : Annotation
Annotation65XOperandByte : Annotation

// raw data from a CDL capture (was this byte read from? written to? code run from here? etc)
AnnotationCDLEntry : Annotation
- byteflags = {unknown, read_from, written_to, executed_from}
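CDL entries are a set of independent access bits, so a flag enum turns merging several captures into a bitwise OR. This is a sketch. The bit values are arbitrary and not taken from any emulator's CDL file format, and `merge_cdl_captures` is a hypothetical name.

```python
from enum import Flag
from typing import Dict, Iterable


class CdlFlags(Flag):
    UNKNOWN = 0
    READ_FROM = 1
    WRITTEN_TO = 2
    EXECUTED_FROM = 4


def merge_cdl_captures(captures: Iterable[Dict[int, CdlFlags]]) -> Dict[int, CdlFlags]:
    """OR together several CDL captures so each offset keeps every access type seen."""
    merged: Dict[int, CdlFlags] = {}
    for capture in captures:
        for offset, flags in capture.items():
            merged[offset] = merged.get(offset, CdlFlags.UNKNOWN) | flags
    return merged
```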

// -----------------
// all of the above stuff is just how to STORE data and map it and mark it up.
// it's nothing about how to display, modify, or export the data, which should all be in another layer.
// ------------------

dataGrid.DataSource = new RomByteDataGridRow[1000];

// for displaying stuff on a maingrid like what Diz does now, make a display-specific class like this.
// the datagrid class is generic and will respond to the metadata here for the columns
// and the specific field values are one row

// (this is actually pretty close to what it looks like in the current bleeding-edge GUI refactor)
public class RomByteDataGridRow : INotifyPropertyChanged
{
    private int offsetInRegion;
    private Region region; // arbitrary, might typically be set to SNES.Region["CpuBus"]["ROM"]

    [DisplayName("Label")]
    [Editable(true)]
    [CustomConfig(col =>
    {
        col.DefaultCellStyle = new DataGridViewCellStyle
        {
            Alignment = DataGridViewContentAlignment.MiddleRight, Font = FontHuman,
        };
        col.MaxInputLength = 60;
        col.MinimumWidth = 6;
        col.Width = 200;
    })]
    public string Label
    {
        get => region.GetAnnotation<AnnotationLabel>(offsetInRegion).Text;

        // todo (validate for valid label characters)
        // (note: validation implemented in Furious's branch, integrate here)
        set
        {
            region.GetAnnotation<AnnotationLabel>(offsetInRegion).Text = value;
            OnPropertyChanged();
        }
    }

    // program counter (Read-only)
    [DisplayName("PC")]
    [ReadOnly(true)]
    public string Offset =>
        Util.NumberToBaseString(offsetInRegion, Util.NumberBase.Hexadecimal, 6);

    // ascii version of the byte
    [DisplayName("@")]
    [ReadOnly(true)]
    public char AsciiCharRep =>
        (char) region[offsetInRegion];

    // hex version of the byte
    [DisplayName("#")]
    [ReadOnly(true)]
    public string NumericRep =>
        Util.NumberToBaseString(region[offsetInRegion], Util.NumberBase.Hexadecimal);

    // ....snip, add whatever other properties you want to display....
}
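The notification side of `RomByteDataGridRow`, the `OnPropertyChanged()` call in the `Label` setter, is a plain observer pattern. Here is a Python analogue. It assumes a dict mapping offsets to labels as a stand-in for the region's label annotations. `RomByteRow` and `subscribe` are hypothetical names.

```python
from typing import Callable, Dict, List

PropertyChangedHandler = Callable[[object, str], None]


class RomByteRow:
    """Reads/writes through to shared storage and notifies every subscriber
    (e.g. each open view) when a property actually changes."""

    def __init__(self, labels: Dict[int, str], offset: int):
        self._labels = labels  # hypothetical stand-in for region label annotations
        self._offset = offset
        self._handlers: List[PropertyChangedHandler] = []

    def subscribe(self, handler: PropertyChangedHandler) -> None:
        self._handlers.append(handler)

    def _on_property_changed(self, name: str) -> None:
        for handler in list(self._handlers):
            handler(self, name)

    @property
    def label(self) -> str:
        return self._labels.get(self._offset, "")

    @label.setter
    def label(self, value: str) -> None:
        if self.label == value:
            return  # unchanged, so no notification
        self._labels[self._offset] = value
        self._on_property_changed("label")

    @property
    def offset_text(self) -> str:
        return f"{self._offset:06X}"
```

Skipping the notification when the value is unchanged avoids redundant redraws across views.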

// annotation generation (i.e. what Diz basically does right now as its core operation)
// example: 
// - adding labels
// - disassembly workflow (like CPU Step-through, Step-in, etc)
// - marking blocks of data as graphics, code, pointer tables, etc.

class Cpu65816Operations {
    void Step(int offset, Region region) {
        // .........
    }
}
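This is why a `Step` operation needs the per-byte `Annotation65XCpuFlags`. On the 65816 in native mode, the operand length of immediate instructions depends on the M and X flags, and `REP`/`SEP` change those flags. Here is a sketch with a deliberately tiny, hypothetical opcode subset. Real decoding needs the full opcode table and emulation-mode handling.

```python
from typing import Tuple

# Tiny illustrative subset (native mode only): opcode -> (mnemonic, operand size),
# where None means the size depends on the M or X flag.
OPCODES = {
    0xEA: ("NOP", 0),
    0x4C: ("JMP abs", 2),
    0xC2: ("REP #", 1),
    0xE2: ("SEP #", 1),
    0xA9: ("LDA #", None),  # follows the M flag
    0xA2: ("LDX #", None),  # follows the X flag
    0xA0: ("LDY #", None),  # follows the X flag
}


def step(rom: bytes, offset: int, m_flag: bool, x_flag: bool) -> Tuple[int, bool, bool]:
    """Decode one instruction; return (length, new m_flag, new x_flag).

    m_flag/x_flag True means 8-bit accumulator / 8-bit index registers.
    """
    mnemonic, operand_size = OPCODES[rom[offset]]
    if operand_size is None:
        eight_bit = m_flag if mnemonic == "LDA #" else x_flag
        operand_size = 1 if eight_bit else 2
    if mnemonic in ("REP #", "SEP #"):
        mask = rom[offset + 1]
        setting = mnemonic == "SEP #"  # SEP sets bits (8-bit), REP clears them (16-bit)
        if mask & 0x20:
            m_flag = setting
        if mask & 0x10:
            x_flag = setting
    return 1 + operand_size, m_flag, x_flag
```

Usage: stepping through `REP #$30; LDA #$1234; SEP #$20; LDA #$12; LDX #$1234` yields instruction lengths 2, 3, 2, 2, 3. The same `LDA` opcode is one byte longer while M is clear.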

// builds - replaces current "Export Assembly"
// define how and when output artifacts (assembly files, .bin files, etc)
// are generated.
// (export is already supported via the command line)
//
// would be cool if we could keep our management of this very lightweight, and use some existing build utilities.
// like generating Makefiles [or something that doesn't suck to deal with], so it can be run outside Diz.

DizProject = {
    ...
    Builds[] = {
        Build1 = {
            OutputAssemblyCode {"generated/", split_by_bank=true, flavor=CPU65816/SPC700/etc}
            Compilation {"asar.exe [params] main.asm", Output="generatedrom.sfc"}
            Defines {"RomVersion", "United States", true}
            RootRegion = this.RootRegion.SubRegion["SnesCPUBus"]["Rom"]
            Validation {
                MustBeByteIdentical {OriginalImportedRomFilename, "generatedrom.sfc"}
                MatchInternalCheckSum {[some checksum value from the rom]}
                NoPatchOverridesAllowed
            }
        },
        Build2 = {
            Inherit = Build1
            ApplyPatches[patchProject.RootRegion["InfiniteHitPointsPatch1"]]
            OutputDiff = {build1.output, this.output, diffWriteTo="patch.ips"} // something like this
        }
    }
}
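The `Validation` step could look roughly like this. The sketch assumes a power-of-two ROM size and no copier header. It reads the internal header's complement and checksum, which are little-endian words at header + 0x1C and header + 0x1E. The header sits at file offset 0x7FC0 for LoROM or 0xFFC0 for HiROM. `validate_build` and its messages are hypothetical.

```python
import struct
from typing import List, Tuple


def compute_snes_checksum(rom: bytes) -> int:
    """Sum of all bytes, truncated to 16 bits. Assumes a power-of-two ROM size;
    other sizes need the mirroring rules real carts use, which are not handled here."""
    return sum(rom) & 0xFFFF


def read_header_checksums(rom: bytes, header_offset: int) -> Tuple[int, int]:
    """Return (complement, checksum) from the internal header."""
    return struct.unpack_from("<HH", rom, header_offset + 0x1C)


def validate_build(original: bytes, generated: bytes, header_offset: int) -> List[str]:
    problems = []
    if generated != original:
        problems.append("MustBeByteIdentical failed")
    complement, checksum = read_header_checksums(generated, header_offset)
    if checksum != compute_snes_checksum(generated):
        problems.append("MatchInternalCheckSum failed")
    if (checksum ^ complement) != 0xFFFF:
        problems.append("checksum/complement mismatch")
    return problems
```

Because a checksum and its complement always sum to 0x1FE byte-wise, the header words do not need special handling when summing.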

// fun bonus ideas:
// with this data structure, it might be easy to integrate more tightly with a debugger (like BSNES),
// or to invoke a real emulator on a section of a ROM (e.g. "hey BSNES: run starting at offset X until you reach
// offset Y, using this RAM or savestate snapshot")
//
// It will also make writing custom tool integrations really simple, for things like graphics/audio editors,
// or integrating game-specific tools that already exist.
//
// And, we can create arbitrary window layouts, do things like making other windows "follow along" with you, remember history.
// imagine clicking around on a ROM and when you have a line with a JMP statement, the other window shows you a preview of where you are jumping
//
// have Hex editors, byte grid viewers, assembled output previews, etc available
//
// or, hook this up to be the backend of a microservices API, and build an interactive web viewer for this data.
// imagine being able to query data from games, look for patterns, etc., and create hot-links and share them like we do with
// GitHub issues
binary1230 commented 3 years ago

A version of this is now implemented in #48. It's pretty close to this initial sketch.

binary1230 commented 3 years ago

Closing this because it's basically all implemented in #48 now.