Optimize Plugin Cleaning Performance

Plugin cleaning is a bit on the slow and heavy side. It loads the entire ESM / ESP file into memory, using roughly one byte of process memory per byte of file (plus a small overhead from placing each top GRUP in its own byte array instance, which is negligible). That part is fine; it's actually ideal. What it does next may shock you!
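For a concrete picture of that loading step, here is a minimal Python sketch. StepperUpper itself is C#, and `read_plugin` / `read_plugin_bytes` are hypothetical names, not its API. It assumes the Skyrim-era layout: a TES4 header record followed by top-level GRUPs, where every record and GRUP header is 24 bytes, a record's size field excludes its header, and a GRUP's size field includes it. Each top GRUP becomes its own `bytearray`, so memory stays at roughly one byte per byte of file.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

HEADER_SIZE = 24  # assumed: Skyrim-era record and GRUP headers are both 24 bytes


def read_plugin(stream: BinaryIO) -> Tuple[bytearray, List[bytearray]]:
    """Return the TES4 header record plus one bytearray per top-level GRUP."""
    header = stream.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE or header[:4] != b"TES4":
        raise ValueError("missing TES4 header record")
    # A record's size field (offset 4) counts only its data, not its 24-byte header.
    data_size = struct.unpack_from("<I", header, 4)[0]
    tes4 = bytearray(header) + stream.read(data_size)

    groups: List[bytearray] = []
    while True:
        group_header = stream.read(HEADER_SIZE)
        if not group_header:
            break  # clean end of file
        if len(group_header) < HEADER_SIZE or group_header[:4] != b"GRUP":
            raise ValueError("expected a top-level GRUP")
        # A GRUP's size field (offset 4) includes its own 24-byte header.
        group_size = struct.unpack_from("<I", group_header, 4)[0]
        if group_size < HEADER_SIZE:
            raise ValueError(f"invalid GRUP size {group_size}")
        body = stream.read(group_size - HEADER_SIZE)
        if len(body) != group_size - HEADER_SIZE:
            raise ValueError("truncated GRUP")
        groups.append(bytearray(group_header) + body)
    return tes4, groups


def read_plugin_bytes(data: bytes) -> Tuple[bytearray, List[bytearray]]:
    """Convenience wrapper for a plugin that is already in memory."""
    return read_plugin(io.BytesIO(data))
```

Reading GRUP by GRUP from the stream, rather than slicing one big buffer, avoids briefly holding two full copies of the file.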

After loading the file, it builds a "semantic" model of the contents so that individual pieces can be manipulated easily. Want to delete a record because it's flagged as "identical to master"? No problem, just remove it from its parent's collection! The downside is that the official plugins end up allocating millions of individual objects on the managed heap, which bloats the process's memory dramatically and makes it unreasonable to keep the master files resident in memory for long. That wouldn't be a terribly big problem if not for the fact that, technically, you can play modded Skyrim on a 32-bit installation of Windows, so 32-bit process memory limits apply. The upside, of course, is that I was able to actually get something useful out the door.
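To illustrate why the semantic model is convenient but allocation-heavy, here is a hypothetical Python model. The `Group`, `Record` and `Subrecord` classes are invented for this sketch and are not StepperUpper's types. Deleting a record really is just dropping it from its parent's list, but every record and every field is its own heap object, so the object count grows with the number of fields in the file.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Subrecord:
    """Hypothetical: one field of a record, e.g. EDID or OBND."""
    signature: bytes
    data: bytes


@dataclass
class Record:
    signature: bytes
    form_id: int
    fields: List[Subrecord] = field(default_factory=list)


@dataclass
class Group:
    label: bytes
    group_type: int
    children: List[Union["Group", Record]] = field(default_factory=list)


def delete_records(group: Group, form_ids: Set[int]) -> int:
    """Drop matching records by leaving them out of their parent's list."""
    removed = 0
    kept: List[Union[Group, Record]] = []
    for child in group.children:
        if isinstance(child, Group):
            removed += delete_records(child, form_ids)
            kept.append(child)
        elif child.form_id in form_ids:
            removed += 1  # simply not copied into `kept`
        else:
            kept.append(child)
    group.children = kept
    return removed


def count_objects(node: Union[Group, Record]) -> int:
    """Count model objects (ignoring the bytes and list objects each one also holds)."""
    if isinstance(node, Group):
        return 1 + sum(count_objects(child) for child in node.children)
    return 1 + len(node.fields)
```

The convenience is real, but so is the cost: even this toy count leaves out the separate `bytes` and `list` objects behind every node.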

There are a few ways this process can be improved, and I think they're more feasible now than they were when I switched to this allocation-heavy approach, because I'm no longer trying to cut my way through the weeds:

Limit what we keep alive from each .esm / .esp file to just the parts needed to clean the later files that depend on it.
1. This is harder than it sounds, because some UDRs need to be moved to different GRUPs, so we need to keep enough around to be able to find the path to the correct parent.
2. Still, this is not terribly hard.
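One way to keep only what later files need is sketched below in Python. It assumes Skyrim-era 24-byte record and GRUP headers and that the caller already knows which FormIDs later plugins will need to move. The hypothetical `index_parent_paths` walks a raw top GRUP and records, for each wanted FormID, the chain of enclosing GRUP (label, group type) pairs from the root. It only reads headers, which are never compressed, so compressed record data does not get in the way.

```python
import struct
from typing import Dict, Iterable, Tuple

HEADER_SIZE = 24              # assumed: Skyrim-era record and GRUP header size
GroupKey = Tuple[bytes, int]  # (label, group type) taken from a GRUP header


def index_parent_paths(top_group: bytes, wanted: Iterable[int]) -> Dict[int, Tuple[GroupKey, ...]]:
    """Map each wanted FormID to the chain of GRUPs enclosing it, outermost first."""
    wanted_ids = set(wanted)
    paths: Dict[int, Tuple[GroupKey, ...]] = {}

    def walk(start: int, end: int, path: Tuple[GroupKey, ...]) -> None:
        pos = start
        while pos < end:
            size = struct.unpack_from("<I", top_group, pos + 4)[0]
            if top_group[pos:pos + 4] == b"GRUP":
                if size < HEADER_SIZE:
                    raise ValueError(f"corrupt GRUP size at offset {pos}")
                label = bytes(top_group[pos + 8:pos + 12])
                group_type = struct.unpack_from("<i", top_group, pos + 12)[0]
                walk(pos + HEADER_SIZE, pos + size, path + ((label, group_type),))
                pos += size  # GRUP size includes its header
            else:
                form_id = struct.unpack_from("<I", top_group, pos + 12)[0]
                if form_id in wanted_ids:
                    paths[form_id] = path
                pos += HEADER_SIZE + size  # record size excludes its header

    walk(0, len(top_group), ())
    return paths
```

Once these small tuples are saved, the master's byte array can be released. A full implementation would likely need more than the (label, type) pairs, such as the parent CELL record itself, to rebuild a missing parent in the target file; this sketch only covers finding the path.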
Do all of the processing directly on the "raw" file bytes, or at least the deletions.
1. Very hard, but it also offers the greatest potential improvement.
2. Deleting a field within a record (ignoring compression) just involves copying all of the later data down over it and updating the sizes of its containers (the record and every enclosing GRUP).
3. Deleting a record works the same way, except that we also delete its container if it's now empty, and we fix up the TES4 record's HEDR numRecords and its ONAMs as appropriate.
4. Deleting a container is basically the same as deleting a record.
5. UDRs are the hardest part. When processing a file that's a master for other files, we already know which UDRs we're going to be deleting later, so we can keep the data for the top GRUPs containing the original records, along with their paths to the root. The difficult part is merging that data into the targets, because growing a byte array the way this requires is awkward.
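Here is a minimal Python sketch of the raw-byte deletes described above. It assumes Skyrim-era headers (24-byte records and GRUPs, 6-byte field headers with a 16-bit size, and a 4-byte float version before numRecords in HEDR), and it ignores compressed records, oversized `XXXX` fields, ONAM, and UDR merging. All function names are hypothetical. It also assumes HEDR's numRecords counts GRUPs as well as records, so `delete_record` reports both. Python's `del buf[a:b]` on a `bytearray` performs the "copy all the later data over it" step; in C# that copy and the logical length would be managed by hand.

```python
import struct
from typing import List, Optional, Tuple

HEADER = 24              # assumed: Skyrim-era record and GRUP header size
FIELD_HEADER = 6         # 4-byte signature + uint16 size
COMPRESSED = 0x00040000  # record flag: data is zlib-compressed


def _u32(buf, off: int) -> int:
    return struct.unpack_from("<I", buf, off)[0]


def _add_u32(buf: bytearray, off: int, delta: int) -> None:
    struct.pack_into("<I", buf, off, _u32(buf, off) + delta)


def find_record(buf, form_id: int) -> Optional[Tuple[int, List[int]]]:
    """Return (record offset, enclosing GRUP offsets outermost-first) or None."""
    def walk(start: int, end: int, ancestors: List[int]):
        pos = start
        while pos < end:
            size = _u32(buf, pos + 4)
            if buf[pos:pos + 4] == b"GRUP":
                if size < HEADER:
                    raise ValueError(f"corrupt GRUP size at offset {pos}")
                hit = walk(pos + HEADER, pos + size, ancestors + [pos])
                if hit is not None:
                    return hit
                pos += size                  # GRUP size includes its header
            elif _u32(buf, pos + 12) == form_id:
                return pos, ancestors
            else:
                pos += HEADER + size         # record size excludes its header
        return None
    return walk(0, len(buf), [])


def delete_field(buf: bytearray, rec: int, ancestors: List[int], sig: bytes) -> bool:
    """Remove the first `sig` field of the record at `rec`, shrinking every container."""
    if _u32(buf, rec + 8) & COMPRESSED:
        raise ValueError("compressed records are not handled in this sketch")
    pos, end = rec + HEADER, rec + HEADER + _u32(buf, rec + 4)
    while pos < end:
        length = FIELD_HEADER + struct.unpack_from("<H", buf, pos + 4)[0]
        if buf[pos:pos + 4] == sig:
            del buf[pos:pos + length]        # later bytes slide down over the field
            _add_u32(buf, rec + 4, -length)  # the record's data size...
            for g in ancestors:              # ...and every enclosing GRUP's size
                _add_u32(buf, g + 4, -length)
            return True
        pos += length
    return False


def delete_record(buf: bytearray, form_id: int) -> int:
    """Delete a record, then any GRUPs left empty; return records + GRUPs removed."""
    found = find_record(buf, form_id)
    if found is None:
        return 0
    rec, ancestors = found
    length = HEADER + _u32(buf, rec + 4)
    del buf[rec:rec + length]
    for g in ancestors:
        _add_u32(buf, g + 4, -length)
    removed = 1
    # Ancestors sit before the deleted bytes, so their offsets are still valid.
    while ancestors and _u32(buf, ancestors[-1] + 4) == HEADER:
        g = ancestors.pop()                  # innermost GRUP is now empty
        del buf[g:g + HEADER]
        removed += 1
        for outer in ancestors:
            _add_u32(buf, outer + 4, -HEADER)
    return removed


def adjust_num_records(tes4: bytearray, delta: int) -> None:
    """Add `delta` to HEDR's numRecords in an uncompressed TES4 record."""
    pos, end = HEADER, HEADER + _u32(tes4, 4)
    while pos < end:
        size = struct.unpack_from("<H", tes4, pos + 4)[0]
        if tes4[pos:pos + 4] == b"HEDR":
            off = pos + FIELD_HEADER + 4     # skip the float32 version
            count = struct.unpack_from("<i", tes4, off)[0]
            struct.pack_into("<i", tes4, off, count + delta)
            return
        pos += FIELD_HEADER + size
    raise ValueError("TES4 record has no HEDR field")
```

The caller would pass the count returned by `delete_record` as a negative delta to `adjust_num_records`. Merging UDRs is roughly the inverse: a slice insertion such as `buf[pos:pos] = data` plus the same size fix-ups with a positive delta, which a growable `bytearray` hides but a fixed-size C# array does not.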

airbreather / StepperUpper #7