arizvisa / ida-minsc

A plugin based on IDAPython for a functional DWIM interface. Current development against most recent IDA is in the "persistence-refactor" branch, ancient (but stable) work is in "master", so... create an issue if you want/need something backported. Use "Wiki" or "Discussions" for examples, and smash that "Star" button if you like this.
BSD 3-Clause "New" or "Revised" License
316 stars 53 forks

Feature: database.stash and exposing a filesystem-like api in an IDA database #8

Open arizvisa opened 6 years ago

arizvisa commented 6 years ago

This was experimented with during "toolbag" development like forever ago, but never made it into ida-minsc due to the vastly different intentions and mantras between both plugins. In essence, a filesystem-like api exposed via something like database.stash would be very useful for allowing a user to bundle arbitrarily-sized data with their IDA database. This would facilitate higher-level plugin development which needs to cache extra data in the database without storing it externally via the platform's regular filesystem.

The "blob" type for a netnode in an IDA database has some size limitations that need to be worked around in order to provide this capability. To support arbitrarily sized data, we could use a linked list or a tree for locating a file's different chunks. A better way would be to implement an actual embedded filesystem using "blob" types. We can probably implement something similar to a FAT-based filesystem (or some other filesystem) using blobs as its primitive storage mechanism.
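As a rough illustration of the allocation-table idea, here is a minimal sketch that chains fixed-size chunks together the way a FAT does. Everything in it is an assumption made for illustration: `BlobStore`, `CHUNK_SIZE`, and `END_OF_CHAIN` are hypothetical names, the 1024-byte limit is a placeholder rather than IDA's real limit, and plain dictionaries stand in for netnode blobs so that it runs outside of IDA.

```python
CHUNK_SIZE = 1024    # hypothetical per-blob limit, not IDA's actual value
END_OF_CHAIN = -1    # marks the last chunk of a file in the allocation table

class BlobStore(object):
    """FAT-like store; dictionaries stand in for netnode blobs (assumption)."""
    def __init__(self):
        self.blobs = {}       # chunk index -> bytes (one "blob" per chunk)
        self.table = {}       # chunk index -> index of the next chunk
        self.directory = {}   # file name -> index of the first chunk
        self.next_free = 0    # naive allocator that never reuses an index

    def _allocate(self):
        index = self.next_free
        self.next_free += 1
        return index

    def write(self, name, data):
        # Overwriting a file releases its old chain first.
        if name in self.directory:
            self.delete(name)
        # Split the data into chunks; an empty file still owns one chunk.
        chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b'']
        indices = [self._allocate() for _ in chunks]
        for position, index in enumerate(indices):
            self.blobs[index] = chunks[position]
            is_last = position + 1 == len(indices)
            self.table[index] = END_OF_CHAIN if is_last else indices[position + 1]
        self.directory[name] = indices[0]

    def read(self, name):
        # Follow the chain through the allocation table and join the chunks.
        index, result = self.directory[name], []
        while index != END_OF_CHAIN:
            result.append(self.blobs[index])
            index = self.table[index]
        return b''.join(result)

    def delete(self, name):
        index = self.directory.pop(name)
        while index != END_OF_CHAIN:
            del self.blobs[index]
            index = self.table.pop(index)
```

The allocator here never reuses freed indices, which sidesteps fragmentation at the cost of leaking index space. A real implementation would need a free list, and that is where the defragmentation concern comes from.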

The lower-level components could then be implemented as internal.netnode.nodefs, at which point some higher-level interface could be exposed via the database module. If we have some kind of filesystem like this, then we could begin to consider tags that have no size limitations. We could also modify the semantics of tagging a bit so that any tag name that's double-underscored (as with Python's name mangling) would result in a hidden tag that is physically stored in the filesystem. This would allow serialization of types other than the basic Python ones that we presently encode within comments.
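To make the hidden-tag idea concrete, here is a small sketch of how a double-underscored name could be routed to the filesystem instead of a comment. It is only an assumption about how this might look: `set_tag`, `get_tag`, `visible_tags`, the `tags/<address>/<name>` path layout, and the use of `pickle` for serialization are all hypothetical, and two dictionaries stand in for the comment-encoded tags and the proposed stash.

```python
import pickle

comments = {}   # address -> {tag name: value}; stand-in for comment-encoded tags
stash = {}      # path -> bytes; stand-in for the proposed filesystem

def is_hidden(name):
    # Double-underscored names are treated as hidden tags (assumed convention).
    return name.startswith('__')

def tag_path(address, name):
    # Hypothetical path layout for a hidden tag inside the stash.
    return 'tags/{:x}/{}'.format(address, name)

def set_tag(address, name, value):
    if is_hidden(name):
        # Hidden tags are serialized, so they can hold non-basic types.
        stash[tag_path(address, name)] = pickle.dumps(value)
    else:
        comments.setdefault(address, {})[name] = value

def get_tag(address, name):
    if is_hidden(name):
        return pickle.loads(stash[tag_path(address, name)])
    return comments[address][name]

def visible_tags(address):
    # Only the comment-backed tags are listed; hidden ones stay out of view.
    return dict(comments.get(address, {}))
```

For example, `set_tag(0x401000, '__cache', {1, 2, 3})` would store a pickled set in the stash, while `visible_tags(0x401000)` would continue to list only the regular tags at that address.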

arizvisa commented 5 years ago

Some related links on ideas for data structures that can be used for this. I'm just closing out some old tabs, so I'll leave these here so they aren't lost.

arizvisa commented 4 years ago

It looks like we can maybe rip https://github.com/williballenthin/ida-netnode/blob/master/netnode/netnode.py... I should probably ask him if it's okay someday.

arizvisa commented 4 years ago

PR #70 was created to perform the research needed to implement this feature.

arizvisa commented 3 years ago

Note to self: make sure the filesystem is versioned in some way. Since the first iteration is probably going to be an allocation table, we might want to upgrade this to a balanced tree at some point because defragmentation sucks. Versioning lets us introduce breaking changes to the underlying structure in the future, and either keep supporting the older implementation or provide a way of converting the older data structure to the new one.
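One way the versioning could work is a small header in front of the filesystem's root blob plus a chain of converters. This is a sketch under assumptions, not a decided format: the `NFS0` magic, the `<4sH` header layout, the version numbers, and the `upgrade_v1_to_v2` converter (which just prefixes the payload as a placeholder for a real table-to-tree conversion) are all made up for illustration.

```python
import struct

MAGIC = b'NFS0'                   # hypothetical signature for the root blob
HEADER = struct.Struct('<4sH')    # magic followed by a 16-bit version number
CURRENT_VERSION = 2               # hypothetical: 1 = allocation table, 2 = tree

def pack_superblock(version, payload):
    return HEADER.pack(MAGIC, version) + payload

def unpack_superblock(data):
    magic, version = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError('not a stash filesystem')
    return version, data[HEADER.size:]

def upgrade_v1_to_v2(payload):
    # Placeholder conversion; a real one would rebuild the table as a tree.
    return b'tree:' + payload

# Each entry converts a payload from that version to the next one.
CONVERTERS = {1: upgrade_v1_to_v2}

def upgrade(data):
    version, payload = unpack_superblock(data)
    if version > CURRENT_VERSION:
        raise ValueError('filesystem is newer than this implementation')
    # Apply converters one version at a time until the data is current.
    while version < CURRENT_VERSION:
        payload = CONVERTERS[version](payload)
        version += 1
    return pack_superblock(version, payload)
```

Stepping one version at a time means each breaking change only needs a converter from its immediate predecessor, and data that is already current passes through unchanged.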