NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

PI (personal information) sanitization option for export (Ghidra Zipfiles) #6716

Closed: subreption-research closed this issue 1 month ago

subreption-research commented 1 month ago

Currently, whenever the current program is exported as a Ghidra Zipfile, certain information such as path details and other tidbits is included, with no way to disable its inclusion.

image

It would be helpful to provide an option in the dialog to sanitize these details, since most users are likely unaware that the original path to the object files is included. This path will often contain usernames and/or local path information that might be sensitive.

The PI that would be beneficial to sanitize includes the following fields:

[image: screenshot listing the fields]

subreption-research commented 1 month ago

@ryanmkurtz This seems to have slipped through.

ryanmkurtz commented 1 month ago

Our team just discussed this and came to the conclusion that the only way to guarantee true sanitization with the Ghidra Zip File exporter is to use Ghidra on a sanitized machine/VM (i.e., one with a generic-sounding username). You can go to Edit -> Options for '<program>' -> Program Information and sanitize the fields of interest there, but because the GZF exporter is exporting the entire database, it will not be guaranteed that a forensic examination of these fields will not recover the original values. But if you are just looking for a superficial "sanitization", then you could use this technique.

It was also recommended that you use the SARIF exporter, sanitize your strings in the resulting text file, and then share that or reimport it and export again as GZF.
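A minimal sketch of the "sanitize your strings in the resulting text file" step, in Python. It treats the export purely as text and makes no assumption about the SARIF layout Ghidra produces. The function name, the patterns, and the `user` placeholder are all hypothetical, and the patterns only cover common home-directory prefixes, so other PI would need its own rules.

```python
import re

# Hypothetical patterns for home-directory prefixes; extend as needed.
# Backslashes may appear doubled because JSON escapes them.
_HOME_PATTERNS = [
    # /home/<name> and /Users/<name>
    re.compile(r"(/(?:home|Users)/)[^/\s\"]+"),
    # C:\Users\<name>, C:\\Users\\<name>, C:/Users/<name>
    re.compile(r"([A-Za-z]:(?:\\\\|\\|/)Users(?:\\\\|\\|/))[^\\/\s\"]+"),
]


def scrub_user_paths(text: str, placeholder: str = "user") -> str:
    """Replace the username component of home-directory paths in text."""
    for pattern in _HOME_PATTERNS:
        # A function replacement avoids backslash-escape surprises.
        text = pattern.sub(lambda m: m.group(1) + placeholder, text)
    return text


def scrub_file(src_path: str, dst_path: str) -> None:
    """Read an exported text file, scrub it, and write the result."""
    with open(src_path, "r", encoding="utf-8") as src:
        cleaned = scrub_user_paths(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(cleaned)
```

The scrubbed file would then be shared directly, or reimported and exported again as GZF as suggested above. Reviewing the output by hand is still advisable, since pattern matching can miss values stored in other forms.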

subreption-research commented 1 month ago

Understood, this sounds reasonable.

Generic setups are already in place for all of our Ghidra-related tasks, so there is not much user-specific PI beyond paths. We agree, though, that there is extensive potential for other PI to be present elsewhere.

> it will not be guaranteed that a forensic examination of these fields will not recover the original values.

Is this related to the way the database is internally handled? (ex. similar to the "slack space" created in SQLite databases when deletions or updates may not actually overwrite or erase previous data.) Pretty much every storage backend involving a database engine behaves in a similar way, and all of them are exposed, either directly at the raw storage level or through the OS and the other layers involved, including the underlying filesystem and storage controllers.

Admittedly, we did very little research into the related internals prior to filing this issue.

The time spent by the team on this issue is much appreciated. We will consider whether there is time and merit in developing a patch, provided it has a chance of being merged.

ryanmkurtz commented 1 month ago

> Is this related to the way the database is internally handled? (ex. similar to the "slack space" created in SQLite databases when deletions or updates may not actually overwrite or erase previous data

Exactly.
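The SQLite analogy can be illustrated with a short Python sketch. This demonstrates SQLite only, not Ghidra's own database engine, and the table name, column names, and sample path are hypothetical. It inserts a value, deletes it, confirms it is no longer visible to queries, and then searches the raw database file for the bytes.

```python
import os
import sqlite3
import tempfile


def secret_survives_delete(secure_delete: bool) -> bool:
    """Return True if a deleted value is still present in the raw file."""
    secret = "/home/alice/firmware/target.bin"  # hypothetical sample value
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(db_path)
        # Set the mode explicitly, since the default depends on the build.
        mode = "ON" if secure_delete else "OFF"
        conn.execute(f"PRAGMA secure_delete = {mode}")
        conn.execute("CREATE TABLE info (id INTEGER PRIMARY KEY, path TEXT)")
        conn.executemany(
            "INSERT INTO info (path) VALUES (?)",
            [("placeholder-a",), (secret,), ("placeholder-b",)],
        )
        conn.commit()
        conn.execute("DELETE FROM info WHERE path = ?", (secret,))
        conn.commit()
        # The row is gone as far as SQL queries are concerned.
        rows = conn.execute("SELECT path FROM info").fetchall()
        conn.close()
        assert (secret,) not in rows
        # A forensic look at the raw bytes may tell a different story.
        with open(db_path, "rb") as fh:
            return secret.encode("utf-8") in fh.read()
    finally:
        os.remove(db_path)
```

With `secure_delete` off, the freed cell is typically left in the page, so the call is expected to report that the bytes survive. With it on, SQLite zero-fills freed content. Neither setting addresses copies held by the filesystem or storage layers.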

subreption-research commented 1 month ago

No surprises there. Depending on the actual engine used, this might also be technically impossible without rewriting it. If the writes are deterministic before or after a commit operation (i.e., you can safely assume that the same write operation to the same field or fields will affect the same underlying storage block), then it might be possible to add a modifier that performs single-pass erasure on write. This would come at a cost whenever writes take place; again, no research has been done on how it would impact Ghidra performance.
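A toy Python sketch of the erasure-on-write idea, assuming the deterministic layout described above, where a given field always maps to the same storage block. It does not reflect how Ghidra's database engine stores data. `SLOT_SIZE`, `create_store`, and `write_field` are hypothetical names for a flat file of fixed-size slots.

```python
import os

SLOT_SIZE = 64  # hypothetical fixed field width, in bytes


def create_store(path: str, slots: int) -> None:
    """Create a zero-filled file holding the given number of slots."""
    with open(path, "wb") as fh:
        fh.write(b"\x00" * (slots * SLOT_SIZE))


def write_field(path: str, slot: int, value: bytes) -> None:
    """Write a value to one slot, erasing the previous value in one pass."""
    if len(value) > SLOT_SIZE:
        raise ValueError("value does not fit in slot")
    with open(path, "r+b") as fh:
        fh.seek(slot * SLOT_SIZE)
        # Padding to the full slot width overwrites every byte of the
        # previous value, so no tail of a longer old value survives.
        fh.write(value.ljust(SLOT_SIZE, b"\x00"))
        fh.flush()
        os.fsync(fh.fileno())
```

The extra cost is the padding write and the sync on every update. The approach only helps if the layers below overwrite in place; copy-on-write filesystems and flash wear leveling can still retain the old block.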

Might revisit this at a later stage.