cmu-sei / pharos

Automated static analysis tools for binary programs
Other
1.53k stars 186 forks source link

remove 0x prefix #140

Open Trass3r opened 3 years ago

Trass3r commented 3 years ago

It doesn't really add any value and doesn't fit the naming convention in IDA for example. Json entries like

"name": "mbr_0x4",
"offset": "0x4",

could just be represented as

"name": "mbr",
"offset": "4",

I guess and the actual naming is left to the exporter script.

sei-eschwartz commented 3 years ago

What would you want a member at offset 10 to be named? mbr_a?

Trass3r commented 3 years ago

Yes.

sei-eschwartz commented 3 years ago

This seems reasonable to me. The main disadvantage is that it changes the JSON format. But it probably should have been this way all along.

Trass3r commented 3 years ago

Actually it was like this (without 0x) before already at some point. At least the exporter needs to be able to easily follow tool-specific naming conventions like these.

sei-eschwartz commented 3 years ago

Thanks for sharing that link. I agree that we should reuse IDA's naming conventions when importing into IDA

Trass3r commented 3 years ago

Maybe it can be left in place after all as a reference, or just removed. Looks like the importer should always construct the names manually anyway using the information provided by the other fields. I wouldn't want to see such a name in code (either something short like parent1 or nothing if the tool supports proper inheritance):

https://github.com/cmu-sei/pharos/blob/1606c719fd766dd5bc3784254dfeef5681bdce00/tools/ooanalyzer/tests/ooex_vs2010/Lite/oo.json#L7-L14

Trass3r commented 3 years ago

Similarly is there a particular reason to include all those base members? Is it for sax-style json parsing? Edit: oh ok they are actually missing from the base class members descriptions. Is that intended?

sei-eschwartz commented 3 years ago

I think the corner case that influenced this was to show uses from the derived class of a member defined in the base class. @sei-ccohen Is that what you remember?

Trass3r commented 3 years ago

Yes I get that, but those members are lacking in the base class' members list. Fixing that on the importer side proved to be tricky. It was also a bit misleading that base refs can also refer to a sub-structure rather than a parent class. And I saw one bogus ref:

"0x40": {
    "base": true,
    "name": "mbr_0x40",
    "offset": "0x40",
    "parent": false,
    "size": 4,
    "struc": "",
    "type": "",
    "usages": [
        "0x5d99ee"
    ]
},
sei-ccohen commented 3 years ago

@sei-ccohen Is that what you remember?

Yes, there was an issue of some sort that motivated us to just list all the members. My recollection was that we still weren't completely happy with that decision though. Just that it was expedient and that we have other priorities as well... Regardless, @Trass3r, there's definitely an issue in the Hexrays plugin where we need to tell IDA about each access. It's not quite as magically smart about type propagation as Ghidra is.

Yes I get that, but those members are lacking in the base class' members list.

Hmm. Perhaps here's the real problem. Maybe we need an analysis pass during results generation that correctly concludes the existence of member fields in the base class, based on the derived class accessing them. I'm not sure that we do this currently (even though we obviously should). The current algorithm I think only looks at the accesses in the methods attached to the base class for the base clas members... :-(