cmu-sei / pharos

Automated static analysis tools for binary programs
Other
1.57k stars 192 forks source link

Generating JSON fails with duplicate key. #274

Closed Raphtaliyah closed 3 weeks ago

Raphtaliyah commented 4 weeks ago

I've run OOAnalyzer on a large program (9MB) using the multi step guide and it fails at the last step when it's generating the JSON results:

ERROR: Duplicate key: '0xa8b408'
ERROR: In:
ERROR:   [20] with_output_to(<stream>(0x5596878b5520),exportJSON)
ERROR:   [19] setup_call_catcher_cleanup(user:open('Result.json',write,<stream>(0x5596878b5520)),user:with_output_to(<stream>(0x5596878b5520),exportJSON),_142444,user:close(<stream>(0x5596878b5520))) at /usr/local/lib/swipl/boot/init.pl:663
ERROR:   [16] catch(user:exportJSONTo('Result.json'),error(duplicate_key('0xa8b408'),context(...,_142542)),user:(...,...)) at /usr/local/lib/swipl/boot/init.pl:562
ERROR:   [15] catch_with_backtrace('<garbage_collected>','<garbage_collected>','<garbage_collected>') at /usr/local/lib/swipl/boot/init.pl:629
ERROR: 
ERROR: Note: some frames are missing due to last-call optimization.
ERROR: Re-run your program in debug mode (:- debug.) to get more detail.

running in debug mode: (I assume I had to add :- debug. to ooprolog? I'm very unfamiliar with these tools.)

ERROR: Duplicate key: '0xa8b408'
ERROR: In:
ERROR:   [20] with_output_to(<stream>(0x55a3e77b4590),exportJSON)
ERROR:   [19] setup_call_catcher_cleanup(user:open('Result.json',write,<stream>(0x55a3e77b4590)),user:with_output_to(<stream>(0x55a3e77b4590),exportJSON),_464,user:close(<stream>(0x55a3e77b4590))) at /usr/local/lib/swipl/boot/init.pl:663
ERROR:   [18] setup_call_cleanup(user:open('Result.json',write,<stream>(0x55a3e77b4590)),user:with_output_to(<stream>(0x55a3e77b4590),exportJSON),user:close(<stream>(0x55a3e77b4590))) at /usr/local/lib/swipl/boot/init.pl:666
ERROR:   [17] exportJSONTo('Result.json') at /usr/local/share/pharos/prolog/oorules/oojson.pl:403
ERROR:   [16] catch(user:exportJSONTo('Result.json'),error(duplicate_key('0xa8b408'),context(...,_662)),user:(...,...)) at /usr/local/lib/swipl/boot/init.pl:562
ERROR:   [15] catch_with_backtrace(user:exportJSONTo('Result.json'),error(duplicate_key('0xa8b408'),context(...,_734)),user:(...,...)) at /usr/local/lib/swipl/boot/init.pl:629

If I grep the results file for the duplicate key (0xa8b408) I get:

finalVFTable(0xa8b408, 0x4, 0x4, 0xa8b404, '.?AU?$error_info_injector@Vbad_lexical_cast@boost@@@exception_detail@boost@@').
finalVFTableEntry(0xa8b408, 0, 0x42d0f0).
finalClass(0xa8b408, 0xa8b408, 0x28, 0x28, 0x42d0f0, [0x42cd80, 0x42ce60, 0x42d0f0]).
finalEmbeddedObject(0xa8b408, 0, 0x9e28cc, likely).
finalInheritance(0xa8b408, 0xa882b0, 0x14, 0xa8b408, false).
finalInheritance(0xa8b408, 0xa8ae80, 0, 0xa8b408, false).
finalInheritance(0xa8b410, 0xa8b408, 0, 0xa8b410, false).
finalMemberAccess(0xa8b408, 0, 0x4, [0x42cda9, 0x42cdf1, 0x42ce69, 0x42ce70, 0x42ce7e]).
finalMemberAccess(0xa8b408, 0xc, 0x4, [0x42cdb2]).
finalMemberAccess(0xa8b408, 0x10, 0x4, [0x42cdb8]).
finalMemberAccess(0xa8b408, 0x14, 0x4, [0x42cdc3, 0x42cdf7]).
finalMemberAccess(0xa8b408, 0x18, 0x4, [0x42cdcf]).
finalMemberAccess(0xa8b408, 0x1c, 0x4, [0x42cdde]).
finalMemberAccess(0xa8b408, 0x20, 0x4, [0x42cde8]).
finalMemberAccess(0xa8b408, 0x24, 0x4, [0x42cdee]).
finalDemangledName(0xa8b408, '.?AU?$error_info_injector@Vbad_lexical_cast@boost@@@exception_detail@boost@@', 'boost::exception_detail::error_info_injector<class boost::bad_lexical_cast>', '').

I have the files from the previous steps and can send them in private if needed (the log file is 8.1GB tho and the first 3 steps took 48 hours to run and needed 155GB of RAM).

sei-eschwartz commented 4 weeks ago

Can you share your results file? If so, that is the easiest way for us to see what is going on. You can email to eschwartz@cert.org if posting publicly is a concern.

It's a little odd that 0xa8b408 supposedly inherits from 0xa8ae80 at 0, and embeds 0x9e28cc at 0. But it's not impossible if 0xa8ae80 is a zero size class. And since 0xa8b408 is some boost exception class, that seems likely.

Raphtaliyah commented 4 weeks ago

My concern was the name of the executable making this issue show up in search results but it should hopefully not this way! results.zip

sei-eschwartz commented 3 weeks ago

I was able to reproduce this (unsurprisingly).

sei-eschwartz commented 3 weeks ago

The error is actually coming when generating the VFTables portion of the JSON for class 0xa8b408. Specifically, we identify the VFTable as occurring in two places in that class:

KVPairs: [0xa8b408:vftable{ea:0xa8b408,entries:entries{0:vftentry{demangled_name:,ea:0x42d0f0,import:false,name:virt_dtor_0x42d0f0,offset:0,type:dtor}},length:1,vftptr:0x14},0xa8b408:vftable{ea:0xa8b408,entries:entries{0:vftentry{demangled_name:,ea:0x42d0f0,import:false,name:virt_dtor_0x42d0f0,offset:0,type:dtor}},length:1,vftptr:0x0}]

Something doesn't seem quite right here. If class 0xa8b408 owns its vftable, how could one of its bases use it? It's usually the other way around. So my guess is that we have some sort of rule problem that is making an impossible situation that we didn't detect, and is causing this problem in the JSON exporter.

sei-eschwartz commented 3 weeks ago

This is the meaning of finalInheritance: finalInheritance(DerivedClassID, BaseClassID, ObjectOffset, VFTable, Virtual)

Now look at:

finalInheritance(0xa8b408, 0xa882b0, 0x14, 0xa8b408, false).
finalInheritance(0xa8b408, 0xa8ae80, 0, 0xa8b408, false).

So ultimately the duplication there in the VFTable field is the cause of the error.

As a workaround, you could set those to 0 for now.

But I'd like to find out why that condition is happening. @Raphtaliyah could you email me the original executable?

Raphtaliyah commented 3 weeks ago

I sent an email with the executable! (zipped and as a bin file.)

I tried the workaround and I ended up with a different duplicate key: 0xa8b984. Searching the file for 0xa8b984 shows the same thing happen:

finalInheritance(0xa8b984, 0xa882b0, 0xc, 0xa8b984, false).
finalInheritance(0xa8b984, 0xa8b96c, 0, 0xa8b984, false).

again with a similar boost class (.?AU?$error_info_injector@Vbad_weak_ptr@boost@@@exception_detail@boost@@')).

After making the same changes (setting VFTable to 0) it does complete!

sei-eschwartz commented 3 weeks ago

Thanks. I haven't received it yet. I know my work email uses grey-listing, so hopefully it's just delayed because of that.

sei-eschwartz commented 3 weeks ago

Can you try again? Maybe cc edmcman@cmu.edu too.

Raphtaliyah commented 3 weeks ago

I tried to and this time it got returned because "This message was blocked because its content presents a 552-5.7.0 security issue" so I ended up uploading it to google drive and just sent it as link! (I assume it was the same issue last time but I didn't get an email that it failed to send?)

sei-eschwartz commented 3 weeks ago

Thanks, I received it

sei-eschwartz commented 3 weeks ago

Can you let me know what commands/arguments you used? It probably won't matter, but if you have them handy I'd prefer to run the same commands that you did.

Raphtaliyah commented 3 weeks ago

I don't have the exact commands but these should be close enough:

partition --serialize=ooprog.ser --maximum-memory=256000 (executable name)
ooanalyzer --serialize=ooprog.ser --maximum-memory=256000 --prolog-facts=ooprog-facts.pl --per-function-maximum-memory=256000 --per-function-timeout=600000 (executable name)
ooprolog --facts ooprog-facts.pl --results ooprog-results.pl --log-level=6 >ooprog.log
ooprolog --results ooprog-results.pl --json ooprog.json

The limits (time/mem) might have been different, I just used a "big enough" value. (It does end up eating 155 GB ram and it does hit the default per function memory and time limit!) This error happened on every run I tried: FSEM[ERROR]: Attempt to change function stack delta without changing confidence for function: 0x00895D30 old=+000C(Guess) new=+0008(Guess) but other than this it should complete without any. I ran it in the pre built docker image.

sei-eschwartz commented 3 weeks ago

image

So what is vftable1? A class that inherits from boost::exception?

sei-eschwartz commented 3 weeks ago

Still running ooanalyzer... at least we're in the Prolog phase now.

sei-eschwartz commented 3 weeks ago

Still running...

Raphtaliyah commented 3 weeks ago

It took around 48 hours on a x2iedn.2xlarge on aws.

sei-eschwartz commented 3 weeks ago

Yeah this machine is getting kind of old :-)


From: Stella @.***> Sent: Friday, November 8, 2024 3:02 PM To: cmu-sei/pharos Cc: Edward J Schwartz; Assign Subject: Re: [cmu-sei/pharos] Generating JSON fails with duplicate key. (Issue #274)

Warning: External Sender - do not click links or open attachments unless you recognize the sender and know the content is safe.

It took around 48 hours on a x2iedn.2xlarge on aws.

— Reply to this email directly, view it on GitHubhttps://github.com/cmu-sei/pharos/issues/274#issuecomment-2465658500, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AL6ZAVAT7NX7A6GC5KGXU7DZ7UKFRAVCNFSM6AAAAABRANMRE6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINRVGY2TQNJQGA. You are receiving this because you were assigned.Message ID: @.***>

sei-eschwartz commented 3 weeks ago

It's finally done! Let's take a quick look...

So to refresh myself, I want to know why 0xa8b408 is a VFTable installed into positions 0 and 0x14 of class 0xa8b408.

Here is the definition of finalInheritance:

finalInheritance(DerivedClassID, BaseClassID, ObjectOffset, VFTableOrNull, false) :-
    factDerivedClass(DerivedClass, BaseClass, ObjectOffset),

    % The following line stops us from reporting inheritances from A to C if there is also a
    % relation from A to B and B to C.  We have this because virtual inheritance is known to
    % introduce "false" A to C relationships depending on the order that guessing occurs in.
    once((not((factDerivedClass(DerivedClass, OtherClass, _Off1), factDerivedClass(OtherClass, BaseClass, _Off2))),

          classIdentifier(DerivedClass, DerivedClassID),
          classIdentifier(BaseClass, BaseClassID),

          % Try to identify the relevant VFTable
          ((find(VFTable, DerivedClass),
            factVFTableWrite(_Insn, DerivedConstructor, ObjectOffset, VFTable),
            find(DerivedConstructor, DerivedClass))
           ->
               VFTableOrNull = VFTable
           ;
           VFTableOrNull is 0))).

So, for inheritance at offset O, we look for a derived constructor that installs a VFTable of 0xa8b408 into O. Let's see if we can identify which constructors these are now.

[eschwartz@pd4 leagueoflegends]$ cat lol.exe.results.log | fgrep VFTableWrite | fgrep 0xa8b408
Concluding factVFTableWrite(0x42cf17, 0x42cf00, 0x14, 0xa8b408).
Concluding factVFTableWrite(0x42cdf7, 0x42cd80, 0x14, 0xa8b408).
Concluding factVFTableWrite(0x42d040, 0x42d000, 0x14, 0xa8b408).
Concluding factVFTableWrite(0x42ce69, 0x42d0f0, 0, 0xa8b408).
Concluding factVFTableWrite(0x42cec6, 0x42cec0, 0x14, 0xa8b408).

0x42d0f0 is a "this adjustment" that adjusts by 0x-14: image

0x42ce60 does install 0xa8b408 into 0x14, which after the adjustment is 0, but it is immediately overwritten: image

I wonder if this bug is as simple as adding not(factVFTableOverwrite(...)) into finalInheritance. @sei-ccohen ?

sei-eschwartz commented 3 weeks ago

No, that's not it. We are trying to find the derived VFTable, and 0xa8b408 is the derived vftable.

After a little more thinking, the thisptr adjustment is to blame. The thisptr adjustment means that although the method implementation is on 0xa8b408's class, it overrides a destructor method defined at whatever class is at offset 0x14. As such, method 0x42d0f0 expects to be called with a pointer that points to the class at offset 0x14.

I have a branch around somewhere, where I tried to handle thisptr offsets more intelligently. But the problem is that it actually harmed accuracy for simpler programs. It is a tough problem. I might be misremembering, but I don't remember thisptr adjustments being an explicit thunk as in 0x42d0f0. This makes it a lot easier to detect.

Playing around a little here: https://godbolt.org/z/qs55rzaj6

Maybe I am misremembering... image

For now, I'm going to close this issue and make a new one to handle thisptr adjustments better.

sei-eschwartz commented 3 weeks ago

Ah, this is actually a dupe of #187!