RUB-SysSec / aurora

Usenix Security 2021 - AURORA: Statistical Crash Analysis for Automated Root Cause Explanation
GNU Affero General Public License v3.0

Question on the root cause of the testcases in the paper #10

Closed by xwh16 1 year ago

xwh16 commented 2 years ago

Hi, thanks for sharing your excellent code on Aurora.

Could you please add a list giving the ground truth (i.e., the root cause) of the vulnerabilities you tested? The root causes of some test cases in the paper are complicated to determine, so I would like to know how you determined which instruction (among the top 50) is the actual root cause. For example, the use-after-free vulnerability in NASM (Table 1, #23) seems to have multiple root causes (its patch touches multiple locations). Which one do you consider its root cause?

Looking forward to hearing back from you at your earliest convenience. Regards

mu00d8 commented 1 year ago

hi,

sorry for the late response, I saw the email notification pop up during my vacation but forgot about it. Unfortunately, I don't think we can help you out here. We manually inspected each bug (which is definitely quite some work) and only preserved the results. IIRC, the NASM bug had not been fixed at the time of writing, so one of the others manually dug into it and came up with a reasonable patch location. Note that Aurora's predicates can only cover a single location, which may not be enough in some circumstances.

xwh16 commented 1 year ago

Hi, thanks for your reply!

Another thing I'd like your advice on is how I should determine the root cause of a vulnerability that has an existing patch. Did you consider the patch location (or the lines around it) to be the root cause, or did you manually inspect the bug to determine its root cause (which may not be related to the patch)?

I'm asking this because the root cause is indeed hard to inspect, and I am worried that there are misinterpretations in my analysis of Aurora.

tiedaoxiaotubie commented 1 year ago

I have received your email. I will deal with it as soon as possible and then reply to you. Thank you!

mu00d8 commented 1 year ago

Off the top of my head, Aurora's evaluation used the developers' patches as the baseline, so we could measure how many predicates a dev would have looked at before they found the one where they would insert their patch.
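
For anyone trying to reproduce that measurement, here is a rough sketch of how it could be computed. It is not taken from the Aurora code base: the function name, the `(file, line)` representation of a predicate's location, and the sample data are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Set, Tuple

# Assumption: a predicate is identified by the source location it covers.
Location = Tuple[str, int]  # (file, line)


def predicates_inspected(
    ranked: Sequence[Location], patch_locations: Set[Location]
) -> Optional[int]:
    """Return the 1-based rank of the first predicate that sits at a
    patch location, or None if no ranked predicate matches."""
    for rank, location in enumerate(ranked, start=1):
        if location in patch_locations:
            return rank
    return None


# Hypothetical data: predicates ordered from highest to lowest score.
ranked = [("a.c", 120), ("a.c", 88), ("b.c", 410)]
# Hypothetical patch that touches two lines; either one counts as a hit.
patch = {("a.c", 88), ("b.c", 410)}

print(predicates_inspected(ranked, patch))  # 2
```

Treating the patch as a set of locations means a multi-location patch (like the NASM one) is counted as found at the earliest matching predicate. Whether that matches the paper's exact counting is an assumption.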