angr / archr

Target-centric program analysis.
BSD 2-Clause "Simplified" License

Unable to generate trace for certain CGC binaries using PoVs: BrokenPipeError #68

Closed dnivra closed 3 years ago

dnivra commented 3 years ago

archr fails to generate the trace for some CGC binaries using their PoVs. The traceback is something like this:

Traceback (most recent call last):
  File "archr-repro.py", line 17, in <module>
    archr_trace_result = archr_tracer_bow.fire(testcase=b''.join(pov.writes))
  File "/home/dnivra/angr-dev/archr/archr/analyzers/__init__.py", line 50, in fire
    r.write(testcase)
  File "/home/dnivra/.virtualenvs/angr-dev/lib/python3.6/site-packages/nclib/netcat.py", line 687, in send
    s = s[self._send(s):]
  File "/home/dnivra/.virtualenvs/angr-dev/lib/python3.6/site-packages/nclib/netcat.py", line 513, in _send
    ret = self.sock.send(data)
  File "/home/dnivra/.virtualenvs/angr-dev/lib/python3.6/site-packages/nclib/simplesock.py", line 221, in send
    return self.child_send.send(data)
  File "/home/dnivra/.virtualenvs/angr-dev/lib/python3.6/site-packages/nclib/simplesock.py", line 160, in send
    return self.file.write(data)
BrokenPipeError: [Errno 32] Broken pipe

I was able to reproduce this error with CROMU_00028 (binary, pov), NRFIN_00008 (binary, pov), NRFIN_00012 (binary, pov), NRFIN_00039 (binary, pov) and KPRCA_00034 (binary, pov; takes some time to finish tracing). However, I can generate a trace with tracer without problems (except for KPRCA_00034), so I am probably not using archr correctly to generate a trace (or it's an actual bug). Here is a small script to reproduce the issue:

import pathlib

import archr
import tracer

cgc_binary_name = "CROMU_00028"
# https://github.com/zardus/cgc-bins/blob/master/all_unpatched/CROMU_00028
binary = f"cgc-bins/all_unpatched/{cgc_binary_name}"
# From https://github.com/lungetech/cgc-challenge-corpus/blob/master/CROMU_00028/pov/POV_00000.xml
pov = tracer.TracerPoV(f"cgc-challenge-corpus/{cgc_binary_name}/pov/POV_00000.xml")
# Uncomment lines below to enable tracing using tracer
# print("Generating trace using tracer...", end="", flush=True)
# tracer_trace_result = tracer.QEMURunner(binary=str(binary), input=b''.join(pov.writes))
# print("done.")
print("Generating trace using archr...", end="", flush=True)
archr_target = archr.targets.LocalTarget([str(binary)], target_os='cgc')
archr_tracer_bow = archr.arsenal.QEMUTracerBow(archr_target)
archr_trace_result = archr_tracer_bow.fire(testcase=b''.join(pov.writes))
print("done.")

Any help/insights into what could be wrong would be helpful!

rhelmot commented 3 years ago

This does appear to be some sort of short-reads issue. Generating the trace with fire(testcase=pov.writes) (a supported input format which sleeps for 0.1 sec between sending entries) works just fine.
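
For readers unfamiliar with the two input styles, here is a minimal sketch of the difference between sending one concatenated buffer and sending a list of entries with a pause between them. It is not archr's actual code: send_testcase and its write parameter are hypothetical names used only for illustration, and the 0.1 second default simply mirrors the delay mentioned above.

```python
import time
from typing import Callable, Iterable, Union


def send_testcase(write: Callable[[bytes], object],
                  testcase: Union[bytes, Iterable[bytes]],
                  delay: float = 0.1) -> int:
    """Hypothetical helper: deliver a testcase and return the number of writes.

    A bytes object is sent in a single write. Any other iterable of bytes is
    sent entry by entry, sleeping `delay` seconds after each entry so the
    receiver has a chance to consume it before the next one arrives.
    """
    if isinstance(testcase, (bytes, bytearray)):
        write(bytes(testcase))
        return 1

    count = 0
    for chunk in testcase:
        write(chunk)
        time.sleep(delay)
        count += 1
    return count
```

Passing b''.join(pov.writes) takes the first branch and the target sees everything at once; passing pov.writes takes the second and the target sees one entry at a time.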

dnivra commented 3 years ago

Using archr_trace_result = archr_tracer_bow.fire(testcase=pov.writes) instead of archr_trace_result = archr_tracer_bow.fire(testcase=b''.join(pov.writes)) does not work either. I had originally tried that approach, but it failed in many more cases (I can find out which ones if that would be useful), while concatenating the writes worked in some of those failing cases. That is why I went with concatenating them and passing the result as the testcase.

rhelmot commented 3 years ago

You're probably in a better position than me to debug this, so here's how I would go about it: the PoVs are supposed to be reliable archives of how to interact with the binary to produce a given result. If you are getting a broken pipe error, that means the process is exiting before all the input was sent, which obviously indicates a problem. Fortunately, because the PoV is a full archive of what the interaction is supposed to look like, it can be used to tell when execution goes off the rails by just comparing the output to what the PoV says the output should be.
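
To make the broken pipe point concrete, here is a small standalone sketch (POSIX assumed, and unrelated to archr's internals) that reproduces the same exception: the child exits without reading its stdin, and the parent then tries to write to the pipe. The function name write_after_exit is made up for this example.

```python
import subprocess
import sys


def write_after_exit(payload: bytes) -> str:
    """Write to the stdin pipe of a child that has already exited."""
    # bufsize=0 gives an unbuffered pipe, so the write hits the OS directly.
    child = subprocess.Popen([sys.executable, "-c", "pass"],
                             stdin=subprocess.PIPE, bufsize=0)
    child.wait()  # the child is gone, so the read end of the pipe is closed
    try:
        child.stdin.write(payload)
    except BrokenPipeError:
        # Python ignores SIGPIPE, so the failed write surfaces as EPIPE.
        return "broken pipe"
    finally:
        child.stdin.close()
    return "ok"
```

In the traceback from the issue, the same thing happens further down the stack: the target has exited partway through the testcase, and the next write into its stdin fails.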

Off the top of my head, there are a couple of reasons this could be happening:

1) If you're using the concatenated string, the program could be depending on short reads. I think this is the case for CROMU_00028, the only binary I tested before writing the previous reply.
2) If you're using the chunked writes, the program could be hitting a timeout. I looked at how tracer implements PoV replay and it is pretty much exactly the same, except that the sleep is for 0.01 sec instead of 0.1 sec. I would be super surprised if that difference were the reason things were desyncing.
3) One other difference between archr tracer and tracer tracer is that in tracer tracer the file descriptor passed to the child process is a socketpair instead of a pipe. This could maybe affect things because of how the kernel buffers data? It's a long shot.
4) Since we're not reading the program's stdout, it's possible that buffer could be filling up in the kernel? I feel like if this were the problem it would be manifesting as a hang rather than a crash, but who knows.
5) The PoV is just broken. Not much we can do here, and I'm also not really sure how we could verify this. Maybe replay the PoV under the DARPA-provided PoV replay program and see if it reports errors?
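
As an illustration of point 1, the sketch below shows what a reader that issues one read(1024) per expected message observes over a plain pipe, depending on whether the writes arrive separately or as one concatenated buffer. The function name observe and the sample messages are invented for this example; it says nothing about how any particular CGC binary actually reads its input.

```python
import os
from typing import List


def observe(writes: List[bytes], concatenate: bool) -> List[bytes]:
    """Return what a reader calling os.read(fd, 1024) sees on a pipe."""
    read_fd, write_fd = os.pipe()
    try:
        if concatenate:
            # Everything is already in the pipe, so a single read drains it
            # and the message boundaries are lost.
            os.write(write_fd, b"".join(writes))
            return [os.read(read_fd, 1024)]

        seen = []
        for chunk in writes:
            # Each entry is consumed before the next one is written, so every
            # read returns exactly one entry (a "short" read).
            os.write(write_fd, chunk)
            seen.append(os.read(read_fd, 1024))
        return seen
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

With writes of b"LOGIN\n" and b"QUIT\n", the chunked case yields two separate reads while the concatenated case yields a single read containing both messages. A program that treats each read as one message would behave differently in the two cases.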

dnivra commented 3 years ago

Thanks a lot for the concrete suggestions for next steps! I was looking into a few other failures and only got a chance to look into these now.

  1. Yes, CROMU_00028's PoV does suggest that the binary reads some data, processes it, and repeats. In that case, shouldn't it technically work fine if we don't concatenate the input into one large string? (It doesn't.)
  2. I changed the sleep duration to 0.01 in archr but, as you expected, it doesn't make any difference.
  3. Yeah, it could be a problem. How can we go about checking if it is?
  4. (I will come back to 3 in a bit.) I added print(r.read(timeout=self.timeout), flush=True) just before the write so that the output is read and the kernel buffer doesn't fill up (and to get a sanity check of the binary's progress through the PoV). For most binaries (except NRFIN_00039, where I don't see any output), I see the same output as in the PoV, so the binary does seem to be processing the PoV correctly.
  5. I don't think the PoV is broken, because generating a trace from the same PoV works fine with tracer instead of archr. In any case, I found that the PoV in the lungetech corpus is identical to the one in the official repo (this is the official repo, I hope?), so that's not an issue. We use tracer's TracerPoV to parse the PoV in both cases, so it cannot be a parsing issue either (but I'm still trying to verify this with the PoV parser that the SymCC folks wrote; it's in Lisp, so I'm figuring out how to run it).
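
The output check described in item 4 can be sketched in a standalone form as a lockstep replay: write one entry, read the reply, and compare it with what is expected before moving on. Everything here is an assumption for illustration: a toy line-echoing child stands in for the CGC binary, replay and ECHO_CHILD are made-up names, and the (write, expected reply) pairs are not TracerPoV's actual data model.

```python
import subprocess
import sys
from typing import List, Optional, Tuple

# Toy stand-in for the target: replies "got <line>" to every input line.
ECHO_CHILD = """
import sys
while True:
    line = sys.stdin.buffer.readline()
    if not line:
        break
    sys.stdout.buffer.write(b"got " + line)
    sys.stdout.buffer.flush()
"""


def replay(steps: List[Tuple[bytes, bytes]]) -> Optional[int]:
    """Replay (write, expected_reply) pairs; return the first mismatching index."""
    child = subprocess.Popen([sys.executable, "-c", ECHO_CHILD],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                             bufsize=0)
    try:
        for index, (data, expected) in enumerate(steps):
            child.stdin.write(data)
            # Reading after every write keeps the child's stdout drained and
            # pinpoints the step at which the interaction goes off the rails.
            if child.stdout.readline() != expected:
                return index
        return None
    finally:
        child.stdin.close()   # EOF lets the child exit
        child.wait()
        child.stdout.close()
```

A return value of None means every reply matched; an integer is the index of the first step whose output diverged, which is where to start looking.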

Something else I was thinking of (which could be related to 3): should we verify that the test case was read correctly by the binary? Can we actually verify it? We see its output, so we know that's correct, but we don't know anything about how the input is read. Maybe it is (or will be) read correctly and it's a non-issue. I am not sure, though, so I thought I'd raise this concern.

rhelmot commented 3 years ago

Surprisingly, it was kind of 5. https://github.com/angr/tracer/commit/284b67b7d194b7902f6a208838fc47e0a77040da fixes this.