DependableSystemsLab / LLFI

LLFI is an LLVM-based fault injection tool that injects faults into the LLVM IR of the application source code. Faults can be injected at specific program points, and their effects can be easily traced back to the source code. Please refer to the paper below. NOTE: If you publish a paper using LLFI, please add it to PaperLLFI.bib
http://blogs.ubc.ca/karthik/2014/02/23/quantifying-the-accuracy-of-high-level-fault-injection-techniques/

Profiling and Injection scripts hang on large outputs #67

Closed: chipbuster closed this issue 9 years ago

chipbuster commented 9 years ago

I don't know if this is specific to OS X, but when I use the profiling/injection scripts on programs that write a lot of text to stdout, the Python script tends to hang.

The following (rather silly) program is one way to trigger this issue on my computer:

#include <stdio.h>
#define NUM_ITER 100000

int main(){
        int i,j;
        for (i = 0; i < NUM_ITER; i++){
                for(j = 0; j < 10; j++){
                        printf("aaaaaaaa");
                }
                printf("\n"); //80 chars per line + newline
        }
        printf("I AM FINISHED NOW\n");
        return 0;
}

It simply prints 80 characters per line for 100,000 lines, then exits. When run in the terminal, the profiling executable exits in under 5 seconds. When run in the profiling/injection script, it hangs and never exits.


Some poking around leads me to believe this is happening because the program fills up the stdout PIPE (I think the deadlock warning in the subprocess documentation applies to both wait() and poll()).

Since the script doesn't read from the pipe until after the program exits, the whole thing deadlocks: the program is waiting for the stdout buffer to be emptied so that it can continue, and the script is waiting for the program to exit so that it can empty the stdout buffer.
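
To illustrate the pattern (this is just a minimal sketch of the blocking behavior, not the actual profile.py code, and the executable name is made up):

import subprocess

# Minimal sketch of the deadlock: nothing reads from p.stdout until the
# child exits, so once the child fills the OS pipe buffer it blocks on
# write, and wait() below blocks forever waiting for it to exit.
p = subprocess.Popen(["./prints_a_lot"], stdout=subprocess.PIPE)  # hypothetical program
p.wait()
output = p.stdout.read()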

I've created a simple fix that involves reading lines from stdout as they come in. This works, but I don't know what the performance costs are, how it affects the timeout duration, or whether it breaks other programs, since I've only tested it on two programs at the time of writing.

EDIT: This "patch" completely breaks the timeout mechanism.


Replacement section of the execute() function in profile.py that does not block:
def execute(execlist):
  #print "Begin"
  #inputFile = open(inputfile, "r")
  global outputfile
  print('\t' + ' '.join(execlist))
  #get state of directory
  dirSnapshot()
  p = subprocess.Popen(execlist, stdout=subprocess.PIPE)
  elapsetime = 0
  lines = [] #Append lines to a list in memory instead
  while True:
    elapsetime += 1
    ### NEW CODE ###
    # If the program isn't done, read all of the stdout it's generated since the last
    # sleep() call and store it in lines. Empties the stdout buffer before sleeping again.
    for line in iter(p.stdout.readline, b''): #Grab all non-empty lines in stdout
      lines.append(line.decode().rstrip()) #strip trailing whitespace and store
    ###
    time.sleep(1)
    if p.poll() is not None:
      moveOutput()
      print("\t program finish", p.returncode)
      print("\t time taken", elapsetime,"\n")
      outputFile = open(outputfile, "wb")
      ### NEW CODE  ###
      for line in lines:
        outputFile.write(bytes("%s\n" % line, 'UTF-8')) #write stored lines to file
      ###
      outputFile.close()
      replenishInput() #for cases where program deletes input or alters them each run
      #inputFile.close()
      return p.returncode
karthikp-ubc commented 9 years ago

This is interesting. @chipbuster, can you please submit this as a pull request and we'll integrate it?

Regarding timing being broken, I wonder why that happens? Do we see false positives or false negatives? If it's the latter, we rarely see hangs anyway, so perhaps it's not a big deal. Thanks,

chipbuster commented 9 years ago

The issue with timing is that the program will never time out. I think it's because the first for-loop (for line in iter(...)) never stops reading until the program exits. Basically, elapsetime is never incremented because the script gets stuck in that loop reading the program's output.

Since Python 3.3, communicate() has an optional timeout parameter, which raises an exception if the process runs longer than the timeout value. If you don't need to support Python 3.2, I have a version that uses that timeout to do the communication and uses time.time() to get program execution times. Would you prefer that?
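
Roughly, it would look something like this (just a sketch, assuming the same imports as profile.py (subprocess, time); the real version would keep the dirSnapshot()/moveOutput()/replenishInput() bookkeeping exactly as before, and the timeout value here is only a placeholder):

def execute(execlist):
  global outputfile
  print('\t' + ' '.join(execlist))
  dirSnapshot()
  starttime = time.time()
  p = subprocess.Popen(execlist, stdout=subprocess.PIPE)
  try:
    # communicate() reads stdout while waiting, so the pipe can't fill up
    out, _ = p.communicate(timeout=500)  # placeholder timeout in seconds
  except subprocess.TimeoutExpired:
    p.kill()
    out, _ = p.communicate()  # drain whatever was written before the kill
  elapsetime = time.time() - starttime
  moveOutput()
  print("\t program finish", p.returncode)
  print("\t time taken", round(elapsetime), "\n")
  outputFile = open(outputfile, "wb")
  outputFile.write(out)
  outputFile.close()
  replenishInput()
  return p.returncode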

karthikp-ubc commented 9 years ago

I think that would work. We don't need to support Python versions earlier than 3.3 for the time being. It'd be great if you can submit a pull request with these two changes. Thanks,

karthikp-ubc commented 9 years ago

Thanks for the pull request. @karfair will integrate this into LLFI after testing it, so I'm closing this issue.