Add multiline support in file_contains_str

orenyomtov commented 5 years ago

Currently, multiline output search is only supported in regex mode.

nth10sd commented 5 years ago

One of the reasons file_contains_str reads the file line by line is that the input can be really large. By doing file_contents = f.read() in your PR, we load the entire contents of f into a single variable in memory, which was an issue in the past with 32-bit versions of Python.

@jschwartzentruber, what do you think? I have the feeling we are entirely on 64-bit Python now fwiw.
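For reference, the memory concern and multiline matching are not necessarily at odds. Below is a minimal sketch, not the code in this PR, of a literal search that reads fixed-size chunks and carries over the last len(needle) - 1 bytes so a match spanning a chunk boundary is still found. The function name file_contains_bytes_chunked and the chunk_size parameter are hypothetical and do not exist in lithium.

```python
def file_contains_bytes_chunked(path, needle, chunk_size=1 << 20):
    """Return True if `needle` (bytes, may contain newlines) occurs in the file.

    Hypothetical helper: memory use is bounded by roughly
    chunk_size + len(needle), regardless of the file size.
    """
    if not needle:
        return True  # the empty string is trivially contained
    overlap = len(needle) - 1  # bytes to keep so boundary matches survive
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False  # reached end of file without a match
            window = tail + chunk
            if needle in window:
                return True
            # Keep only the end of the window for the next iteration.
            tail = window[-overlap:] if overlap else b""
```

The trade-off is a little extra bookkeeping compared with a single f.read(); whether that is worth it depends on whether 32-bit Python is still a supported target.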

orenyomtov commented 5 years ago

FYI, a couple of lines below, the next function, file_contains_regex (which is called when the --regex flag is passed), uses f.read():

def file_contains_regex(file_, regex, verbose=True):
    # pylint: disable=missing-param-doc,missing-return-doc,missing-return-type-doc,missing-type-doc
    """e.g. python -m lithium crashesat --timeout=30
     --regex '^#0\\s*0x.* in\\s*.*(?:\\n|\\r\\n?)#1\\s*' ./js --fuzzing-safe --no-threads --ion-eager testcase.js
     Note that putting "^" and "$" together is unlikely to work."""

    matched_str = ""
    found = False
    with open(file_, "rb") as f:
        found_regex = regex.search(f.read())
        if found_regex:
            matched_str = found_regex.group()
            if verbose and matched_str != b"":
                print("[Found string in: '" + matched_str.decode("utf-8", errors="replace") + "']", end=" ")
            found = True
    return found, matched_str

I would, though, opt to change the default verbosity (verbose=True) to False so as not to print the entire file.

For my original scripting purposes, I have worked around this issue by regex-escaping the expected crash output and then passing the verbose flag.
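The regex-escaping workaround can be sketched roughly as follows. This is an illustration under assumptions, not the exact script used: literal_pattern is a hypothetical helper name, and the sample crash output is made up. It relies only on re.escape and re.compile from the standard library, and works on bytes because file_contains_regex opens the file in "rb" mode.

```python
import re


def literal_pattern(expected):
    """Compile a regex that matches `expected` (bytes) literally.

    Hypothetical helper: re.escape neutralises metacharacters such as
    '(', ')' and '.', so multiline crash output can be searched in regex
    mode without being interpreted as a pattern.
    """
    return re.compile(re.escape(expected))


# Made-up two-line crash output, for illustration only.
expected_output = b"#0 0x1 in foo (a.c:1)\n#1 0x2 in bar"
pattern = literal_pattern(expected_output)
```

A compiled pattern like this has the shape that file_contains_regex expects for its regex argument, since that function calls regex.search on the raw file bytes.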

jschwartzentruber commented 5 years ago

@orenyomtov Thanks! I changed the verbose print to be shorter.

MozillaSecurity / lithium

Add multiline support in file_contains_str #71