Closed · orenyomtov closed this 5 years ago
One of the reasons we split up reading a file in `file_contains_str` by line is that the input can be really large. By doing `file_contents = f.read()` in your PR, we are putting the contents of `f` into one variable in memory all at once, which was an issue in the past with 32-bit versions of Python.

@jschwartzentruber, what do you think? I have the feeling we are entirely on 64-bit Python now, FWIW.
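For reference, the line-by-line approach described above looks roughly like this. This `file_contains_str` is a sketch reconstructed from the discussion, not lithium's exact implementation: iterating the file object keeps only one line in memory at a time, so large files stay cheap to scan.

```python
import os
import tempfile


def file_contains_str(file_, search_bytes):
    """Sketch of a line-by-line substring search: only one line is
    held in memory at a time, unlike a whole-file f.read()."""
    with open(file_, "rb") as f:
        for line in f:  # the file object yields one line per iteration
            if search_bytes in line:
                return True
    return False


# small demo: write a fake crash log and scan it
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"first line\nSIGSEGV here\nlast line\n")

hit = file_contains_str(path, b"SIGSEGV")
miss = file_contains_str(path, b"not-present")
os.remove(path)
```

The trade-off discussed in this thread is exactly this loop versus one `f.read()`: the loop cannot match a pattern that spans a line break, but it bounds memory use.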
FYI, a couple of lines below, the next function, `file_contains_regex`, which is called when the `--regex` flag is passed, uses `f.read()`:
```python
def file_contains_regex(file_, regex, verbose=True):
    # pylint: disable=missing-param-doc,missing-return-doc,missing-return-type-doc,missing-type-doc
    """e.g. python -m lithium crashesat --timeout=30
    --regex '^#0\\s*0x.* in\\s*.*(?:\\n|\\r\\n?)#1\\s*' ./js --fuzzing-safe --no-threads --ion-eager testcase.js
    Note that putting "^" and "$" together is unlikely to work."""
    matched_str = ""
    found = False
    with open(file_, "rb") as f:
        found_regex = regex.search(f.read())
        if found_regex:
            matched_str = found_regex.group()
            if verbose and matched_str != b"":
                print("[Found string in: '" + matched_str.decode("utf-8", errors="replace") + "']", end=" ")
            found = True
    return found, matched_str
```
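Since the file is opened in binary mode, the compiled pattern must be a bytes regex. A minimal, standalone way to exercise a whole-file regex search like the one above (no lithium import; the crash-log contents here are made up) shows why the `f.read()` matters: the pattern can match across a line break, which a line-by-line scan could never do.

```python
import os
import re
import tempfile

# bytes pattern spanning two stack frames, modeled on the docstring example
pattern = re.compile(rb"^#0\s*0x.* in\s*.*(?:\n|\r\n?)#1\s*")

# write a fake two-frame backtrace to a temp file
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"#0  0x00007f deadbeef in js::Crash ()\n"
            b"#1  0x00007f cafef00d in main ()\n")

# whole-file read lets the match cross the newline between frames
with open(path, "rb") as f:
    match = pattern.search(f.read())
os.remove(path)

matched_str = match.group() if match else b""
```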
I would, though, opt to change the default verbosity value (`verbose=True`) to `False`, so as not to print the entire file.
For my original scripting purposes, I have worked around this issue by regex-escaping the expected crash output and then passing the verbose flag.
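The workaround described, turning a literal expected output into a safe regex so it can go through the `--regex` path, amounts to something like the following sketch (the expected crash line here is a made-up example):

```python
import re

# the literal crash output we expect, as bytes (the files are opened "rb");
# this particular assertion message is a hypothetical example
expected_output = b"Assertion failure: isLive(), at gc/Marking.cpp:123"

# re.escape turns regex metacharacters like '.', '(' and ')' into literals,
# so the exact text can be fed to the regex-based search path unchanged
pattern = re.compile(re.escape(expected_output))

log = b"some earlier output\nAssertion failure: isLive(), at gc/Marking.cpp:123\nmore output\n"
match = pattern.search(log)
```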
@orenyomtov Thanks! I changed the verbose print to be shorter.
Currently, multiline output search is only supported in regex mode.