ARM-software / trappy

This repository has moved to https://gitlab.arm.com/tooling/trappy
Apache License 2.0

base: Add warning when we're at risk of running out of memory #224

bjackman closed 7 years ago

bjackman commented 7 years ago

Pandas doesn't seem to handle running out of memory properly: it segfaults, which gives you no useful error message.

This error probably isn't too rare, because the default Lisa config only provisions 1 GB of memory in the Vagrantfile. That means a 500k-line ftrace file can cause a segfault.

I've added a hacky warning in base.py. Do you think this is appropriate? If not, we could just put this in the documentation and hope people spot the connection, and perhaps something like this hacky warning could go in Lisa instead?

JaviMerino commented 7 years ago

500K is not a huge trace. I guess that any time that I have parsed big traces I have done it from outside of lisa/vagrant. I am not against merging this, but we should consider adding more memory to vagrant if it is going to complain so easily.

bjackman commented 7 years ago

Yep, I suggested we add more memory to the Lisa Vagrantfile, but @derkling was more in favour of documenting the issue so that people can add more memory themselves as needed. If I can find a reliable way of detecting this cause of failure (so that users see "you need more memory" instead of just "Segmentation fault (core dumped)"), I guess that would be fine.

bjackman commented 7 years ago

Pushed a new version, slightly less hacky :)

bjackman commented 7 years ago

Removed the logging import.

I tested it by constructing an FTrace object with a particular trace file on VMs with varying memory sizes. As I mentioned in the commit message, I haven't tested on systems without /proc/meminfo, but I did try changing the "/proc/meminfo" string in _get_free_memory_kb to garbage, and nothing untoward happened.
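For anyone following along, here is a minimal sketch of how a helper like _get_free_memory_kb could read /proc/meminfo and fall back quietly when the file is missing. It is not the code from the patch: the function names, the `path` and `field` parameters, and the choice of the MemFree field are assumptions made for illustration.

```python
def parse_meminfo_kb(text, field="MemFree"):
    """Return the value of `field` (in kB) from /proc/meminfo-style text.

    Returns None if the field is absent or malformed. The helper name and
    the default field are illustrative assumptions, not trappy's API.
    """
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == field:
            parts = rest.split()
            if parts and parts[0].isdigit():
                return int(parts[0])
    return None


def get_free_memory_kb(path="/proc/meminfo"):
    """Return free memory in kB, or None when `path` cannot be read.

    Returning None (instead of raising) mirrors the behaviour described
    above: on systems without /proc/meminfo the check is simply skipped.
    """
    try:
        with open(path) as f:
            return parse_meminfo_kb(f.read())
    except (IOError, OSError):
        return None
```

Pointing `path` at a file that does not exist is a quick way to reproduce the "change the string to garbage" test: the function returns None and the caller can skip the check.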

I also wasn't able to trigger a case where the warning gets printed but the memory error doesn't occur; the closest I could get is using a 67 MB trace file on a system with 1118164 bytes of memory: the warning gets printed and Pandas doesn't segfault, but instead we seem to get OOM killed.
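To make the idea concrete, a rough sketch of the kind of check discussed in this thread: compare the size of the trace data with the free memory and warn before handing the data to Pandas. This is not the heuristic from the patch. The function names, the `fraction` threshold of 0.9, and the warning text are all hypothetical placeholders.

```python
import os
import warnings


def memory_warning_needed(data_size_bytes, free_kb, fraction=0.9):
    """Return True if the data is large relative to the free memory.

    `free_kb` may be None (for example when /proc/meminfo is unavailable),
    in which case no warning is requested. The 0.9 threshold is an
    arbitrary placeholder, not a value taken from trappy.
    """
    if free_kb is None:
        return False
    return data_size_bytes / 1024.0 > free_kb * fraction


def warn_if_low_memory(trace_path, free_kb, fraction=0.9):
    """Warn (and return True) if parsing `trace_path` looks risky."""
    size = os.path.getsize(trace_path)
    if memory_warning_needed(size, free_kb, fraction):
        warnings.warn(
            "Trace file is {} kB but only {} kB of memory is free; "
            "parsing may fail or be OOM killed".format(size // 1024, free_kb)
        )
        return True
    return False
```

A check like this can only be advisory: the warning may fire when parsing would have succeeded, and the process may still run out of memory without it, which is why it warns rather than raising an error.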

JaviMerino commented 7 years ago

Thanks! :heart: