LLNL / ravel

Ravel MPI trace visualization tool

Seg fault while opening trace #9

Open wohlbier opened 6 years ago

wohlbier commented 6 years ago

While attempting to open a trace I get:

Processing /p/work1/wohlbier/devel/aster/TAU/aster/centennial21-aster-trace/0/traces.otf2
Reading definitions
Reading events
Segmentation fault (core dumped)

The trace files are 2.2GB.

[wohlbier@centennial21 aster]$ du -sh TAU/aster/centennial21-aster-trace/0/traces
2.2G TAU/aster/centennial21-aster-trace/0/traces
[wohlbier@centennial21 aster]$ ls TAU/aster/centennial21-aster-trace/0/traces
0.def     1792.def  2688.def  3712.def  4608.def  5504.def  640.def   7424.def
0.evt     1792.evt  2688.evt  3712.evt  4608.evt  5504.evt  640.evt   7424.evt
1024.def  1920.def  2816.def  3840.def  4736.def  5632.def  6528.def  7552.def
1024.evt  1920.evt  2816.evt  3840.evt  4736.evt  5632.evt  6528.evt  7552.evt
1152.def  2048.def  2944.def  384.def   4864.def  5760.def  6656.def  7680.def
1152.evt  2048.evt  2944.evt  384.evt   4864.evt  5760.evt  6656.evt  7680.evt
1280.def  2176.def  3072.def  3968.def  4992.def  5888.def  6784.def  768.def
1280.evt  2176.evt  3072.evt  3968.evt  4992.evt  5888.evt  6784.evt  768.evt
128.def   2304.def  3200.def  4096.def  5120.def  6016.def  6912.def  7808.def
128.evt   2304.evt  3200.evt  4096.evt  5120.evt  6016.evt  6912.evt  7808.evt
1408.def  2432.def  3328.def  4224.def  512.def   6144.def  7040.def  7936.def
1408.evt  2432.evt  3328.evt  4224.evt  512.evt   6144.evt  7040.evt  7936.evt
1536.def  2560.def  3456.def  4352.def  5248.def  6272.def  7168.def  8064.def
1536.evt  2560.evt  3456.evt  4352.evt  5248.evt  6272.evt  7168.evt  8064.evt
1664.def  256.def   3584.def  4480.def  5376.def  6400.def  7296.def  896.def
1664.evt  256.evt   3584.evt  4480.evt  5376.evt  6400.evt  7296.evt  896.evt

kisaacs commented 6 years ago

Does it work on smaller traces generated from the same tool? Ravel has not yet been made to work in a distributed fashion. I'm trying to figure out whether it is simply running out of memory or whether there's a baked-in assumption about the format that doesn't match what TAU produces.
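As a rough aid for the memory question, here is a minimal sketch that counts the per-location .def/.evt files in a trace directory and totals their size. The function name `summarize_trace_dir` is hypothetical and not part of Ravel or TAU, and on-disk size is only a loose proxy for how much memory a viewer needs once the events are loaded.

```python
import sys
from pathlib import Path


def summarize_trace_dir(trace_dir):
    """Return (number of .def files, number of .evt files, total bytes).

    Only regular files directly inside trace_dir are considered;
    anything without a .def or .evt suffix is ignored.
    """
    counts = {".def": 0, ".evt": 0}
    total_bytes = 0
    for entry in Path(trace_dir).iterdir():
        if entry.is_file() and entry.suffix in counts:
            counts[entry.suffix] += 1
            total_bytes += entry.stat().st_size
    return counts[".def"], counts[".evt"], total_bytes


# Run only when a directory is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    n_def, n_evt, size = summarize_trace_dir(sys.argv[1])
    print(f"{n_def} .def files, {n_evt} .evt files, {size / 2**30:.2f} GiB")
```

Pointing it at the `traces` directory that sits next to `traces.otf2` gives a quick comparison between a trace that loads and one that does not.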

wohlbier commented 6 years ago

I just generated a trace for the stream benchmark using TAU, running on one OpenMP thread. Ravel also fails in this case, but it gives more output. To get Ravel to build I had to build qt+opengl. I'm not sure whether that matters.

[wohlbier@centennial17 .tau]$ Ravel
xkbcommon: ERROR: failed to add default include path
Qt: Failed to create XKB context!
Use QT_XKB_CONFIG_ROOT environmental variable to provide an additional search path, add ':' as separator to provide several search paths and/or make sure that XKB configuration data directory contains recent enough contents, to update please see http://cgit.freedesktop.org/xkeyboard-config/ .
QWidget::setLayout: Attempting to set QLayout "" on QWidget "traditionalLabelWidget", which already has a layout
Processing /p/home/wohlbier/devel/stream/.tau/stream/centennial-stream-trace/0/traces.otf2
Reading definitions
Reading events
Finish reading
0 unmatched sends and 0 unmatched recvs.
OTF Reading: 0.032669 seconds
Event/Message Matching: 0.000203139 seconds
Merging for messages...
Message Merge: 6.45e-06 seconds
Partitions = 0
Merging cycles...
Cycle Merge: 7.525e-06 seconds
Partitions = 0
Merging based on call tree...
Caller Merge: 1.1769e-05 seconds
Assigning local steps
Local Stepping: 1.349e-06 seconds
Setting global steps...
Merging global steps...
New dag...
Global Stepping: 3.688e-06 seconds
Num partitions 0
Calculating lateness...
Lateness Calculation: 1.5616e-05 seconds
Gnomifying...
Clustering seed: 1988600792
Gnomification/Clustering: 4.0123e-05 seconds
Structure Extraction: 0.000117871 seconds
Total trace: 0.0334849 seconds
QThread: Destroyed while thread is still running
Aborted (core dumped)

kisaacs commented 6 years ago

Thanks for checking it out. If you send me the traces, I can try debugging them. It looks like there are separate issues though.

...and just to make sure we're on the same page, it looks like you're running the master branch, is that correct? (Do you want the Ravel-alignment or are you just looking for a standard OTF2 viewer -- the other branch is a simple Gantt chart.)

wohlbier commented 6 years ago

I only really need a standard OTF2 trace viewer. I built Ravel with spack, which appears to download the v1.0.0 tarball. I can change that to check it out from GitHub if I should.

Attached is the stream trace. If that checks out then I can get you the bigger trace. stream.zip

kisaacs commented 6 years ago

Thanks. I'll take a look -- though, probably not in the next few hours.

I was under the mistaken impression that the spack version had been updated since there have been bug fixes since v1.0.0 -- I was talking to someone else who claimed they had tried the updated version, but maybe they weren't using spack. Perhaps it is time for a v1.0.1.

If you just want a standard trace viewer, the basic-otf2 branch on GitHub cuts out a lot of the unneeded dependencies, so hopefully the install will be easier.

wohlbier commented 6 years ago

Ok, thanks. I'm going to edit the spack script to clone and checkout basic-otf2. jgw

wohlbier commented 6 years ago

The traces load using basic-otf2. The big trace is taking quite a while to load. These are the changes I made to spack:

[wohlbier@centennial17 spack]$ git diff
diff --git a/var/spack/repos/builtin/packages/ravel/package.py b/var/spack/repos
index 3f03444..f906626 100644
--- a/var/spack/repos/builtin/packages/ravel/package.py
+++ b/var/spack/repos/builtin/packages/ravel/package.py
@@ -32,6 +32,8 @@ class Ravel(CMakePackage):
     homepage = "https://github.com/llnl/ravel"
     url = "https://github.com/llnl/ravel/archive/v1.0.0.tar.gz"

+    version("basic-otf2", git="https://github.com/LLNL/ravel.git",
+            branch="basic-otf2")
     version("1.0.0", "b25fece58331c2adfcce76c5036485c2")
     depends_on("cmake@2.8.9:", type="build")

Then:

spack install -v ravel@basic-otf2^qt+opengl