Reproducer that fails to create swap.exec

wohlbier commented 3 years ago

I have given you access to a reproducer which fails to create swap.exec. To reproduce:

git clone git@github.com:wohlbier/spmm.git
cd spmm
git checkout dash
# edit arch/clang.sh to match your environment
mkdir build
cd build
../arch/clang.sh
make
ctest -V

The step before creating swap.exec

[2021-07-21 09:44:56.237] [info] Successfully swapped entrance 0 of kernel: K6
[2021-07-21 09:44:56.237] [info] Successfully swapped entrance 0 of kernel: K7
[2021-07-21 09:44:56.237] [info] Successfully swapped entrance 0 of kernel: K8
[2021-07-21 09:44:56.237] [info] Successfully swapped entrance 0 of kernel: K9
[2021-07-21 09:44:56.245] [critical] Tik Module Corrupted:

And the ultimate failure

!16714 = distinct !DISubprogram(name: "sparse_matmul_AtransBtrans", linkageName: "_Z26sparse_matmul_AtransBtransRKSt6vectorIS_ISt4pairImfESaIS1_EESaIS3_EES7_", scope: !11478, file: !11478, line: 341, type: !11479, scopeLine: 343, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3617, retainedNodes: !16715)
!17856 = !DILocation(line: 7904, scope: !16708)
!16708 = distinct !DISubprogram(name: "_Z26sparse_matmul_AtransBtransRKSt6vectorIS_ISt4pairImfESaIS1_EESaIS3_EES7_", scope: !3623, file: !3623, line: 7169, type: !3645, scopeLine: 7169, spFlags: DISPFlagDefinition, unit: !3622, retainedNodes: !128)
fatal error: error in backend: Broken module found, compilation aborted!
clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 9.0.1 
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /srv/data_local/packages/spack/opt/spack/linux-rhel8-cascadelake/clang-8.0.1/llvm-9.0.1-qjkzbvsgygpilbbfq7dhdpyavmr2zoh4/bin
clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-9: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.

benroywillis commented 3 years ago

I receive the following message when trying to clone your repository:

Cloning into 'spmm'...
ERROR: Repository not found.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights and the repository exists.

I can't see the spmm repo on your GitHub, so I suspect it is private. I haven't received any notification of an invitation to your repo; maybe there was a simple mistake somewhere, like a typo?

wohlbier commented 3 years ago

It is a private repository, but I had sent you an invite. I still show the invite as pending. Did it go to your spam maybe?

benroywillis commented 3 years ago

I've gotten access to the repo and adapted it to my system.

The changes I have made to TraceAtlas seem to have fixed the issue you mentioned here. These changes are in the dev branch (the current pull request branch), which I think you have access to. The swapped bitcode file still has invalid debug info in it, but LLVM just reports it and ignores the errors.

I have run into a new problem though: the swap executable doesn't terminate in a timely fashion (I killed the process after about 20 minutes).
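A run like this can be bounded with a wall-clock limit instead of being killed by hand. The following is a minimal sketch, assuming GNU coreutils `timeout` is installed; the helper name `run_bounded` and the `./swap.exec` path are hypothetical and not part of TraceAtlas.

```shell
# Run a command under a wall-clock limit and report whether it hung.
# Assumption: GNU coreutils `timeout` is available on the PATH.
run_bounded() {
    rb_limit="$1"
    shift
    rb_status=0
    # `timeout` exits with status 124 when the limit is reached.
    timeout "$rb_limit" "$@" || rb_status=$?
    if [ "$rb_status" -eq 124 ]; then
        echo "timed out after $rb_limit"
    else
        echo "exited with status $rb_status"
    fi
    return "$rb_status"
}

# Example usage (hypothetical path to the swapped executable):
# run_bounded 20m ./swap.exec
```

The helper returns the command's own exit status, so a hang (124) can be told apart from a crash in a script or test harness.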

My first step will be to get the debug info to work, then I will be able to see which part of the swapped program is causing problems.

I am developing the tik and tikswap tools this quarter, so these problems are likely to be ongoing for the next few weeks. In our corpus, we currently have about 3% compliance (that is, out of about 250k kernels, about 7.5k are profiled, segmented, tik'd, swapped, and run successfully).

wohlbier commented 3 years ago

When you get a chance, could you have another look at the reproducer? I had to make some changes for code correctness. Now tik fails.

benroywillis commented 3 years ago

Sorry for the late response. Despite what I said in my last response, I have been focusing my time this quarter on developing the segmentation algorithm and new memory profiling tools.

I spent some time on tik last month. In short, it cannot support the output of the new segmentation algorithm with its current approach. The problem lies in tik's approach: it attempts to turn each and every kernel into a function. One problem I have found with this approach, which was especially prevalent in the spmm app, has to do with context levels. Whenever the MLE kernel algorithm finds a kernel whose entrance/exit edges lie on different context levels, turning that kernel into a function becomes very challenging for compilation and correctness.

After some debugging and development with your application, I made this realization and decided to spend the quarter generating results not related to tik. We are still planning on building a tool that can extract the kernels we found and make them available for code swapping/high level synthesis/optimization. Ontology will likely move forward with a tool whose approach differs from tik's.

For the latest version of the ontology tools, check out the benroywillis/TraceAtlas/devb branch.

I have pushed a branch to the spmm repository called dash_brw. In that branch is a directory called dash_build, which contains a Makefile that drives the entire TraceAtlas toolchain. The "all" rule runs the profiling, segmentation, and memory analysis tools. So far I have been able to segment the spmm app, but the memory profiling tools are slow and haven't finished after several hours. I will post an update here if/when they finish.

Let me know if you have any questions. I will be responsive either here or via email (I've turned on notifications for this repository's issues, so I will get back to you in a more timely manner going forward).

Dr. Brunhaver, Dr. Chakrabarti and I are in the process of submitting a paper to ISCA 2022, which is due November 23rd. Once that submission is made, I will finally have a paper ready for you as a reference for what is going on at Ontology.

wohlbier commented 3 years ago

Thanks very much for the detailed follow up! I'm not working on this at the moment, but I will be interested in reading the paper when it's ready!

benroywillis commented 3 years ago

The memory passes have now been running for about 100h straight without completing. I tried setting the TESTING macro to force the program to use the "small" input, but that only decreased the duration of the dynamic tracing step by about 10% (from 970s down to 900s). Based on that, I expect the memory passes to take an intractable amount of time. Do you have a smaller input that I can use, or perhaps a reference for generating a smaller input?

Reproducer that fails to create swap.exec #14