Open wohlbier opened 3 years ago
I receive the following message when trying to clone your repository:
Cloning into 'spmm'...
ERROR: Repository not found.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.
I can't see the spmm repo on your GitHub, so I suspect it is private. I haven't received any invitation to your repo. Maybe there was a simple mistake somewhere, like a typo?
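For anyone hitting the same message: GitHub reports "Repository not found" both when a repository does not exist and when it is private and the account has no access, so the text alone cannot tell the two apart. A minimal sketch of a check follows. The helper names `classify_git_error` and `probe_remote` are hypothetical, and the string matching is a heuristic, not a complete list of git failure modes.

```python
import subprocess


def classify_git_error(stderr: str) -> str:
    """Map git's stderr text to a rough cause (heuristic, not exhaustive)."""
    text = stderr.lower()
    if "repository not found" in text:
        # Either the repo does not exist or it is private and we lack access.
        return "not-found-or-no-access"
    if "permission denied" in text:
        return "auth-failed"
    if "could not read from remote repository" in text:
        return "remote-unreadable"
    return "unknown"


def probe_remote(url: str) -> str:
    """Run `git ls-remote` against url and classify any failure."""
    result = subprocess.run(
        ["git", "ls-remote", url], capture_output=True, text=True
    )
    if result.returncode == 0:
        return "ok"
    return classify_git_error(result.stderr)
```

Calling `probe_remote` with the clone URL before and after accepting the invitation should show the result change from `not-found-or-no-access` to `ok`, assuming credentials are already configured.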
It is a private repository, but I had sent you an invite. I still show the invite as pending. Did it go to your spam maybe?
I've gotten access to the repo and adapted it to my system.
The changes I have made to TraceAtlas seem to have fixed the issue you mentioned here. These changes are in the dev branch (the current pull request branch), which I think you have access to. The swapped bitcode file has invalid debug info in it, but LLVM just prints a message about it and ignores the errors.
I have run into a new problem though: the swap executable doesn't terminate in a timely fashion (I killed the process after about 20 minutes).
My first step will be to get the debug info to work, then I will be able to see which part of the swapped program is causing problems.
I am developing the tik and tikswap tools this quarter, so these problems are likely to be ongoing for the next few weeks. In our corpus, we currently have about 3% compliance (that is, out of about 250k kernels, about 7.5k are profiled, segmented, tik'd, swapped, and run successfully).
When you get a chance, could you have another look at the reproducer? I had to make some changes for code correctness. Now tik fails.
Sorry for the late response. Despite what I said in my last response, I have been focusing my time this quarter on developing the segmentation algorithm and new memory profiling tools.
I spent some time on tik last month. In short, it cannot support the output of the new segmentation algorithm as currently designed. The problem lies in tik's approach: it attempts to turn each and every kernel into a function. One problem I have found with this approach, which was especially prevalent in the spmm app, has to do with context levels. Whenever the MLE kernel algorithm finds a kernel whose entrance/exit edges lie on different context levels, turning that kernel into a function becomes very challenging for compilation and correctness.
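To illustrate the context-level problem, here is a minimal sketch of a check that flags such kernels. It is not tik's or TraceAtlas's actual data model: the `Edge` and `Kernel` classes are hypothetical, and "context level" is assumed here to mean the call depth at which an edge executes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Edge:
    src: int            # source basic block ID (hypothetical)
    dst: int            # destination basic block ID (hypothetical)
    context_level: int  # assumed: call depth at which this edge executes


@dataclass
class Kernel:
    entrances: List[Edge]
    exits: List[Edge]


def spans_context_levels(kernel: Kernel) -> bool:
    """Return True if entrance/exit edges sit on more than one context level.

    Such a kernel cannot be wrapped in a single function by simply cutting
    at its boundary edges, because entry and exit happen in different frames.
    """
    levels = {e.context_level for e in kernel.entrances + kernel.exits}
    return len(levels) > 1


# A kernel entered in a caller and exited inside a callee.
mixed = Kernel(entrances=[Edge(0, 1, 0)], exits=[Edge(5, 6, 1)])
# A kernel whose boundary edges all live in one function.
flat = Kernel(entrances=[Edge(0, 1, 0)], exits=[Edge(3, 4, 0)])
```

Under these assumptions, `spans_context_levels(mixed)` is true and `spans_context_levels(flat)` is false; only kernels like `flat` map directly onto a single extracted function.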
After some debugging and development with your application, I made this realization and decided to spend the quarter generating results not related to tik. We are still planning on building a tool that can extract the kernels we found and make them available for code swapping/high level synthesis/optimization. Ontology will likely move forward with a tool that takes a different approach than tik.
For the latest version of the ontology tools, check out the benroywillis/TraceAtlas/devb branch.
I have pushed a branch to the spmm repository called dash_brw. In that branch is a directory called dash_build, which contains a Makefile that facilitates the entire TraceAtlas toolchain. The "all" rule runs the profiling, segmentation, and memory analysis tools. So far I have been able to segment the spmm app, but the memory profiling tools are slow, and haven't finished after several hours. I will update on here if/when they finish.
Let me know if you have any questions. I will be responsive either here or via email (I've turned on notifications for this repository's issues, so I will get back to you in a more timely manner going forward).
Dr. Brunhaver, Dr. Chakrabarti and I are in the process of submitting a paper to ISCA 2022, which is due November 23rd. Once that submission is made I will finally have a paper ready for you as reference to what is going on at Ontology.
Thanks very much for the detailed follow up! I'm not working on this at the moment, but I will be interested in reading the paper when it's ready!
The memory passes have been running for about 100h straight without completing. I tried setting the TESTING macro to force the program to use the "small" input, but that decreased the length of the dynamic tracing step by less than 10% (from 970s down to 900s). Based on that, I expect the memory passes to take an intractable amount of time. Do you have a smaller input that I can use, or perhaps a reference to generating a smaller input?
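As a back-of-the-envelope check on those numbers, the sketch below computes the relative reduction in tracing time and projects it onto the memory passes. The projection assumes the memory passes scale linearly with tracing time, which is only an assumption, and it uses 100h as a lower bound since the passes had not finished.

```python
def relative_reduction(before_s: float, after_s: float) -> float:
    """Fraction by which the runtime shrank, e.g. 0.25 for a 25% reduction."""
    return (before_s - after_s) / before_s


def scaled_runtime(runtime_h: float, before_s: float, after_s: float) -> float:
    """Project a runtime under the ASSUMPTION of linear scaling with trace time."""
    return runtime_h * after_s / before_s


# Tracing step: 970s with the default input, 900s with the "small" input.
reduction = relative_reduction(970, 900)   # roughly 0.07
# Memory passes: at least 100h so far on the default input.
projected_h = scaled_runtime(100, 970, 900)  # still more than 92h

print(f"tracing reduction: {reduction:.1%}")
print(f"projected memory-pass lower bound: {projected_h:.1f}h")
```

Even under this optimistic linear assumption, the "small" input saves only a few hours out of more than a hundred, which is why a substantially smaller input is needed.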
I have given you access to a reproducer which fails to create swap.exec.
To reproduce:
The step before creating swap.exec:
And the ultimate failure: