Closed Kuree closed 4 years ago
@Kuree I cannot reproduce this locally, but it looks like an error in deleting division instructions that I simplify into shifts. Do you remember the first build where this error happened?
This is the first build that has the div issue: https://buildkite.com/stanford-aha/garnetflow/builds/1476#fddb505e-d74c-4c86-82a5-aae72b9f399f
@Kuree I cannot reproduce the issue on my garnetflow build either. That said, I think it is caused by the use of an unordered set of pointers in the peephole divide optimizations. I have added a fix to Halide-to-Hardware master.
https://github.com/StanfordAHA/Halide-to-Hardware/pull/82
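For readers unfamiliar with this class of bug: iterating a `std::unordered_set` of pointers visits elements in an order that depends on their addresses, so a pass that rewrites or deletes instructions while walking such a set can behave differently from one machine or run to the next. The sketch below is not the code from the PR. `Instr`, its `id` field, and `divs_in_stable_order` are hypothetical names used only to illustrate one common remedy, which is to copy the candidates into a vector and sort by a stable key before processing them.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IR instruction (not the Halide-to-Hardware type).
struct Instr {
    int id;       // stable identifier assigned when the instruction is created
    bool is_div;  // true if this is a division the peephole pass wants to rewrite
};

// Collect the division instructions from an address-ordered set and return
// them sorted by stable id, so later rewrites happen in a reproducible order.
std::vector<Instr*> divs_in_stable_order(const std::unordered_set<Instr*>& candidates) {
    std::vector<Instr*> ordered;
    for (Instr* instr : candidates) {
        if (instr->is_div) {
            ordered.push_back(instr);
        }
    }
    std::sort(ordered.begin(), ordered.end(),
              [](const Instr* a, const Instr* b) { return a->id < b->id; });
    return ordered;
}
```

Processing the sorted vector instead of the set keeps the rewrite order independent of where the allocator happened to place each instruction, which would also be consistent with the failure showing up on CI but not locally.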
Can we change garnetflow to build Halide-to-Hardware from source instead of downloading a tagged library? The build already takes hours to run anyway and it would ensure the flow is up to date and hopefully we would catch some of these issues earlier.
We can do that for daily regressions. I will work on a fix to enable the build from source when the build is not a PR.
On a side note, regarding the build speed, I need to track down where it is slowing down.
I have isolated the cause of the 2x slowdown in the regression. I will fix this after the performance bug is fixed.
@dillonhuff https://buildkite.com/stanford-aha/garnetflow/builds/1512#a5a8cc27-711b-4ad0-b7ac-9b5294aa44e8 This build failed even though it built Halide from scratch. Can you take a look?
@Kuree this looks like a different error. The generator is segfaulting on cascade and the printouts stop very early. I'll try to recreate it in the travis tb.
@Kuree I cannot get cascade to fail in garnetflow, but I can get a unit test of cascade to crash early in compilation in the garnetflow build on travis:
https://travis-ci.com/StanfordAHA/Halide-to-Hardware/builds/150709481
The error seems to happen before getting to coreir generation. @jeffsetter have you ever seen this error?
I have never encountered that type of error.
Is it failing cascade? Because near the end of the log it says "Generating coreir for function coreir_cascade" which suggests to me that it does get to coreir generation for cascade.
@jeffsetter @Kuree sorry, I wasn't clear. The build I linked to starts with a unit test that builds the cascade app, which runs to completion, passes, and then moves on to the rest of the unit tests. The rest of the unit tests contain another test with two convolutions back to back (which I also call cascade). That one fails with:
No linebuffer inserted after function conv2.
terminate called after throwing an instance of 'std::domain_error'
what(): type must be number, but is string
./test/scripts/run_hw_unit_tests.sh: line 18: 14079 Aborted (core dumped) ./all-tests
Extracting testbench files...
This seems to happen before coreir code generation. I cannot get the cascade app itself to fail either on travis or locally.
@dillonhuff To reproduce the cascade bug, can you attach to the docker container keyi-debug-flow on kiwi?
@Kuree @joyliu37 when I run cascade in that container the code crashes inside the unified buffer rewrites (which now run on each execution of Halide-to-Hardware). It seems that the crash is here:
Joey, do you have any idea why this would be crashing on kiwi?
There is a typo in gaussian, and it should be fixed in the newest PR.
See https://buildkite.com/stanford-aha/garnetflow/builds/1476#fddb505e-d74c-4c86-82a5-aae72b9f399f/6-10334
This is using the latest release: https://github.com/StanfordAHA/Halide-to-Hardware/releases/tag/lakelib