Closed cmelone closed 2 months ago
Make me a reproducer on Sapling.
If possible, build the reproducer with -DDEBUG_LEGION_GC -DLEGION_GC, but only if it still reproduces.
Hitting a different assertion with those flags:
prometeo_CH41StMix.exec: /home/cmelone/april/legion/runtime/legion/garbage_collection.cc:475: bool Legion::Internal::DistributedCollectable::remove_base_gc_ref_internal(Legion::Internal::ReferenceSource, int): Assertion `finder != detailed_base_gc_references.end()' failed.
Reproducer is at /home/cmelone/april. Run REBUILD=0 ./run.sh to submit the Slurm job.
Try again with this branch: https://gitlab.com/StanfordLegion/legion/-/merge_requests/1201 It's not going to fix the issue, but it will fix a false-positive that you're getting right now in the reference checking code.
Thanks, that assertion is now gone. I was having trouble reproducing the issue on Sapling, but it turns out there is only about 1/10 chance of hitting the original error, so I updated the script to submit 10 jobs at a time.
The error still reproduces with the CXXFLAGS you requested.
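As a rough check on the batch size mentioned above, the following sketch estimates how likely a batch of jobs is to hit the failure at least once. It assumes each job fails independently with the same probability, and it takes the "about 1/10" figure at face value; both are assumptions, not measurements from this thread. The function names are hypothetical.

```python
import math


def prob_at_least_one_hit(p: float, n: int) -> float:
    """Probability that at least one of n independent runs hits the failure.

    Assumes every run fails independently with the same probability p.
    """
    return 1.0 - (1.0 - p) ** n


def runs_needed(p: float, target: float) -> int:
    """Smallest n for which prob_at_least_one_hit(p, n) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # Assumed per-run failure rate of 1/10, batch of 10 jobs.
    print(round(prob_at_least_one_hit(0.1, 10), 3))  # 0.651
    # Batch size needed to see the failure with 95% probability.
    print(runs_needed(0.1, 0.95))  # 29
```

Under these assumptions, a batch of 10 jobs reproduces the error roughly two times out of three, so a clean batch is not strong evidence that the bug is gone.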
@elliottslaughter please add to #1032, thanks!
Where is your Legion source code and how do I rebuild if I change something?
Legion is at ~/april/legion and can be rebuilt by running REBUILD=1 ./run.sh (this won't recompile HTR).
Try the most recent Legion master branch.
Looks great, thanks Mike!
I'm hitting this error on certain test cases: GPU, 1 node, only in debug mode. Latest master. bt: