CEED / libCEED

CEED Library: Code for Efficient Extensible Discretizations
https://libceed.org
BSD 2-Clause "Simplified" License
198 stars 47 forks source link

Must Destroy OperatorField Objects #1646

Open jeremylt opened 1 month ago

jeremylt commented 1 month ago

Fixes #1639

What do we think @jrwrigh. You can see how this does get a bit invasive.


Ok, now the Vector/ElemRestriction/Basis retrieved from the OperatorField needs to be Destroyed per the corresponding interface.

jeremylt commented 1 month ago

ToDo: Track down all the CPU memory leaks, add GPU restores, language wrapers

jeremylt commented 1 month ago

Ok, rebased to work off the branch with the lingering Operator memory fix. Impl for CPU ref is in, rest in progress

jeremylt commented 1 month ago

CPU blocked and ref in, GPU non-gen next. (would need to rebase in another open PR to do gen)

OCCA will be annoying so I'm saving it for last

jeremylt commented 3 weeks ago

Ok, all backends should be covered, including SYCL (not tested)

The ToDo is handling this for Rust

jeremylt commented 2 weeks ago

Working on Rust, need to fix error handing though

jeremylt commented 2 weeks ago

Ok, should be good to review now

jrwrigh commented 2 weeks ago

I've pushed up more fixes to jrwrigh/get-restore, but there are still a lot of failures to do with the assembly. The failing tests are:

Test: t537-operator
Test: t569-operator
Test: t537-operator
Test: t569-operator
Test: t536-operator-f
Test: t533-operator
Test: t506-operator
Test: t534-operator-f
Test: t536-operator
Test: t570-operator
Test: t534-operator
Test: t533-operator-f
Test: t535-operator-f
Test: t535-operator
Test: t538-operator
Test: t506-operator-f

t570-operator fails with:

/home/jrwrigh/software/libCEED/backends/sycl-ref/ceed-sycl-ref-operator.sycl.cpp:1086 in CeedSingleOperatorAssembleSetup_Sycl(): Cannot assemble operator without inputs/outputs

where it's num_eval_mode_out = 0 that is triggering the failure. t504-operator fails with:

Large Problem, Component 1: Computed Area 1.254524 != True Area 1.0

The rest are correctness failures in assembly.

I'd try limiting the SYCL changes to the absolute minimum required to get this PR merged.

jeremylt commented 2 weeks ago

Those changes in that branch look good, so feel free to push them to this branch

jeremylt commented 1 week ago

@jrwrigh I think I put back the pieces that were accidentally lost

jrwrigh commented 1 week ago

Needed to apply the following diff to compile:

diff --git i/backends/sycl-ref/ceed-sycl-ref-operator.sycl.cpp w/backends/sycl-ref/ceed-sycl-ref-operator.sycl.cpp
index 9d2ecec6..a784e1e5 100644
--- i/backends/sycl-ref/ceed-sycl-ref-operator.sycl.cpp
+++ w/backends/sycl-ref/ceed-sycl-ref-operator.sycl.cpp
@@ -662,7 +662,7 @@ static inline int CeedOperatorAssembleDiagonalSetup_Sycl(CeedOperator op) {
       CeedBasis           basis;

       CeedCallBackend(CeedOperatorFieldGetElemRestriction(op_fields[i], &elem_rstr));
-      CeedCheck(!rstr_in || rstr_in == rstr, ceed, CEED_ERROR_BACKEND,
+      CeedCheck(!rstr_in || rstr_in == elem_rstr, ceed, CEED_ERROR_BACKEND,
                 "Backend does not implement multi-field non-composite operator diagonal assembly");
       if (!rstr_in) CeedCallBackend(CeedElemRestrictionReferenceCopy(elem_rstr, &rstr_in));
       CeedCallBackend(CeedOperatorFieldGetBasis(op_fields[i], &basis));

But I'm still seeing correctness failures for:

t535-operator
t536-operator
t537-operator
t533-operator
t533-operator-f
t506-operator
t536-operator-f
t535-operator-f
t506-operator-f
t538-operator
t534-operator-f
t537-operator
t534-operator

All assembly errors though, so you managed to squash some of the issues.

jeremylt commented 1 week ago

I'll keep plunking away, thanks for the cross-check!

jeremylt commented 1 week ago

t506 is the odd one, as that's using the same QF for 2 different CeedOperators

jeremylt commented 1 week ago

Note

make prove -j BACKENDS="/gpu/sycl/ref /gpu/sycl/gen" PROVE_OPTS=-v

will give the details on the failures for just those

jrwrigh commented 1 week ago

Fixed t506, but the others are still failing: sorted this time

t533-operator
t533-operator-f
t534-operator
t534-operator-f
t535-operator
t535-operator-f
t536-operator
t536-operator-f
t537-operator
t537-operator
t537-operator
t538-operator

They fail with all the SYCL backends (ref, shared, and gen). Not too surprising since the assembly always goes through the ref backend, but just another sanity check.

jeremylt commented 1 week ago

t506 better is good. That narrows my search. Just operator diagonal left to dig into

Its the same routine for all of them, so now you only need to look at /gpu/sycl/ref

jeremylt commented 1 week ago

Found another difference. Any useful error messages in the test suite output with PROVE_OPTS=-v or is it just "entry x is different" sort of stuff?

jrwrigh commented 1 week ago

I've been running with make junit to get the error messages. But yeah, they're just:

[0] Error in assembly: 0.000000 != 0.002963
[1] Error in assembly: 0.000000 != 0.011852

or similar.

jeremylt commented 1 week ago

Interesting, I wouldn't have expected getting 0s

jrwrigh commented 1 week ago

Yeah, all the errors are in the form Error in assembly: 0.0000 != ....

jeremylt commented 1 week ago

Huzzah, at long last. I think this is good to add now, and I can spin up the companion PR to MFEM. Mildly disruptive but not a huge deal as it just leaks if not fixed downstream

jeremylt commented 2 days ago

1673 takes priority. Will rework this branch after that merges