adamantine-sim / adamantine

Software to simulate heat transfer for additive manufacturing
https://adamantine-sim.github.io/adamantine/
Other
32 stars 9 forks source link

Fix bug with ghosts zeroed out after adding material #217

Closed stvdwtt closed 1 year ago

stvdwtt commented 1 year ago

Fixes #212 by switching from using dealii::VectorOperation::min to dealii::VectorOperation::insert, as in: https://github.com/dealii/dealii/pull/14340

This PR also changes the 3D integration test to an MPI test to catch a similar issue in the future.

CI

Rombur commented 1 year ago

Looks like it there is a problem https://cloud.cees.ornl.gov/jenkins-ci/blue/organizations/jenkins/adamantine/detail/adamantine/347/tests

Rombur commented 1 year ago

We use signaling_NaN() only in MaterialProperties so it's probably not coming from there.

stvdwtt commented 1 year ago

It's looking like the issue with test_material_deposition isn't related to adding material. The solution is all NANs after the first evolve_one_time_step call is made. I'm looking for the cause of that now.

stvdwtt commented 1 year ago

It turns out that test_material_deposition was using radiative BCs but didn't have the appropriate material parameters defined. I'm guessing that the solution has been NANs all along, but it wasn't until the add_material fix that it was exposed. I changed the BCs to adiabatic.

New CI

stvdwtt commented 1 year ago

@Rombur, in the most recent CI run, test_thermal_operator_1 failed in Release. Nothing changed relevant to that test since my last push. I've occasionally seen this happen on my laptop, where randomly this test fails and then passes on a rerun.

Should we just loosen the tolerance? The values are all close. Is there a chance that something is actually wrong?

Rombur commented 1 year ago

I don't want to loosen the tolerance right now because I would like to see how often this happens. In release mode, I have compiled deal.II with -Ofast which allows the compiler more freedom to reorder some mathematical operations. Before we loosen the tolerance, I would like to see if switching to -O3 fixes the problem. It's also possible that we read an uninitialize value somewhere. We should try to use signaling_NaN more often in debug mode that's the easiest way to find this kind of bug until we get the sanitizers back. For now, I'll restart the CI.