MFlowCode / MFC

Exascale simulation of multiphase/physics fluid dynamics
https://mflowcode.github.io
MIT License
132 stars 56 forks source link

Robust floating point comparisons #495

Closed ChrisZYJ closed 2 days ago

ChrisZYJ commented 5 days ago

Description

This PR completes #482 by replacing all == dflt_real and /= dflt_real with f_is_default().

It creates a new module named m_helper_basic to prevent circular dependency (the function is used within m_global_parameters. Please let me know if this is a good approach or can be changed.

Type of change

Scope

How Has This Been Tested?

Test Configuration:

Checklist

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

sbryngelson commented 5 days ago

This is really nice @ChrisZYJ

sbryngelson commented 5 days ago

This PR fails benchmarking because the benchmark case (viscous sgb) actually NaNs. This issue appears the same as the one documented in #396. I suspect this is showing up now due to a change in the compilers or some other environment variable on Phoenix. It exits with a NaN when attempting to write the first output file (which I suspect is when the code checks for NaNs). This benchmark case might just be broken.

@ChrisZYJ you can just run this case on Delta outside of CI and watch it fail. We should be able to fix it.

I do find it odd that when looking at the logs, only the PR NaNed. It's a strange issue.

ChrisZYJ commented 5 days ago

@sbryngelson Really sorry for the mistake in m_monopole which escaped all the tests. Hopefully this fixes the problem

sbryngelson commented 4 days ago

Thanks @ChrisZYJ! No apology is needed; this just means we need better tests (we always do, but having codecov definitely helps). Can you add a few tests that would check monopole "better"?

Also looks like tests are still failing... maybe my original hypothesis was correct as well?

ChrisZYJ commented 4 days ago

Yeah you're definitely right. I just tried that benchmark on Delta and saw the same problem as #396. I've double checked this PR again and everything should be right, so I'm really not sure what's going on. On a side note, numerous tests fail when I run ./mfc.sh test -g 4 -- '-cdelta' with GPU, starting from 301B9153 (all tests are passed without the -g 4 option. I've also tested on v4.9.1 and same thing happens. This may or may not be related.

I'll definitely add more tests for monopole, probably in another PR. I'm actually working on more monopole options, so maybe they can come together.

sbryngelson commented 4 days ago

Ok, I would like to hold off merging PRs until benchmarking is running. This issue you witness is not Delta-specific (I don't think). If you check the issue, I tried different compiler versions and computers and GPUs. I think it's some environment thing. But we should be able to track it down. For example, do the tests actually NaN or not? If not, why are they failing?

I'm ok with holding off on more monopole tests until another PR.

ChrisZYJ commented 4 days ago

Thanks for the clarifications! If there are any specific ways I can assist in resolving the problem, please let me know. I'll also add the monopole tests after benchmarking is working

sbryngelson commented 3 days ago

Thanks for the clarifications! If there are any specific ways I can assist in resolving the problem, please let me know. I'll also add the monopole tests after benchmarking is working

This issue has been open for a couple of months so feel free to take a crack at it.

sbryngelson commented 3 days ago

@ChrisZYJ A PR after yours is passing the GPU benchmark (https://github.com/MFlowCode/MFC/pull/496). I think you have a bug in your code that is causing problems on some GPU cases on Frontier and Phoenix (V100 machine).

Still, issue #396 needs to be resolved, though it appears separate.

ChrisZYJ commented 3 days ago

@sbryngelson I fixed the benchmark problem in #497. I'm merging it into this PR and this PR should then pass the GPU benchmark.

sbryngelson commented 3 days ago

@ChrisZYJ nice - I commented on the other PR for a minor change. Once they pass tests and that case is fixed I'll merge them both.