Closed mrnorman closed 6 years ago
I will mention that there are still some automatic Fortran arrays scattered throughout, and they aren't exactly small. However, this takes care of the vast majority of the data the CRM uses.
Cool :). Do you want me to integrate today?
Matt, Chris is already working on the integration.
Word
@mrnorman @whannah1 My integration attempt failed. Both FSP1V1-TEST and FSP2V1-TEST segfaulted at the first timestep. There were also NLCOMP diffs for all tests, and they all failed the "BASELINE master" test.
There were merge conflicts. I've pushed the attempted merge to branch crjones/crm/allocate-merge in case you want to see if I made any clear mistakes in resolving conflicts.
@crjones-amath, did it crash for gnu, pgi, or intel (or all of them)? I'm working on this now on my laptop, hoping to track it down quickly.
@mrnorman For me it crashed with intel on edison. @whannah1 mentioned to me that his debug run on titan crashed as well, presumably with pgi.
@whannah1, @crjones-amath I'm getting the following error: "kurant() - the number of cycles exceeded 4." This usually means it's just a wrong answer. My gut tells me this has to do with data persistence / initialization. The data inherently persists in the original code, and it doesn't when we allocate every time. So I'm going to start everything out at zero and see if that fixes the problem (since I do that in the GPU code anyway). Hopefully I don't have to track down any issues with data that the model assumed kept its previous value (which would inherently be a bug, and we already fixed a few of those).
Also, this might explain why my standalone model works fine but full ACME-MMF crashes
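To make the persistence concern concrete, here is a minimal sketch, not the CRM code itself. The routine name `crm_step_sketch`, the array `qv`, and the sizes are hypothetical. It assumes only that a freshly allocated array has undefined contents, so anything that is accumulated into has to be zeroed right after `allocate`:

```fortran
program zero_init_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer :: call_idx

  ! Call the routine several times, the way a host model calls the CRM each step.
  do call_idx = 1, 3
    call crm_step_sketch(4, 8)   ! hypothetical sizes
  end do

contains

  subroutine crm_step_sketch(ncrms, nz)
    integer, intent(in) :: ncrms, nz
    real(dp), allocatable :: qv(:,:)   ! hypothetical work array

    ! A fresh allocation carries no defined values from the previous call.
    allocate(qv(ncrms, nz))

    ! Start from zero so the accumulation below does not depend on
    ! whatever happened to be in memory.
    qv = 0.0_dp

    qv = qv + 1.0_dp
    print *, 'sum(qv) =', sum(qv)

    deallocate(qv)
  end subroutine crm_step_sketch

end program zero_init_sketch
```

With the zeroing in place every call reports the same sum. Without it, the result depends on uninitialized memory, which is one way a code that silently relied on persistent data can start producing wrong answers.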
My recent run finally gave an FPE and a core file that point to line 154 in src/physics/crm/diagnose.F90. I don't see an obvious problem with any of those variables yet, though.
Core files usually mean segfault. And I thought FPEs only kill a simulation if you turn on FP traps, right?
Ugh, no dice. Build times are < 1min on my laptop, so I'll just re-do the work while testing full ACME the whole time.
My run was in debug mode, so it was set up to catch FPEs.
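For reference, the difference between a floating-point exception that only raises a flag and one that halts the run can be sketched with the standard `ieee_arithmetic` intrinsic module. This is an illustration only: debug builds normally enable traps through compiler options, and nothing here is taken from the ACME build configuration.

```fortran
program fpe_trap_sketch
  use, intrinsic :: ieee_arithmetic
  implicit none
  real, volatile :: x   ! volatile so the division is not folded at compile time
  real :: y
  logical :: raised

  x = 0.0

  ! With halting disabled, a divide-by-zero only raises a flag and the
  ! program keeps running with an infinite result.
  if (ieee_support_halting(ieee_divide_by_zero)) then
    call ieee_set_halting_mode(ieee_divide_by_zero, .false.)
  end if
  call ieee_set_flag(ieee_divide_by_zero, .false.)

  y = 1.0 / x
  call ieee_get_flag(ieee_divide_by_zero, raised)
  print *, 'y =', y, ' divide-by-zero flag raised:', raised

  ! With halting enabled, the same division would stop the program instead:
  ! call ieee_set_halting_mode(ieee_divide_by_zero, .true.)
  ! y = 1.0 / x
end program fpe_trap_sketch
```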
Killing this PR and creating a new one based off the branch I just created
I've gone through all of the major data and made everything allocated and deallocated every time the CRM is called, rather than leaving the data essentially static with automatic Fortran arrays. For PGI with the OpenACC port, this resolved some wrong-answer bugs I was seeing that were very hard to track down. Hopefully this will help the robustness of the CRM with PGI for master as well.
I also changed nvcols to ncrms and vc to icrm so that the naming is clearer.
I don't have time to check with the full ACME-ECP code, but this works successfully with the standalone code. @whannah1 , please check with 1mom and 2mom micro if you get the chance.
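The allocate-on-entry, deallocate-on-exit pattern described above can be sketched roughly as follows. Only `ncrms` and `icrm` come from this PR; the module, routine, and field names are hypothetical, and the real CRM data layout may differ.

```fortran
module crm_data_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), allocatable :: t(:,:)   ! hypothetical field, dimensioned (ncrms, nz)
contains

  subroutine allocate_crm_data(ncrms, nz)
    integer, intent(in) :: ncrms, nz
    allocate(t(ncrms, nz))
    t = 0.0_dp   ! defined starting state on every call
  end subroutine allocate_crm_data

  subroutine deallocate_crm_data()
    if (allocated(t)) deallocate(t)
  end subroutine deallocate_crm_data

end module crm_data_sketch

program crm_driver_sketch
  use crm_data_sketch
  implicit none
  integer, parameter :: ncrms = 3, nz = 5   ! hypothetical sizes
  integer :: icrm, step

  do step = 1, 2
    ! Data lives only for the duration of one CRM call.
    call allocate_crm_data(ncrms, nz)

    do icrm = 1, ncrms
      t(icrm, :) = real(icrm, dp)   ! stand-in for the per-CRM work
    end do
    print *, 'step', step, ' max t =', maxval(t)

    call deallocate_crm_data()
  end do
end program crm_driver_sketch
```

Because nothing survives between calls, any state the model needs from the previous step has to be passed in explicitly rather than assumed to persist.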