Open teabagk7 opened 3 years ago
@teabagk7 Thanks for reporting this, I've forwarded to the developer and will update ASAP.
@teabagk7 The 256 mesh, 6144-rank run tests out on our system with the reference commit (see the Nalu README for hashes). We will accept results from different commits, since we recognize how much work is required to generate the results you already have. We suggest building the older version of the code to generate the 256 mesh 6144-rank results.
I've built exactly the same hashes of Trilinos and Nalu. This problem appears only on the 256 mesh test with 96 nodes.
I didn't mention a Trilinos hash. The two hashes we mention in the README are of Nalu code. Runs at 6144 ranks and the 256 mesh run to completion on our reference hardware.
What Nalu hash are you working with?
Nalu-Wind Version: v1.2.0
Nalu-Wind GIT Commit SHA: c7c3723261cf1eebe73ef969396d08d342a01644-DIRTY
Trilinos Version: 13.1-g53550bee94b
TPLs: Boost, HDF5, netCDF, STK, Trilinos, yaml-cpp and zlib
Try Nalu-Wind commit 1d3ee2e62ecdd4745d0339a5bf9c5194a07bc93a for the 256 mesh, 6144-rank test.
Try Nalu-Wind commit 1d3ee2e62ecdd4745d0339a5bf9c5194a07bc93a [...]
[gerardo@login01 build-test]$ git checkout 1d3ee2e62ecdd4745d0339a5bf9c5194a07bc93a
fatal: reference is not a tree: 1d3ee2e62ecdd4745d0339a5bf9c5194a07bc93a
[cchang@el1 cchang]$ git clone https://github.com/Exawind/nalu-wind.git
Cloning into 'nalu-wind'...
remote: Enumerating objects: 69, done.
remote: Counting objects: 100% (69/69), done.
remote: Compressing objects: 100% (56/56), done.
remote: Total 25671 (delta 22), reused 36 (delta 13), pack-reused 25602
Receiving objects: 100% (25671/25671), 17.46 MiB | 14.71 MiB/s, done.
Resolving deltas: 100% (20518/20518), done.
[cchang@el1 cchang]$ cd nalu-wind/
[cchang@el1 nalu-wind]$ git checkout 1d3ee2
Note: checking out '1d3ee2'.
You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example:
git checkout -b new_branch_name
HEAD is now at 1d3ee2e... Updating golds in response to #692.
Thank you. I was using 'git clone https://github.com/exawind/build-test.git', which I got from Step 4 of https://nalu-wind.readthedocs.io/en/latest/source/user/build_spack.html
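The "reference is not a tree" failure above is what Git reports when the requested commit is not present in the repository being used (here, a clone of build-test rather than nalu-wind). A minimal sketch of a pre-check that could be run before `git checkout`; the helper name `commit_exists` is hypothetical and not part of Nalu-Wind or its build scripts:

```shell
# Hypothetical helper (not part of Nalu-Wind): succeed only if the given
# commit exists in the repository located at the given directory.
commit_exists() {
    repo_dir="$1"
    sha="$2"
    # 'git cat-file -e' exits with status 0 when the object exists;
    # the '^{commit}' suffix requires that it resolve to a commit.
    git -C "$repo_dir" cat-file -e "${sha}^{commit}" 2>/dev/null
}
```

For example, `commit_exists . 1d3ee2e62ecdd4745d0339a5bf9c5194a07bc93a || git remote -v` prints the configured remotes when the commit is missing, which makes it easier to spot that the wrong repository was cloned.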
OK, thanks @gcstoianowski. I'll forward this to the benchmark steward to see whether we can clarify the instructions on our end.
what(): 107: <....>/Trilinos_2/packages/zoltan2/core/src/problems/Zoltan2_PartitioningSolution.hpp,1572
107: error: Value for num_global_parts is different on different processes
192, 384, 768, 1536, 3072 - all work fine, no such error.
mesh 512 works on 768, 1536, 3072 and 6144!