Closed xyuan closed 4 years ago
It's going to be a few days before I can get to this. I have to get Summit GPU working again after an E3SM merge that broke it, commit those to master though PR, and then test these changes to ensure it's also not breaking anything. Sorry for the delay in getting to this.
@xyuan, I'm very confused by this PR. First of all, the name should reflect what the PR is fixing or adding. And other than the machine setup files, the files you are changing are really outside the scope of this project and could create merge problems down the road. Modifying external submodules files is something that should absolutely be avoided.
We are trying to coordinate a fix for the Summit machine configuration that addresses the GPU capability, which is the central focus of this project. I'd like to request that you close this PR, and we should chat about what your role is for this project so we can better coordinate how your work lines up with what the rest of the group is doing.
I told @xyuan he should feel free to fix anything preventing compilation on Summit. That's his platform until the A21 compiler stack becomes useful.
A machine fix PR should be limited to machine/compiler config changes.
@whannah1 this doesn't look like external submodule files, this looks like the interface code, otherwise it wouldn't show up as diffs here but rather newly added files. Regardless I think those changes should probably be made upstream in E3SM. The compiler fixes should definitely be made here though, since we still do not have Summit PGI tests upstream in E3SM.
@brhillman, good point about the submodule files, my bad.
ok, it looks like I jumped the gun closing this. After chatting about it more I realized these aren't as problematic as I thought and they are necessary given that we want to avoid E3SM merges until after INCITE time is burned.
@xyuan please edit the title of this PR to something specific, like "Fix summit builds" or whatever you think is appropriate.
@xyuan please edit the title of this PR to something specific, like "Fix summit builds" or whatever you think is appropriate.
Thanks, I just did. Sorry for the comments which the sources of confusion.
It might be better to discuss this topic during one of our calls to avoid unwarranted misunderstandings/arguments.
@xyuan Ideally, it is preferable for changes to Summit machine configuration (config_compilers/machines) to percolate down from E3SM master wherever possible.
That said, the compiler options specific to this repo should be captured in the pgigpu section for Summit in config_compilers. This would require coordination among people modifying Summit machine config (Matt on ACME-ECP end; Min, Danqing, Az, and me on E3SM side).
For the main repo (E3SM), we would like to specify particular versions of modules/libraries in the machine file which was omitted for initial development in ACME-ECP repo. e.g., https://github.com/E3SM-Project/ACME-ECP/blob/master/cime/config/e3sm/machines/config_machines.xml#L3033
I think we should be specific now for provenance/debugging reasons now that things are relatively stable.
I will leave specific comments about config_compiler changes.
so I closed this PR, and wait for more coordination.
This PR fixes some build errors on summit machine to support the ibm xlf, pgi, pgigpu and gnu compiler, which includes
[BFB]