E3SM-Project / ACME-ECP

E3SM MMF for DoE ECP project
Other
9 stars 1 forks source link

Xyuan/summit/crm #107

Closed xyuan closed 4 years ago

xyuan commented 4 years ago

This PR fixes some build errors on summit machine to support the ibm xlf, pgi, pgigpu and gnu compiler, which includes

[BFB]

mrnorman commented 4 years ago

It's going to be a few days before I can get to this. I have to get Summit GPU working again after an E3SM merge that broke it, commit those to master though PR, and then test these changes to ensure it's also not breaking anything. Sorry for the delay in getting to this.

whannah1 commented 4 years ago

@xyuan, I'm very confused by this PR. First of all, the name should reflect what the PR is fixing or adding. And other than the machine setup files, the files you are changing are really outside the scope of this project and could create merge problems down the road. Modifying external submodules files is something that should absolutely be avoided.

We are trying to coordinate a fix for the Summit machine configuration that addresses the GPU capability, which is the central focus of this project. I'd like to request that you close this PR, and we should chat about what your role is for this project so we can better coordinate how your work lines up with what the rest of the group is doing.

rljacob commented 4 years ago

I told @xyuan he should feel free to fix anything preventing compilation on Summit. That's his platform until the A21 compiler stack becomes useful.

whannah1 commented 4 years ago

A machine fix PR should be limited to machine/compiler config changes.

brhillman commented 4 years ago

@whannah1 this doesn't look like external submodule files, this looks like the interface code, otherwise it wouldn't show up as diffs here but rather newly added files. Regardless I think those changes should probably be made upstream in E3SM. The compiler fixes should definitely be made here though, since we still do not have Summit PGI tests upstream in E3SM.

whannah1 commented 4 years ago

@brhillman, good point about the submodule files, my bad.

whannah1 commented 4 years ago

ok, it looks like I jumped the gun closing this. After chatting about it more I realized these aren't as problematic as I thought and they are necessary given that we want to avoid E3SM merges until after INCITE time is burned.

brhillman commented 4 years ago

@xyuan please edit the title of this PR to something specific, like "Fix summit builds" or whatever you think is appropriate.

xyuan commented 4 years ago

@xyuan please edit the title of this PR to something specific, like "Fix summit builds" or whatever you think is appropriate.

Thanks, I just did. Sorry for the comments which the sources of confusion.

sarats commented 4 years ago

It might be better to discuss this topic during one of our calls to avoid unwarranted misunderstandings/arguments.

@xyuan Ideally, it is preferable for changes to Summit machine configuration (config_compilers/machines) to percolate down from E3SM master wherever possible.

That said, the compiler options specific to this repo should be captured in the pgigpu section for Summit in config_compilers. This would require coordination among people modifying Summit machine config (Matt on ACME-ECP end; Min, Danqing, Az, and me on E3SM side).

For the main repo (E3SM), we would like to specify particular versions of modules/libraries in the machine file which was omitted for initial development in ACME-ECP repo. e.g., https://github.com/E3SM-Project/ACME-ECP/blob/master/cime/config/e3sm/machines/config_machines.xml#L3033

I think we should be specific now for provenance/debugging reasons now that things are relatively stable.

I will leave specific comments about config_compiler changes.

xyuan commented 4 years ago

so I closed this PR, and wait for more coordination.