E3SM-Project / ACME-ECP

E3SM MMF for DoE ECP project

update machine configuration for summit #106

Closed xyuan closed 4 years ago

xyuan commented 4 years ago

Update the machine configuration XML file for the latest project setup.

xyuan commented 4 years ago

The PGI Fortran build of ACME-ECP on Summit has a netCDF configuration issue as well; it needs to be fixed tomorrow.

whannah1 commented 4 years ago

We have some additional Summit fixes that Matt is working on that are specific to the CPU MMF runs, but it looks like there's no conflict with the changes here.

sarats commented 4 years ago

@xyuan Curious, some of these should be in E3SM master. Won't they be brought in through a regular merge?

whannah1 commented 4 years ago

@sarats, we just did a merge with E3SM. Are these additions in E3SM very recent?

xyuan commented 4 years ago

The ACME-ECP master is out of date, but it sounds like the E3SM master branch has caught up with the latest changes.


From: Sarat Sreepathi notifications@github.com Sent: Friday, September 27, 2019 4:05:18 PM To: E3SM-Project/ACME-ECP ACME-ECP@noreply.github.com Cc: Yuan, Xingqiu xyuan@anl.gov; Mention mention@noreply.github.com Subject: Re: [E3SM-Project/ACME-ECP] update machine configuration for summit (#106)


whannah1 commented 4 years ago

@xyuan, we have to be careful about trying to keep up with E3SM. There have been many times where we needed to pull things down from E3SM, and ideally we would be syncing more frequently, but the merge process has often created conflicts that were difficult to resolve. So we try to limit these updates to when we have a critical need for them. The ECP fork will often be behind E3SM, and that's ok.

xyuan commented 4 years ago

@whannah1, thanks. I understand your point.


From: Walter Hannah notifications@github.com Sent: Friday, September 27, 2019 4:20 PM To: E3SM-Project/ACME-ECP ACME-ECP@noreply.github.com Cc: Yuan, Xingqiu xyuan@anl.gov; Mention mention@noreply.github.com Subject: Re: [E3SM-Project/ACME-ECP] update machine configuration for summit (#106)


sarats commented 4 years ago

@whannah1 The path changes from csc190 to cli115 should have gone in a long time ago. See https://github.com/E3SM-Project/E3SM/blame/master/cime/config/e3sm/machines/config_machines.xml#L2965

sarats commented 4 years ago

Some path updates went in 8 months ago, with a more recent commit to master on June 6: https://github.com/E3SM-Project/E3SM/commit/55772a4409368c31b701428b7dd978261848b608

whannah1 commented 4 years ago

@sarats, I guess those changes were overridden by all the changes we've made to the Summit config to get the GPU runs working. It might be worth adding a "summit-ecp" machine to avoid these things clashing in the future.

sarats commented 4 years ago

There should be a lot of overlap. I'm not sure that ACME-ECP and E3SM have different module or other requirements. The requisite changes should mostly be captured in config_compilers where there is a pgigpu section for Summit.

rljacob commented 4 years ago

ACME-ECP is the only version of ACME/E3SM running on Summit, so the Summit config should be what you need.

sarats commented 4 years ago

@rljacob I agree, but we do certain benchmarking runs with E3SM on Summit, for instance the standardized performance benchmarks and the ongoing I/O benchmarks (SCORPIO + ADIOS). So we would need to build and test. As I said earlier, there are no real conflicting requirements between ACME-ECP and E3SM in config_machines; any needed changes can be captured in config_compilers.

whannah1 commented 4 years ago

@sarats, maybe the issue is just that changes are being made in both repos. I agree there's a lot of overlap, but merging with E3SM has broken our configuration in the past.

sarats commented 4 years ago

Then, we should coordinate better. I don't see a fundamental reason for divergence.
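
One low-cost way to spot this kind of drift before a merge might be to compare the Summit entry in the two repositories' copies of config_machines.xml. The following is only a minimal sketch: it assumes each file contains `<machine MACH="...">` elements, and the function names (`machine_settings`, `diff_machine`) and the sample XML in the demo are hypothetical, not part of CIME or either repository.

```python
import xml.etree.ElementTree as ET


def _collect(elem, prefix, out):
    """Walk the children of elem and record the text of every leaf element."""
    for child in elem:
        # Put attributes in the key so entries such as <env name="X"> stay distinct.
        attrs = "".join(f"[{k}={v}]" for k, v in sorted(child.attrib.items()))
        path = f"{prefix}/{child.tag}{attrs}"
        if len(child) == 0:
            # Repeated leaves (e.g. several module commands) accumulate in a list.
            out.setdefault(path, []).append((child.text or "").strip())
        else:
            _collect(child, path, out)


def machine_settings(xml_text, mach):
    """Return {path: [values]} for the <machine> element whose MACH equals mach."""
    root = ET.fromstring(xml_text)
    for machine in root.iter("machine"):
        if machine.get("MACH") == mach:
            settings = {}
            _collect(machine, "", settings)
            return settings
    raise KeyError(f"no machine entry named {mach!r}")


def diff_machine(xml_a, xml_b, mach):
    """Return {path: (values_in_a, values_in_b)} for settings that differ."""
    a = machine_settings(xml_a, mach)
    b = machine_settings(xml_b, mach)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # Made-up fragments for illustration only; real files have many more settings.
    fork = '<config_machines><machine MACH="summit"><PROJECT>csc190</PROJECT></machine></config_machines>'
    upstream = '<config_machines><machine MACH="summit"><PROJECT>cli115</PROJECT></machine></config_machines>'
    for path, (ours, theirs) in diff_machine(fork, upstream, "summit").items():
        print(path, ours, theirs)
```

A setting missing from one side shows up as `None` on that side, which would make overridden or dropped entries visible before anyone resolves merge conflicts by hand.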