SCOREC / pcms

BSD 3-Clause "New" or "Revised" License

We have very similar results to theirs now. #78

Closed phyboyzhang closed 4 years ago

phyboyzhang commented 4 years ago

I got the following result this morning:

received density in   -65461417944270568.        1.8024656947582262E+017

which is produced by the following line in XGC:

if (sml_intpl_mype.eq.0) print *, 'received density in ', minval(arrtmp), maxval(arrtmp)

Their result is:

received density in   -66766084240252248.        1.8044732553688858E+017
phyboyzhang commented 4 years ago

Before the new PRs here and here, there were still many problems in the ADIOS2 1D sending and receiving routines in the file cpl.cc, in adiosRoutines here in the COUPLER, and in initial_value_comp.F90 in GENE here. One problem was that these routines were not given the exact array indices for each process. The other problem was that the transposition of the processes in the subcommunicators between Fortran and C++ was not accounted for.
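The first problem boils down to each rank telling ADIOS2 which slice of the global 1D array it owns. A minimal sketch, assuming a contiguous block decomposition in which the first n % nranks ranks hold one extra entry (the real GENE and XGC decompositions may differ); block_range is a hypothetical helper, not a function from cpl.cc or adiosRoutines. With the ADIOS2 C++ bindings, the resulting pair would typically become the start and count of a selection, e.g. var.SetSelection({{start}, {count}}).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: global index range {start, count} owned by `rank`
// when `n` entries are split into `nranks` contiguous blocks.
// Assumption: the first (n % nranks) ranks each hold one extra entry.
std::pair<std::size_t, std::size_t> block_range(std::size_t n, int nranks, int rank) {
  assert(nranks > 0 && rank >= 0 && rank < nranks);
  const std::size_t r = static_cast<std::size_t>(rank);
  const std::size_t p = static_cast<std::size_t>(nranks);
  const std::size_t base = n / p;   // entries every rank gets
  const std::size_t extra = n % p;  // leftover entries spread over the low ranks
  const std::size_t count = base + (r < extra ? 1 : 0);
  const std::size_t start = r * base + (r < extra ? r : extra);
  return {start, count};
}
```

For example, block_range(10, 4, 1) returns {3, 3}, i.e. rank 1 owns global entries 3 to 5; concatenating the ranges of all ranks covers the array exactly once, with no gaps or overlaps.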

cwsmith commented 4 years ago

I'm guessing we have workarounds (i.e., hacks) to avoid the problems if we are getting similar results. Is that understanding correct?

Specifically, the process transpose issue was mentioned in an old issue (https://github.com/SCOREC/wdmapp_coupling/issues/63) and, IIRC, a hack/workaround was added in gene (?) to avoid it.

phyboyzhang commented 4 years ago

Yes, I think the latest coupler here is very close to providing the same functionality as theirs.

We need the transpose mentioned in #63 for all the subcommunicators used for sending data. However, the sending routines only handle a single subcommunicator, so we only use comm_x in GENE here. We could extend the transpose function to multiple subcommunicators.
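The transpose from #63 is essentially a relabeling of ranks: the same process-grid coordinates (c0, c1, ..., c(d-1)) map to different linear ranks depending on whether the first coordinate varies fastest (Fortran, column-major) or the last one does (C/C++, row-major; MPI's Cartesian topologies also number ranks in row-major order). The sketch below handles any number of grid dimensions, which is what extending the transpose beyond comm_x would need. It assumes both codes describe the same process grid with the same extents; colmajor_to_rowmajor and its helpers are hypothetical names, not functions from the coupler or GENE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split a rank into grid coordinates with dimension 0 varying fastest
// (Fortran / column-major convention).
std::vector<int> coords_colmajor(int rank, const std::vector<int>& dims) {
  assert(rank >= 0);
  std::vector<int> c(dims.size());
  for (std::size_t i = 0; i < dims.size(); ++i) {
    c[i] = rank % dims[i];
    rank /= dims[i];
  }
  assert(rank == 0);  // the rank must lie inside the grid
  return c;
}

// Linearize coordinates with the last dimension varying fastest
// (C/C++ row-major convention).
int rank_rowmajor(const std::vector<int>& c, const std::vector<int>& dims) {
  assert(c.size() == dims.size());
  int rank = 0;
  for (std::size_t i = 0; i < dims.size(); ++i) {
    assert(c[i] >= 0 && c[i] < dims[i]);
    rank = rank * dims[i] + c[i];
  }
  return rank;
}

// Hypothetical helper: row-major rank of the process whose column-major rank
// is `rank`, on a grid with extents `dims` (any number of dimensions).
int colmajor_to_rowmajor(int rank, const std::vector<int>& dims) {
  return rank_rowmajor(coords_colmajor(rank, dims), dims);
}
```

For a 2 x 3 grid, colmajor_to_rowmajor(1, {2, 3}) gives 3. Because a column-major layout is a row-major layout with the dimension order reversed, calling the same function with the extents reversed ({3, 2}) performs the inverse mapping.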

phyboyzhang commented 4 years ago

With the new PR here, XGC can now receive and send the correct data. The following is XGC's output with our COUPLER:

main loop started
 - do loop start
step,trigger,ratio,# of ion      1     1.1000     1.0000       12800000
step,f0(ratio,max)    1    -1.0000     0.0000
 Engine for density read created, expecting      222539 node values
 received density in   -66766085106813520.        1.8044732484278979E+017
Linear solve converged due to CONVERGED_RTOL iterations 591
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
 begin declare IO
 IO is declared
 Field engine created, sending as dpott      222884  node values
  sending field data in   -3.0041616336642072        8.2681956600286171

Their result is:

 main loop started
 - do loop start
step,trigger,ratio,# of ion      1     1.1000     1.0000       12800000
step,f0(ratio,max)    1    -1.0000     0.0000
 Engine for density read created, expecting      222539 node values
 received density in   -66766084240252248.        1.8044732553688858E+017
Linear solve converged due to CONVERGED_RTOL iterations 551
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
 Field engine created, sending as dpott      222884  node values
  sending field data in   -3.0041615687227501        8.2681957143420526
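One way to put a number on "very similar" is the relative difference between the printed minima and maxima of the two runs. A minimal sketch; rel_diff and minmax_close are hypothetical helpers, and the tolerance is a choice made by the caller, not something XGC or the coupler defines.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Relative difference |a - b| / max(|a|, |b|); defined as 0 when both are 0.
double rel_diff(double a, double b) {
  const double scale = std::max(std::fabs(a), std::fabs(b));
  return scale == 0.0 ? 0.0 : std::fabs(a - b) / scale;
}

// True if two (min, max) diagnostic pairs, e.g. the numbers printed after
// "received density in", agree within relative tolerance tol.
bool minmax_close(double min_a, double max_a, double min_b, double max_b, double tol) {
  assert(tol >= 0.0);
  return rel_diff(min_a, min_b) <= tol && rel_diff(max_a, max_b) <= tol;
}
```

Fed the first-step values above (received density and sending field data, our COUPLER versus theirs), the relative differences are at most about 2e-8, so a tolerance of 1e-7 accepts both pairs while 1e-9 rejects them.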
cwsmith commented 4 years ago

Sounds good.

I see the 'similar' branch but don't see a new Pull Request associated with it. Does it make sense to add these changes into the existing Pull Request: https://github.com/SCOREC/wdmapp_coupling/pull/76

phyboyzhang commented 4 years ago

After I restore some commented-out lines, I will add those changes to #76.

phyboyzhang commented 4 years ago

The updates were pushed to #76.

phyboyzhang commented 4 years ago

With the new push of the coupler here and the new push of GENE here, the COUPLER can receive, send and process data correctly up to the point where it processes the density received from GENE for the second time.

The output of XGC after receiving the density processed by the COUPLER is:

 - do loop start
step,trigger,ratio,# of ion      1     1.1000     1.0000       12800000
step,f0(ratio,max)    1    -1.0000     0.0000
 Engine for density read created, expecting      222539 node values
 received density in   -66766084240252248.        1.8044732553688858E+017
Linear solve converged due to CONVERGED_RTOL iterations 551
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
 begin declare IO
 IO is declared
 Field engine created, sending as dpott      222884  node values
  sending field data in   -3.0041615687227510        8.2681957143420508

while the result without the COUPLER is:

 - do loop start
step,trigger,ratio,# of ion      1     1.1000     1.0000       12800000
step,f0(ratio,max)    1    -1.0000     0.0000
 Engine for density read created, expecting      222539 node values
 received density in   -66766084240252248.        1.8044732553688858E+017
Linear solve converged due to CONVERGED_RTOL iterations 551
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
 Field engine created, sending as dpott      222884  node values
  sending field data in   -3.0041615687227501        8.2681957143420526

The output of GENE after receiving the potential processed by the COUPLER is:

 numiter,mype,minele,maxele,complexfield=           0           0         (0.0000000000000000,1.06099809785192469E-313) (-3.47651836030853889E-005,-5.52049919032803187E-005)             (9.0281206699294572,-0.16576231954342879)
 numiter,mype,minele,maxele,complexfield=           0           3   (6.95329252360673759E-310,2.78330521916008886E-315)  (8.28299705015635767E-019,-4.93390717208629602E-019)             (25.376563303636861,-0.29895541134016035)
 numiter,mype,minele,maxele,complexfield=           0           1   (6.95326463226191433E-310,4.32886745914677671E-315)  (-7.76971086018303272E-015,4.92261710835325763E-015)              (19.179166045049048,0.24544205774823549)
 numiter,mype,minele,maxele,complexfield=           0           2   (6.95326746543195384E-310,3.33742276561701389E-315) (-1.51548349491493300E-007,-1.53779389934384357E-007)              (11.932318926833766,0.13474000217242713)

while the result without the COUPLER is:

 numiter,mype,minele,maxele,sumfield=           0           0   (4.37642455815488508E-315,1.35479143892562404E-315) (-3.47651836030899358E-005,-5.52049919032811183E-005)             (9.0281206699294589,-0.16576231954342857)
 numiter,mype,minele,maxele,sumfield=           0           3   (4.51898162720189168E-315,1.35479143892562404E-315)  (8.28295250834425772E-019,-4.93388065869879031E-019)             (25.376563303636868,-0.29895541134015935)
 numiter,mype,minele,maxele,sumfield=           0           2   (4.92287539154590398E-315,1.35479143892562404E-315) (-1.51548349491495338E-007,-1.53779389934383642E-007)              (11.932318926833803,0.13474000217242613)
 numiter,mype,minele,maxele,sumfield=           0           1   (6.49506508212632165E-315,1.35479143892562404E-315)  (-7.76971086548956593E-015,4.92261710763183960E-015)              (19.179166045049023,0.24544205774823524)
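Two details matter when comparing these GENE lines: the ranks print in a different order in the two runs (0, 3, 1, 2 versus 0, 3, 2, 1), and the minele column contains values that are zero or subnormal (smaller in magnitude than about 2.2e-308) and differ between runs. A minimal sketch that therefore keys the last column (complexfield / sumfield) by mype and reports the worst relative difference; max_rel_diff is a hypothetical helper, and the values would be copied by hand from the two outputs above.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <map>

using cplx = std::complex<double>;

// Largest relative difference |a - b| / |b| between per-rank complex values,
// matched by mype so the order in which ranks print does not matter.
// Falls back to the absolute difference when |b| is zero.
double max_rel_diff(const std::map<int, cplx>& a, const std::map<int, cplx>& b) {
  assert(a.size() == b.size());
  double worst = 0.0;
  for (const auto& kv : a) {
    const cplx vb = b.at(kv.first);  // throws std::out_of_range if a rank is missing
    const double scale = std::abs(vb);
    const double d = std::abs(kv.second - vb);
    worst = std::max(worst, scale > 0.0 ? d / scale : d);
  }
  return worst;
}
```

For the four ranks above, the worst relative difference in the last column is a few times 1e-15, well inside a 1e-12 tolerance.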
phyboyzhang commented 4 years ago

With the new push here, the COUPLER now runs without errors on AIMOS. The output of XGC with the COUPLER is:

 - do loop start
step,trigger,ratio,# of ion      1     1.1000     1.0000       12800000
step,f0(ratio,max)    1    -1.0000     0.0000
 Engine for density read created, expecting      222539 node values
 received density in   -66766084240252248.        1.8044732553688858E+017
Linear solve converged due to CONVERGED_RTOL iterations 551
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
 begin declare IO
 IO is declared
 Field engine created, sending as dpott      222884  node values
  sending field data in   -3.0041615687227510        8.2681957143420508
 received density in   -66735793089133368.        1.8041662780828058E+017
Linear solve converged due to CONVERGED_RTOL iterations 539
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -3.0029047356643832        8.2663026671386515
 received density in   -66733488630210624.        1.8040861552557069E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -3.0028010321840704        8.2659397355852349
 received density in   -66700901892786808.        1.8037007061227632E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -3.0014408597626585        8.2641286630294104
step,trigger,ratio,# of ion      2     1.1000     1.0075       12800000
step,f0(ratio,max)    2    -1.0000     0.0000

and the output of XGC without the COUPLER is:

 - do loop start
step,trigger,ratio,# of ion      1     1.1000     1.0000       12800000
step,f0(ratio,max)    1    -1.0000     0.0000
 Engine for density read created, expecting      222539 node values
 received density in   -66766084240252248.        1.8044732553688858E+017
Linear solve converged due to CONVERGED_RTOL iterations 551
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
 Field engine created, sending as dpott      222884  node values
  sending field data in   -3.0041615687227501        8.2681957143420526
 received density in   -66735793089133368.        1.8041662780828061E+017
Linear solve converged due to CONVERGED_RTOL iterations 539
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -3.0029047356643694        8.2663026671386710
 received density in   -66733488630210640.        1.8040861552557075E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -3.0028010321840770        8.2659397355852278

The COUPLER finalizes Kokkos and MPI as follows:

0 done loop 0 3
1 done loop 0 3
3 done loop 0 3
2 done loop 0 3
6666
6666
0 before kokkos finalize
0 done kokkos finalize
1 before kokkos finalize
1 done kokkos finalize
3 before kokkos finalize
3 done kokkos finalize
2 before kokkos finalize
2 done kokkos finalize
MPI is finalized.
MPI is finalized.
MPI is finalized.
MPI is finalized.
phyboyzhang commented 4 years ago

The following is the XGC output of a 12-step run with the COUPLER:

 - do loop start
step,trigger,ratio,# of ion      1     1.1000     1.0000       12800000
step,f0(ratio,max)    1    -1.0000     0.0000
 Engine for density read created, expecting      222539 node values
 received density in   -66766084240252248.        1.8044732553688858E+017
Linear solve converged due to CONVERGED_RTOL iterations 551
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
 begin declare IO
 IO is declared
 Field engine created, sending as dpott      222884  node values
  sending field data in   -3.0041615687227510        8.2681957143420508
 received density in   -66735793089133368.        1.8041662780828058E+017
Linear solve converged due to CONVERGED_RTOL iterations 539
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -3.0029047356643637        8.2663026671386710
 received density in   -66733488630210624.        1.8040861552557069E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -3.0028010321840801        8.2659397355852260
 received density in   -66700901892786808.        1.8037007061227632E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -3.0014408597626581        8.2641286630294122
step,trigger,ratio,# of ion      2     1.1000     1.0075       12800000
step,f0(ratio,max)    2    -1.0000     0.0000
 received density in   -66700899401757880.        1.8037001358596678E+017
Linear solve converged due to CONVERGED_RTOL iterations 539
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -3.0014407588718175        8.2641263165010042
 received density in   -66666009126109072.        1.8032345261732947E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -2.9999769234631404        8.2619521786054815
 received density in   -66663714914872920.        1.8031559986897914E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -2.9998736445836567        8.2615958766467319
 received density in   -66626542026835192.        1.8026134046325834E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  #########*sending field data in   -2.9983070148487716        8.2590718818981710*
step,trigger,ratio,# of ion      3     1.1000     1.0108       12800000
step,f0(ratio,max)    3    -1.0000     0.0000
 received density in   -66626538602701608.        1.8026128742366922E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -2.9983068729228903        8.2590696774874832
 received density in   -66587072375884064.        1.8019917224924774E+017
Linear solve converged due to CONVERGED_RTOL iterations 539
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -2.9966370030285336        8.2561892758124049
 received density in   -66584791066393432.        1.8019146936502534E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -2.9965342665264956        8.2558392648043721
 received density in   -66551873618617368.        1.8012179700038339E+017
Linear solve converged due to CONVERGED_RTOL iterations 537
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -2.9947622599178536        8.2526149988902038
step,trigger,ratio,# of ion      4     1.1000     1.0115       12800000
step,f0(ratio,max)    4    -1.0000     0.0000

The output of XGC without the COUPLER, up to 8 time steps, is:

 - do loop start
step,trigger,ratio,# of ion      1     1.1000     1.0000       12800000
step,f0(ratio,max)    1    -1.0000     0.0000
 Engine for density read created, expecting      222539 node values
 received density in   -66766084240252248.        1.8044732553688858E+017
Linear solve converged due to CONVERGED_RTOL iterations 551
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
 Field engine created, sending as dpott      222884  node values
  sending field data in   -3.0041615687227501        8.2681957143420526
 received density in   -66735793089133368.        1.8041662780828061E+017
Linear solve converged due to CONVERGED_RTOL iterations 539
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -3.0029047356643694        8.2663026671386710
 received density in   -66733488630210640.        1.8040861552557075E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -3.0028010321840770        8.2659397355852278
 received density in   -66700901892786792.        1.8037007061227632E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -3.0014408597626554        8.2641286630294122
step,trigger,ratio,# of ion      2     1.1000     1.0075       12800000
step,f0(ratio,max)    2    -1.0000     0.0000
 received density in   -66700899401757864.        1.8037001358596688E+017
Linear solve converged due to CONVERGED_RTOL iterations 539
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -3.0014407588718246        8.2641263165009953
 received density in   -66666009126109056.        1.8032345261732941E+017
Linear solve converged due to CONVERGED_RTOL iterations 539
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -2.9999769234624036        8.2619521786062151
 received density in   -66663714914872952.        1.8031559986897920E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
  sending field data in   -2.9998736445836593        8.2615958766467301
 received density in   -66626542026835168.        1.8026134046325827E+017
Linear solve converged due to CONVERGED_RTOL iterations 538
Linear s2_ solve converged due to CONVERGED_RTOL iterations 5
#######*sending field data in   -2.9983070148486632        8.2590718818982793*

The two results agree up to the last line of the latter result (the one beginning with "#######"), which means that we can fully reproduce their results for the preprocessing phase.
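The comparison behind this conclusion can also be automated. A minimal sketch, under these assumptions: only the "received density in" and "sending field data in" lines are compared (lines such as "begin declare IO" appear only in the coupled run), lines are paired in order up to the length of the shorter log, and the relative tolerance is chosen by the caller. diagnostics, numbers and first_mismatch are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Keep only the diagnostic lines both runs print; lines such as
// "begin declare IO" appear only in the coupled run.
std::vector<std::string> diagnostics(const std::vector<std::string>& log) {
  std::vector<std::string> out;
  for (const auto& line : log)
    if (line.find("received density in") != std::string::npos ||
        line.find("sending field data in") != std::string::npos)
      out.push_back(line);
  return out;
}

// Parse the numbers printed after the last " in " (the min and the max).
std::vector<double> numbers(const std::string& line) {
  const std::size_t pos = line.rfind(" in ");
  assert(pos != std::string::npos);
  std::istringstream in(line.substr(pos + 4));
  std::vector<double> v;
  double x;
  while (in >> x) v.push_back(x);
  return v;
}

// Index of the first paired diagnostic line whose kind or numbers differ
// (relative tolerance tol), over the length of the shorter log; -1 if none.
long first_mismatch(const std::vector<std::string>& a,
                    const std::vector<std::string>& b, double tol) {
  const auto da = diagnostics(a), db = diagnostics(b);
  const std::size_t n = std::min(da.size(), db.size());
  for (std::size_t i = 0; i < n; ++i) {
    const bool dens_a = da[i].find("received density in") != std::string::npos;
    const bool dens_b = db[i].find("received density in") != std::string::npos;
    const auto x = numbers(da[i]), y = numbers(db[i]);
    if (dens_a != dens_b || x.size() != y.size()) return static_cast<long>(i);
    for (std::size_t k = 0; k < x.size(); ++k) {
      const double scale = std::max(std::fabs(x[k]), std::fabs(y[k]));
      if (scale > 0.0 && std::fabs(x[k] - y[k]) / scale > tol)
        return static_cast<long>(i);
    }
  }
  return -1;
}
```

On the 8 steps the two logs above share, the largest relative difference is about 2.5e-13 (the second field send of step 2), so a tolerance of 1e-10 reports no mismatch.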