Need help on doing a benchmark run using WarpX

ECP-WarpX / WarpX

WarpX is an advanced electromagnetic & electrostatic Particle-In-Cell code.

https://ecp-warpx.github.io

Other

298 stars 191 forks source link

Need help on doing a benchmark run using WarpX #4368

Closed Yin-YinjianZhao closed 7 months ago

Yin-YinjianZhao commented 1 year ago

Hi guys, as suggested by @RemiLehe Remi, I am opening an issue here and asking for help on doing a benchmark test using WarpX, regarding ExB instabilities occurred in Hall thrusters. The benchmark is described in this paper: 10.1088/1361-6595/ab46c5, I attached the PDF as well. 2D axial-azimuthal particle-in-cell benchmark for low-temperature partially magnetized plasmas.pdf

One special thing I need to do is that after solving the Poisson's equation, the potential needs to be modified according to

phi = phi - (x/xe)*U_average,

where U_average is the y average at x=xe, this modification corresponds to Eq. (9) in the paper.

I therefore added a function in PICMI script,

def phi_fix():
    phi = fields.PhiFPWrapper(0, True)[...]
    phi_ave = np.sum(phi[i_xin][1:nz+1]) / nz
    for i in range(len(phi)-2):
        x = i*dx
        phi[i+1][:] = phi[i+1][:] - x/xin*phi_ave
        # or phi[i+1][1:nz+1] = phi[i+1][1:nz+1] - x/xin*phi_ave
    fields.PhiFPWrapper(0, True)[...] = phi

callbacks.installafterEsolve(phi_fix)

Remi mentioned it should be installed afterEsolve, but I also need to update the E field accordingly in the Python function. Axel @ax3l mentioned that maybe we can add a new callback position between poisson solve and stencil for E.

Therefore, based on what I have in the PICMI scripts, can we add a new callback position and solve this problem?

This is the full PICMI script. picmi.txt

Yin-YinjianZhao commented 12 months ago

Currently, I'm trying to use this piece of code to do the job, but I'm a little bit confused about the indexing of phi and Ex. The simulation has nx=500, phi has length along x 503, I would say it contains guard grid points, nx=500 is 500 cells, which has 0,1,2,...,500, thus 501 grid points. Plus two guard grid points, equals 503. But Ex seems to have more than 503 length, so I'm confused.

def phi_fix():
    phi = fields.PhiFPWrapper(0, True)[...]
    Ex = fields.ExFPWrapper(0, True)[...]
    phi_ave = np.mean(phi[i_xin][:])
    for i in range(len(phi)-2):
        x = i*dx
        phi[i+1][:] = phi[i+1][:] - x/xin*phi_ave
    for i in range(len(phi)-2):
        Ex[i+1][2:nz+2] = -(phi[i+2][1:nz+1]-phi[i][1:nz+1])/(dx*2)
    fields.PhiFPWrapper(0, True)[...] = phi 
    fields.ExFPWrapper(0, True)[...] = Ex

callbacks.installafterEsolve(phi_fix)

Yin-YinjianZhao commented 12 months ago

Well, because I don't quite know if the above is doing correctly, (noticed I was using staggered grid, but switched to collocated grid now), and the simulation speed drops a lot, so I hard coded the following for now in ElectrostaticSolver.cpp.

#elif defined(WARPX_DIM_XZ) || defined(WARPX_DIM_RZ)

                int rank,nprocs,n,nc,ns;
                MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
                MPI_Comm_rank(MPI_COMM_WORLD,&rank);
                n = 256/nprocs;
                nc = 0;
                Real phi0 = 0._rt, phis = 0._rt;
                for (int i=rank*n; i<=(rank+1)*n; i++) {
                    phi0 = phi0 + phi_arr(480,i,0);
                    nc = nc + 1;
                };
                MPI_Allreduce(&phi0, &phis, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
                MPI_Allreduce(&nc, &ns, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
                phis = phis / double(ns);

                amrex::ParallelFor( tbx, tby, tbz,
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) {
                        Real f = i*dx[0]/2.4e-2*phis*2.0;
                        Ex_arr(i,j,k) +=
                            +(beta_x*beta_x-1._rt)*0.5_rt*inv_dx*(phi_arr(i+1,j  ,k)-phi_arr(i-1,j  ,k)-f)
                            + beta_x*beta_z       *0.5_rt*inv_dz*(phi_arr(i  ,j+1,k)-phi_arr(i  ,j-1,k)-f);
                    },
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) {
                        Real f = i*dx[0]/2.4e-2*phis*2.0;
                        Ey_arr(i,j,k) +=
                            +beta_x*beta_y*0.5_rt*inv_dx*(phi_arr(i+1,j  ,k)-phi_arr(i-1,j  ,k)-f)
                            +beta_y*beta_z*0.5_rt*inv_dz*(phi_arr(i  ,j+1,k)-phi_arr(i  ,j-1,k)-f);
                    },
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) {
                        Real f = i*dx[0]/2.4e-2*phis*2.0;
                        Ez_arr(i,j,k) +=
                            + beta_z*beta_x       *0.5_rt*inv_dx*(phi_arr(i+1,j  ,k)-phi_arr(i-1,j  ,k)-f)
                            +(beta_z*beta_z-1._rt)*0.5_rt*inv_dz*(phi_arr(i  ,j+1,k)-phi_arr(i  ,j-1,k)-f);
                    }
                );

RemiLehe commented 12 months ago

Sorry for the late reply! Here are a few answers to your questions:

Regarding the update of E: as you indicated in your second message, you can simply update E directly in afterEsolve.
Regarding the number of points of phi and E: the difference is because E typically has more guard cells than phi. This is because phi requires only one guard cell (for the parallel Poisson solver), while E needs to have enough guard cells for particle with a wide shape factor to gather values from the guard cells (if they are close to a boundary between sub-domain). You can try to pass include_ghosts=False in ExWrapper and PhiWrapper to avoid this problem.

As a side-note, using explicit for loops in Python code like this:

for i in range(len(phi)-2):
    x = i*dx
    phi[i+1][:] = phi[i+1][:] - x/xin*phi_ave
for i in range(len(phi)-2):
    Ex[i+1][2:nz+2] = -(phi[i+2][1:nz+1]-phi[i][1:nz+1])/(dx*2)

is very inefficient. It is much more efficient to use numpy array-wise operations, e.g.

x = dx*np.arange(len(phi)-2)
phi[1:-1, :] += phi[1:-1,:] - x/xin*phi_ave
Ex[1:-1,2:-2] += -( phi[2:,1:-1]-phi[:-2,1:-1] )/(2*dx)

Yin-YinjianZhao commented 11 months ago

@RemiLehe Hi Remi, thanks for your detailed reply!

I have a few more questions.

Within the above defined function phi_fix(), when I use domain decomposition, each MPI rank will only handle part of the whole phi array, right? But when I print(len(phi)) and print(len(phi[0][:])), I got all the same full length from all the ranks, why is that? In other words, does your suggested code still work correctly if I turn on domain decomposition by setting say numprocs=[1,2]?
My comment above yours, which hard coded that piece of code in ElectrostaticSolver.cpp, will that does the job I want correctly? And will it work correctly for whatever parallelization?
If I need to give correct values to the guard cells?

Yin-YinjianZhao commented 11 months ago

Well, I made a mistake I guess, this may be better.

#elif defined(WARPX_DIM_XZ) || defined(WARPX_DIM_RZ)

                int rank,nprocs,n,nc,ns;
                MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
                MPI_Comm_rank(MPI_COMM_WORLD,&rank);
                n = 256/nprocs;
                nc = 0; 
                Real phi0 = 0._rt, phis = 0._rt;
                for (int i=rank*n; i<=(rank+1)*n; i++) {
                    phi0 = phi0 + phi_arr(480,i,0);
                    nc = nc + 1; 
                };
                MPI_Allreduce(&phi0, &phis, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
                MPI_Allreduce(&nc, &ns, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
                phis = phis / double(ns);

                const Box& tb  = mfi.tilebox( phi[lev]->ixType().toIntVect() );
                amrex::ParallelFor( tb,
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) { 
                        phi_arr(i,j,k) = phi_arr(i,j,k) - i*dx[0]/2.4e-2*phis;
                    } // loop ijk
                );

                amrex::ParallelFor( tbx, tby, tbz, 
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) { 
                        Ex_arr(i,j,k) +=
                            +(beta_x*beta_x-1._rt)*0.5_rt*inv_dx*(phi_arr(i+1,j  ,k)-phi_arr(i-1,j  ,k)) 
                            + beta_x*beta_z       *0.5_rt*inv_dz*(phi_arr(i  ,j+1,k)-phi_arr(i  ,j-1,k));
                    },
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) { 
                        Ey_arr(i,j,k) +=
                            +beta_x*beta_y*0.5_rt*inv_dx*(phi_arr(i+1,j  ,k)-phi_arr(i-1,j  ,k)) 
                            +beta_y*beta_z*0.5_rt*inv_dz*(phi_arr(i  ,j+1,k)-phi_arr(i  ,j-1,k));
                    },
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) { 
                        Ez_arr(i,j,k) +=
                            + beta_z*beta_x       *0.5_rt*inv_dx*(phi_arr(i+1,j  ,k)-phi_arr(i-1,j  ,k)) 
                            +(beta_z*beta_z-1._rt)*0.5_rt*inv_dz*(phi_arr(i  ,j+1,k)-phi_arr(i  ,j-1,k));
                    }
                );

dpgrote commented 11 months ago

Hi Yin - For you question 1, when you do phi = fields.PhiFPWrapper(0, True)[...], the phi that is returned is a full sized array covering the full domain and is broadcast to all processors. This will work fine, but may not be efficient since every processor will be doing the same operation on the whole array.

If you want to do the operation locally on each processor, you would need to use the Python equivalent of a loop over MFIters. For example:

phi = fields.PhiFPWrapper()
Ex = fields.ExFPWrapper()
for mfi in phi:
    phiarr = phi.mf.array(mfi)
    Exarr = Ex.mf.array(mfi)
    phiarr += ...

Yin-YinjianZhao commented 11 months ago

Hi Yin - For you question 1, when you do phi = fields.PhiFPWrapper(0, True)[...], the phi that is returned is a full sized array covering the full domain and is broadcast to all processors. This will work fine, but may not be efficient since every processor will be doing the same operation on the whole array.

If you want to do the operation locally on each processor, you would need to use the Python equivalent of a loop over MFIters. For example:
phi = fields.PhiFPWrapper()
Ex = fields.ExFPWrapper()
for mfi in phi:
    phiarr = phi.mf.array(mfi)
    Exarr = Ex.mf.array(mfi)
    phiarr += ...

I see, thank you so much Dave, I will have a try!

Yin-YinjianZhao commented 7 months ago

I will close this issue, because I have had a working scipt already. Thanks for all the help!