tomdg1626003 opened 11 months ago
@tomdg1626003, please post the test files and compiler switches so that everyone else can understand what was tested and how, specifically. Was AVX512 used? There are also confusing typos in your post, some of them probably introduced by buggy GitHub :)
Thanks for your reply. The compiler is Intel ifort and I submit jobs using sbatch, but how do I use AVX512? The following is my input.deck:

```
begin:constant
  total_number_density = 1e14
  nb_n0 = 0
  wpe_wce = 10
  vthe = 0.05*c
  v_e = 0.2*c
  pv_e = me*v_e
  mi_me = 1836.0
  Ti_Te = 1
  wpe = sqrt(total_number_density*qe^2/epsilon0/me)
  wce = wpe/wpe_wce
  B0 = wce*me/qe
  Te = me*vthe^2/kb
  lamdade = vthe/wpe
  delta_x = 0.39*lamdade
  nxgrid = 512
  nygrid = 512
  dt = delta_x/c*0.8
  Nppc = 500
end:constant

begin:control
  nx = nxgrid # in x
  ny = nygrid # in y
  t_end = 10/wpe
  x_min = 0.0
  x_max = nx*delta_x
  y_min = -0.5*ny*delta_x
  y_max = 0.5*ny*delta_x
end:control

begin:fields
  ex = 0
  ey = 0
  ez = 0
  bx = B0
  by = 0
  bz = 0
end:fields

begin:boundaries
  bc_x_min = periodic
  bc_x_max = periodic
  bc_y_min = periodic
  bc_y_max = periodic
end:boundaries

begin:species
  name = proton
  charge = 1.0
  mass = mi_me
  number_density = total_number_density
  temperature = Ti_Te*Te
  npart_per_cell = Nppc
end:species

begin:species
  name = electron
  charge = -1.0
  mass = 1.0
  temperature = Te
  number_density = total_number_density
  npart_per_cell = Nppc
end:species

begin:output
  name = field
  dt_snapshot = 1*dt
  file_prefix = field
  ex = always
  ey = always
  ez = always
  bx = always
  by = always
  bz = always
end:output
```
I never tried it with EPOCH, and I am not a developer, but with other codes this worked: `-O2 -xCORE-AVX512`
Noticed your text looks abnormal? GitHub has not managed to fix that for years. To fix how your input.deck looks, please re-post it, preceding the text with three grave accents on the line before it, not the single accent that is inserted automatically. Then the text will not look distorted.
Hey @tomdg1626003,
I notice you're outputting the full fields every time-step. For a domain as large as yours, this will be a major bottleneck in your simulation. To address your specific concerns:
Despite these points, we are currently transitioning the code to C++ to make better use of the performance-enhancing tools offered there. We expect this work to finish towards the end of 2025, but it is difficult to forecast development over such long timescales.
Cheers, Stuart
Oh, no....
I switched from a pretty well-known C/C++ PIC code (I will not tell which one) to EPOCH and got speeds an order of magnitude, sometimes orders of magnitude, faster, so that I often do not use supercomputers anymore and do the runs on just a PC or a small workstation. And I know further possibilities exist to improve EPOCH's speed by another order, while with the mentioned C/C++ code pretty experienced people tried everything and failed.

Transitioning the code to C++ is an absolutely wrong, unjustified step. Vector AVX instructions work for C and Fortran codes alike. GPU accelerator drivers exist for C and Fortran codes alike. Where will you gain anything by bloating the source code several times over and making it much harder to extend for both developers and others??? You are just wasting your time.

Even more, I predict this will kill EPOCH unless there is a bunch of highly paid, devoted developers following and maintaining it. The cost of maintaining C codes is an order of magnitude larger than for Fortran codes. Essentially, the activity of users from large national labs maintains all the major existing C/C++ PIC codes. Does EPOCH have one in mind? If not, it will lose most of its current users.

Simplicity and speed are the key to continued use of the EPOCH code by numerous small research groups. If it becomes a C code, users will find plenty of other, currently more advanced alternatives. Do you need me to name a few?

EPOCH would do better to concentrate on adding new features and physical models, like boosted frames and spectral methods, and on getting even faster speed, which Fortran is famous for, through optimization, vector instructions and GPUs. Adding other output options besides SDF as the only choice would help too. EPOCH is already late in adding some features; what, do you think the lag will magically be smaller with C/C++? :) Even if you plan to hire a very high-profile, really top-notch C/C++ programmer to speed up development, it will be a matter of just a few years or even months before he gets an offer for a better-paid job, and after him you will be left with a code that is not easily maintainable. The same disaster will eventually happen if a large and complex scientific C code is developed and maintained by supervised students.

Fortran is not a dying language at all. In the supercomputer world, C and Fortran are like a king and queen. Look at the TIOBE list, for example; it will surprise and refresh many. Fortran beats even MATLAB, which is used by every dog and cat on the planet. And for science and engineering it is probably simply the best. https://www.tiobe.com/tiobe-index/
I understand the concern with the switch - I too have used an incomprehensible C/C++ PIC code (I will also not name which one). I think many of the issues with readability come down to developer documentation - with adequate documentation, the code should still be extensible. Developer documentation will be a high priority for me personally.

I'd also argue the FORTRAN code is already bloated - many different groups have added to the source code, which is great, but these extensions can be inefficient and cumbersome. The core PIC algorithms are now hidden in massive fields, boundaries and particles files, and repeated three times, once for each dimension. Maintaining the code is starting to become a challenge, and EPOCH is definitely due a rewrite.
So why C++ then? Our funded grant application identified six reasons for preferring C++ over FORTRAN:

1. Object orientation principles will give EPOCH developers much better control over code structure and modularisation. Yes, FORTRAN can do this too, but it is a principal design feature of C++ and is much better supported there.
2. C++ templating would allow the 1D, 2D, 3D and cylindrical codes to be built from the same source, simplifying maintenance (see the sketch after this list).
3. Modern C++ tools typically come first, particularly parallel libraries (for example, CUDA predates CUDA Fortran by 2 years). Next-generation supercomputers may require programming models that aren't Fortran-supported for a number of years.
4. Modern compilers often adopt the latest C++ standards before they follow the Fortran standards, especially the clang compiler (which has now also been adopted by Intel). We may want to use the latest language features on supercomputers.
5. There is heavy investment from the US Department of Energy, Intel, and others in C++ tool chains that is not currently being matched for Fortran. The US Department of Energy Exascale Computing Project (ECP) is focused on the C++ Kokkos and RAJA portability layers. Intel's OneAPI consists of a number of language-agnostic libraries but is coupled with Data Parallel C++ (DPC++, an extension of SYCL) as its target language.
6. Pragma-based approaches such as OpenACC and OpenMP 4.5 may allow targeting of accelerators from Fortran, but there are many questions about the performance portability of such approaches. Approaches such as Kokkos, RAJA and DPC++ allow greater developer control over the parallelism that is exposed, potentially leading to higher performance.
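To make the templating point concrete, here is a minimal sketch of how a single dimension-templated source could replace three per-dimension copies. This is not EPOCH code; the names (`Grid`, `scale_field`) are hypothetical, and a real rewrite would of course carry far more structure:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical sketch: one grid type parameterised on dimension,
// standing in for separate 1D/2D/3D source trees.
template <std::size_t Dim>
struct Grid {
    std::array<std::size_t, Dim> shape;  // cells per dimension
    std::vector<double> ex;              // one flattened field array

    explicit Grid(std::array<std::size_t, Dim> s) : shape(s) {
        std::size_t n = 1;
        for (auto d : shape) n *= d;
        ex.assign(n, 0.0);
    }
};

// A field update written once, for any dimension: the flattened loop
// is identical, so only boundary/stencil code needs Dim-specific parts.
// A portability layer (Kokkos::parallel_for, RAJA::forall, ...) could
// replace this raw loop without touching any call sites.
template <std::size_t Dim>
void scale_field(Grid<Dim>& g, double factor) {
    for (auto& v : g.ex) v *= factor;
}

int main() {
    Grid<2> g2({512, 512});    // 2D case, as in the deck above
    Grid<3> g3({64, 64, 64});  // 3D case, same source
    scale_field(g2, 0.5);
    scale_field(g3, 0.5);
}
```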
P.S. Apologies @tomdg1626003, you probably weren't asking for this clash of codes when you posted your original issue.
I agree that three 1D/2D/3D versions of EPOCH are a bit cumbersome. Not much, but still. I am sure a future ChatGPT will edit all three versions simultaneously as you edit just one of them :)

As a research project, a C/C++ version could be not a bad idea. But that project may take years. And at the end it might become 3x larger than all three Fortran versions combined :))). Hopefully you will continue to support the Fortran version in the meantime and afterwards!
@tomdg1626003, Any success with AVX512?
@tomdg1626003, can you also post the input file for VPIC? Were GPU or AVX512 used there? It looks like a heavily vectorised code.
Sorry to reply so late to you, @DanRRRR. The following is my VPIC input file: inputvpic.txt

Moreover, I am not very familiar with the AVX512 you mentioned; can it accelerate EPOCH?
Hi developers, I compared the running speed of EPOCH and VPIC and found that EPOCH is much slower, mainly for the following reasons. Firstly, VPIC uses vector storage. Secondly, EPOCH needs to convert data into the International System of Units, so the more particles, the slower it gets. Thirdly, EPOCH uses structured data, and data calculated on different cores needs to be transmitted and merged at the same time, which costs data-transfer time. The simulation model is Maxwellian, the simulation domain is Nx × Ny = 512 × 512, and the total time is 10 wpe^-1.

| Code | ppc | Cores | Time |
| --- | --- | --- | --- |
| EPOCH | 500 | 512 | 206 s |
| EPOCH | 500 | 1024 | 163 s |
| EPOCH | 500 | 2048 | 171 s |
| VPIC | 500 | 512 | 36 s |
| VPIC | 500 | 512 | 20 s |
| VPIC | 500 | 512 | 14 s |
Is there a better way to improve the speed of parallel computing for EPOCH?
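On the "vector storage" point above: the usual meaning is that VPIC keeps particle data in a layout that maps onto vector registers, whereas an array-of-structures layout tends to defeat auto-vectorisation. Here is a minimal illustrative sketch of the difference; the types and field names are hypothetical and taken from neither code, and the push step is deliberately simplified:

```cpp
#include <cstddef>
#include <vector>

// Array-of-structures (AoS): the fields of one particle are contiguous,
// so a loop over x strides through memory and vectorises poorly.
struct ParticleAoS {
    double x, y, px, py, weight;
};

// Structure-of-arrays (SoA): each field is its own contiguous array,
// which maps directly onto AVX/AVX512 vector loads and stores.
struct ParticlesSoA {
    std::vector<double> x, y, px, py, weight;
};

// The same simplified drift step in both layouts, assuming momenta are
// already normalised so that dx = px * dt (illustrative only).
void push_aos(std::vector<ParticleAoS>& p, double dt) {
    for (auto& q : p) {  // strided access: neighbouring x values sit 5 doubles apart
        q.x += q.px * dt;
        q.y += q.py * dt;
    }
}

void push_soa(ParticlesSoA& p, double dt) {
    const std::size_t n = p.x.size();
    for (std::size_t i = 0; i < n; ++i) {  // unit-stride access: vectorises well
        p.x[i] += p.px[i] * dt;
        p.y[i] += p.py[i] * dt;
    }
}

int main() {
    ParticlesSoA soa;
    soa.x.assign(1000, 0.0);
    soa.y.assign(1000, 0.0);
    soa.px.assign(1000, 0.1);
    soa.py.assign(1000, 0.1);
    soa.weight.assign(1000, 1.0);
    push_soa(soa, 1.0e-3);

    std::vector<ParticleAoS> aos(1000, ParticleAoS{0.0, 0.0, 0.1, 0.1, 1.0});
    push_aos(aos, 1.0e-3);
}
```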