WangYun1995 opened this issue 1 year ago
Thank you for your nice reply; I will try what you said.
Hi Yu,
The method you suggest does work. However, the parallelization is not very efficient. For example, I used a mesh with Nmesh = 768**3 as a test: with one core it takes 1407 seconds, with 64 cores 722 seconds, and with 96 cores 611 seconds.
Obviously, turning on parallelization only halves the runtime; it does not reduce it to 1407/64 or 1407/96 seconds.
Why is this the case?
Do you know which part of your program is taking the most time?
Usually, if your application is bound by IO, then the speed will depend on how much bandwidth the file system can provide. Adding some timing prints can help us find out.
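A minimal way to add the timing prints suggested above, assuming a per-scale loop like the one in the original snippet (the scale list and `filter_and_bin` helper here are placeholders, not the real code):

```python
import time

# Hypothetical stand-ins for the real per-scale work; only the
# timing pattern matters here.
scales = [0.1, 0.2, 0.4]

def filter_and_bin(scale):
    time.sleep(0.01)  # placeholder for the FFT, filtering, and binning

t0 = time.perf_counter()
# ... load dens_m from disk here ...
t_load = time.perf_counter() - t0
print(f"loading took {t_load:.2f} s")

for scale in scales:
    t0 = time.perf_counter()
    filter_and_bin(scale)
    print(f"scale {scale}: {time.perf_counter() - t0:.2f} s")
```

Comparing the load time against the per-scale times shows immediately whether the run is IO-bound or compute-bound.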
Less critical, but also make sure the CPU cores are not oversubscribed, i.e. that each task does not spawn more MKL or OpenMP threads than it has cores available.
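The reported timings themselves point at a large serial fraction. A rough Amdahl's-law estimate from the numbers in this thread, together with the thread-pinning check (the environment variables must be set before the numeric libraries start their thread pools, so before importing numpy or nbodykit):

```python
import os

# Avoid oversubscription: one MKL/OpenMP thread per MPI task.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

# Amdahl's law: speedup S(N) = 1 / (f + (1 - f) / N), where f is the
# serial (e.g. IO-bound) fraction.  Invert it for the reported timings.
def serial_fraction(t1, tN, N):
    S = t1 / tN
    return (1.0 / S - 1.0 / N) / (1.0 - 1.0 / N)

f64 = serial_fraction(1407.0, 722.0, 64)   # ~0.51
f96 = serial_fraction(1407.0, 611.0, 96)   # ~0.43
print(f"implied serial fraction: {f64:.2f} (64 cores), {f96:.2f} (96 cores)")
```

Both estimates land near one half, consistent with roughly half of the runtime (plausibly the repeated IO) not parallelizing at all.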
The operations inside the for loop consume the most time.
Perhaps it is because the computation and disk IO that produce dens_m are repeated on each iteration?
I recall there is a way to create a Mesh directly from dens_m (FieldMesh or ArrayMesh? I cannot recall the name right away). Perhaps replace the real density Mesh object used in the loop with that?
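The idea above, sketched with plain numpy (in nbodykit the in-memory wrappers are `ArrayMesh`/`FieldMesh`; the toy grid size and filter bands here are made up): load dens_m and transform it once, then reuse the cached spectrum inside the loop instead of recomputing it per scale.

```python
import numpy as np

rng = np.random.default_rng(0)
nmesh = 32                                # toy grid (the real one is 768^3)
dens_m = rng.standard_normal((nmesh,) * 3)

# Do the expensive part ONCE, outside the loop ...
dens_k = np.fft.rfftn(dens_m)
k = np.sqrt(sum(ki**2 for ki in np.meshgrid(
    np.fft.fftfreq(nmesh), np.fft.fftfreq(nmesh),
    np.fft.rfftfreq(nmesh), indexing="ij")))

# ... and reuse the cached spectrum for every band-pass scale.
results = []
for k_lo, k_hi in [(0.05, 0.15), (0.15, 0.30)]:   # hypothetical bands
    band = (k >= k_lo) & (k < k_hi)
    filtered = np.fft.irfftn(dens_k * band, s=dens_m.shape)
    results.append(filtered.std())                 # placeholder statistic
```

The per-iteration cost drops to one mask, one multiply, and one inverse FFT; the disk never gets touched inside the loop.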
This really improves the efficiency. I will acknowledge you in my new paper.
Glad to hear it worked. Thank you!
Hi,
First, I have a 3D density mesh (Nmesh=1536^3) obtained using nbodykit, stored as a bigfile on my disk. Then I want to convolve the mesh with a band-pass filter at multiple scales, which is equivalent to a multiplication in the Fourier domain. Finally, at each scale, I compute some binned statistic env_WPS of the convolved mesh; my code snippet is shown below. Since the above code uses only one core, it is inefficient for a large Nmesh. So how can I parallelize the code in the context of nbodykit?
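For context on the pattern discussed in this thread: nbodykit scripts parallelize by launching the same script under MPI (e.g. `mpirun -n 64 python script.py`), and the mesh objects distribute the grid across ranks automatically. Below is a numpy-only sketch of the Fourier-space band-pass itself, written in the `(k, v)` callback form that a nbodykit mesh's `apply(..., mode='complex', kind='wavenumber')` uses; the cutoff values and the toy slab shapes are hypothetical.

```python
import numpy as np

K_LO, K_HI = 0.1, 0.3    # hypothetical band edges

def bandpass(k, v):
    """Zero all Fourier modes with |k| outside [K_LO, K_HI).

    k is a list of broadcastable wavenumber arrays (one per axis),
    v is the complex field on this rank's slab -- the same signature
    nbodykit passes to mesh.apply callbacks.
    """
    kmag = np.sqrt(sum(ki**2 for ki in k))
    return v * ((kmag >= K_LO) & (kmag < K_HI))

# Stand-alone check on a toy slab (shapes mimic one rank's portion):
k = [np.linspace(0, 0.5, 8).reshape(8, 1, 1),
     np.linspace(0, 0.5, 8).reshape(1, 8, 1),
     np.linspace(0, 0.5, 5).reshape(1, 1, 5)]
v = np.ones((8, 8, 5), dtype=complex)
out = bandpass(k, v)
```

Under MPI, the surrounding script would read the density with `BigFileMesh`, call `apply` with a callback like this, paint the filtered field back to real space, and compute the binned statistic on each rank's slab, so that both the FFTs and the IO are split across ranks.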