How to achieve the simulation with 10^10 particles?

zmm0704 commented 3 years ago

Dear authors: GGEMS, your GPU Geant4-based Monte Carlo simulation platform, is interesting work. We have recently been running physical simulations with GATE, but the computation times are impractically long, so we hope to get a fast and effective simulation process with GGEMS. First, we tried the ct_scanner example with MAXIMUM_PARTICLES set to 10^8 on our server (Tesla A100 40G); the peak GPU memory use was 7.3 GB. In theory, the ct_scanner example should then also run with MAXIMUM_PARTICLES at 5*10^8, but it failed. After studying the source code, we found that CL_DEVICE_MAX_MEM_ALLOC_SIZE reported by OpenCL is only 1/4 of the total GPU memory, which appears to be 10 GB in our case. In addition, with MAXIMUM_PARTICLES set to 10^8 the unattenuated pixel value is only about 200, which does not meet our requirements; a value of 10^10 would be ideal. How can we achieve this with GGEMS? Thanks a lot for your help.

PS: The server has 6 * Tesla A100 40G GPUs and OpenCL 1.2 installed.
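
For reference, the 1/4 ratio between CL_DEVICE_MAX_MEM_ALLOC_SIZE and total GPU memory can be checked per device. The following is a minimal sketch, assuming the third-party pyopencl package is installed (it is not part of GGEMS); the helper names alloc_ratio and report_opencl_limits are hypothetical and used only for illustration.

```python
def alloc_ratio(max_alloc_bytes, global_mem_bytes):
    """Fraction of global memory usable in one OpenCL buffer allocation."""
    return max_alloc_bytes / global_mem_bytes


def report_opencl_limits():
    """Print the memory limits of every visible OpenCL device.

    Assumption: pyopencl is installed and at least one OpenCL platform exists.
    """
    import pyopencl as cl  # imported lazily so alloc_ratio works without it

    for platform in cl.get_platforms():
        for device in platform.get_devices():
            max_alloc = device.max_mem_alloc_size  # CL_DEVICE_MAX_MEM_ALLOC_SIZE
            global_mem = device.global_mem_size    # CL_DEVICE_GLOBAL_MEM_SIZE
            print(f"{device.name}: global {global_mem / 2**30:.1f} GiB, "
                  f"max single allocation {max_alloc / 2**30:.1f} GiB "
                  f"({alloc_ratio(max_alloc, global_mem):.0%})")


# Worked example with the numbers from this thread: 10 GB out of 40 GB.
print(alloc_ratio(10 * 2**30, 40 * 2**30))  # prints 0.25
```

Calling report_opencl_limits() on the server would list each GPU. Per the OpenCL specification, the size of a single buffer is bounded by the reported maximum allocation size, however much global memory is free.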

didierbenoit commented 3 years ago

Hi, MAXIMUM_PARTICLES is the maximum number of particles simulated at the same time by an OpenCL device, not the total number of simulated particles. By default we set MAXIMUM_PARTICLES to 2^20 to support old graphics cards. If you want to simulate 10^10 particles or more in the ct_scanner example, use the '--nparticles' option (for Python) or the '--n-particles' option (for C++).
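
To illustrate the distinction, the total particle count can be thought of as being processed in batches of at most MAXIMUM_PARTICLES. The sketch below only illustrates that idea under this assumption; it is not GGEMS's actual scheduling code, and the function name batch_sizes is hypothetical.

```python
MAXIMUM_PARTICLES = 2**20  # default per-device capacity quoted above (1048576)


def batch_sizes(total_particles, max_per_batch=MAXIMUM_PARTICLES):
    """Yield the size of each batch needed to cover total_particles."""
    remaining = total_particles
    while remaining > 0:
        current = min(remaining, max_per_batch)
        yield current
        remaining -= current


total = 10**10
n_batches = -(-total // MAXIMUM_PARTICLES)  # ceiling division
last_batch = total - (n_batches - 1) * MAXIMUM_PARTICLES
print(n_batches, last_batch)  # prints 9537 779264
```

Under this picture, raising the total particle count increases the number of batches rather than the memory needed per batch.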

zmm0704 commented 3 years ago

Dear authors: Thank you for your patient reply; it works. We have two further small questions from running GGEMS on another server. First, does the generated projection include the scattered signal? The output consists of one projection.raw and one projection-scatter.raw. Second, we compiled GGEMS on GeForce RTX 3090 GPUs (24268 MB per GPU) with MAXIMUM_PARTICLES at the original 1048576, and running ct_scanner.py used about 300 MB of GPU memory per GPU across 4 GPUs. In theory, MAXIMUM_PARTICLES could then be 1048576 * 20. However, ct_scanner.py could not run successfully when MAXIMUM_PARTICLES was set to 1048576 * 5. The command was “./ct_scanner --device 0 --n-particles 10000000000”, and GPU 0 was not in use by anyone else. The log file is attached. Do you have any suggestions about these problems? log.txt

didierbenoit commented 3 years ago

Hi, Yes, the generated projection includes the scattered signal. The projection-scatter.raw file is an extra output in case you want the scatter information. We recommend not changing the MAXIMUM_PARTICLES parameter. It depends on the private memory of your graphics card, not on its global memory. So don't change this value, or perhaps use 1048576 * 4 (I tested this value on a 1050 Ti). So it's not a bug; you just don't have enough private memory. And there is no difference in performance whether you use 1048576 * 4, 1048576 * 20 or 1048576 * 5. Best

zmm0704 commented 3 years ago

Dear authors: I really appreciate your kindness and help, and GGEMS is an impressive piece of software. Best wishes!

GGEMS / ggems

How to achieve the simulation with 10^10 particles? #1