Minidoracat opened this issue 3 months ago.
GPU acceleration is worth considering for most rendering workloads, but it is not necessary in this project. Here, the map generation process is referred to as 'render'. Since the map is essentially a 2D image and generating it consists of copying and pasting textures onto specific positions, no complex operations such as 3D projection or vertex computation are involved. Therefore, the bottleneck is not in the rendering itself. I have tried some experimental OpenCL code for blending textures on the GPU, but it didn't significantly reduce the total time, because copying the textures and terabytes of resulting images in and out of GPU memory adds time of its own.
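To make this concrete, here is a minimal pure-Python sketch of what such a 'render' amounts to: pasting RGBA textures at integer offsets and blending each pixel with the standard "over" operator. The helper names (over, new_canvas, paste_texture) are hypothetical and this is not the project's code, which works with Pillow images (the profile below shows PIL's alpha_composite); Pillow's integer arithmetic may also round slightly differently. The point is that each output pixel costs only a handful of multiplies and adds, with no projection or vertex math.

```python
def over(dst, src):
    """Composite straight-alpha RGBA pixel `src` over `dst` (values 0..255)."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    sa_f, da_f = sa / 255.0, da / 255.0
    # Resulting coverage: source alpha plus whatever of dst shows through.
    out_a = sa_f + da_f * (1.0 - sa_f)
    if out_a == 0.0:
        return (0, 0, 0, 0)

    def blend(s, d):
        # Alpha-weighted colour average, divided by out_a to stay straight-alpha.
        return round((s * sa_f + d * da_f * (1.0 - sa_f)) / out_a)

    return (blend(sr, dr), blend(sg, dg), blend(sb, db), round(out_a * 255))


def new_canvas(width, height):
    """Fully transparent canvas as a list of rows of RGBA tuples."""
    return [[(0, 0, 0, 0)] * width for _ in range(height)]


def paste_texture(canvas, texture, x0, y0):
    """Blend `texture` onto `canvas` with its top-left corner at (x0, y0).

    Pixels falling outside the canvas are clipped. Only integer offsets
    and per-pixel blending are involved.
    """
    for dy, row in enumerate(texture):
        y = y0 + dy
        if not 0 <= y < len(canvas):
            continue
        for dx, px in enumerate(row):
            x = x0 + dx
            if 0 <= x < len(canvas[y]):
                canvas[y][x] = over(canvas[y][x], px)


if __name__ == "__main__":
    canvas = new_canvas(4, 4)
    floor = [[(100, 100, 100, 255)] * 2 for _ in range(2)]
    paste_texture(canvas, floor, 1, 1)                 # opaque 2x2 floor tile
    paste_texture(canvas, [[(255, 0, 0, 128)]], 1, 1)  # half-transparent red on top
    print(canvas[1][1])  # (178, 50, 50, 255)
```

Drawing a tile is then just a loop of paste_texture calls in a fixed order, which is why the work stays cheap compared with what happens to the finished image afterwards.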
When you set profile: true in the configuration file and use only one thread, you get profiling information showing the time cost of each function call. The largest share of the time goes to resize and encode (about 84 of 181 seconds combined), which cover creating the smaller, downscaled images (thumbnails) and compressing images into JPG or PNG format. The next significant cost is io, which is writing images to disk. The last notable task is alpha_composite, which covers the actual rendering operations and takes only 15 seconds out of 181.
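The profile output below has the format of Python's built-in cProfile/pstats modules sorted by internal time, so a comparable report can presumably be produced for any single function with a few lines of standard-library code. This is a sketch, not the project's implementation of profile: true, and the helper name profile_call is hypothetical. Note that cProfile only records calls made in the thread where it is enabled, which is likely why the option is meant to be used with a single thread.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is the pstats text output
    sorted by internal time ("tottime"), limited to `top` rows.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("tottime").print_stats(top)
    return result, buf.getvalue()


if __name__ == "__main__":
    def work(n):
        return sum(i * i for i in range(n))

    _, report = profile_call(work, 100_000, top=5)
    print(report)
```

Sorting by tottime (time spent inside a function itself, excluding callees) is what surfaces leaf C methods such as resize and encode at the top of the list.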
Top of a profiling result:
46309536 function calls (46306953 primitive calls) in 181.467 seconds
Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      636   45.755    0.072   45.755    0.072 {method 'resize' of 'ImagingCore' objects}
     1921   37.921    0.020   37.921    0.020 {method 'encode' of 'ImagingEncoder' objects}
     7742   20.328    0.003   20.329    0.003 {built-in method _io.open}
   465567   15.211    0.000   15.211    0.000 {built-in method PIL._imaging.alpha_composite}
     8360   12.211    0.001   12.211    0.001 {method 'decode' of 'ImagingDecoder' objects}
     3525    9.374    0.003    9.374    0.003 {method 'convert' of 'ImagingCore' objects}
  3646296    4.980    0.000   47.681    0.000 base.py:28(square)
    11016    3.795    0.000   52.113    0.005 pzdzi.py:366(render_tile)
     1127    3.702    0.003    3.702    0.003 {method 'encode_to_file' of 'ImagingEncoder' objects}
   465649    2.396    0.000    2.396    0.000 {method 'crop' of 'ImagingCore' objects}
     6895    2.378    0.000    2.378    0.000 {built-in method PIL._imaging.fill}
   470224    2.040    0.000    2.040    0.000 {method 'paste' of 'ImagingCore' objects}
     6259    1.511    0.000    1.511    0.000 {method 'getbbox' of 'ImagingCore' objects}
   470224    1.458    0.000   12.492    0.000 Image.py:1661(paste)
 14586310    1.383    0.000    1.383    0.000 {built-in method builtins.divmod}
   946700    1.225    0.000    1.687    0.000 Image.py:514(_new)
   465649    1.052    0.000    4.064    0.000 Image.py:1222(_crop)
  2357200    0.965    0.000    1.602    0.000 Image.py:820(load)
...
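The relative costs are easier to compare as percentages of the 181.467-second run. The following snippet only does that arithmetic on tottime values copied from the profile; the grouping into named entries is for readability and nothing here is measured anew.

```python
# tottime (seconds spent inside each call itself) copied from the profile above.
TOTAL_SECONDS = 181.467
TOTTIME = {
    "resize": 45.755,
    "encode": 37.921,
    "_io.open": 20.328,
    "alpha_composite": 15.211,
    "decode": 12.211,
    "convert": 9.374,
    "encode_to_file": 3.702,
}


def shares(tottime, total):
    """Percentage of total runtime per entry, rounded to one decimal place."""
    return {name: round(100.0 * t / total, 1) for name, t in tottime.items()}


if __name__ == "__main__":
    for name, pct in shares(TOTTIME, TOTAL_SECONDS).items():
        print(f"{name:>16}: {pct:5.1f}%")
    both = TOTTIME["resize"] + TOTTIME["encode"]
    print(f"resize + encode: {both:.1f} s ({100 * both / TOTAL_SECONDS:.1f}%)")
```

By this count resize is about 25% and encode about 21% of the run (roughly 46% together), while alpha_composite, the actual rendering, is about 8%.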
Considering this, you might wonder whether the resize or encode process can be accelerated using the GPU.
When it comes to resize, some researchers are working on generating video thumbnails faster. For example, this one. These solutions typically use the GPU for video decoding. In our case, however, the initial image is raw data that comes directly from rendering and doesn't need decoding. If we used the GPU, we would have to copy the data in, perform the resize, and then copy the result out. Since resizing is a simple resampling operation, the time spent copying would likely exceed the time saved by resizing on the GPU.
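The copy-versus-compute argument can be written as a back-of-the-envelope cost model. This is a sketch under simplifying assumptions: transfers and compute run back to back with no overlap, each stage's time is bytes divided by a throughput, and fixed overheads are a single constant. All function and parameter names are hypothetical, and no real hardware numbers are implied.

```python
def cpu_seconds(nbytes_in, cpu_bps):
    """Time to resize on the CPU: bytes processed / CPU throughput."""
    return nbytes_in / cpu_bps


def gpu_seconds(nbytes_in, nbytes_out, link_bps, gpu_bps, overhead_s=0.0):
    """Time to copy input in, resize on the GPU, and copy the result out.

    Assumes the three stages run sequentially (no pipelining).
    """
    copy_in = nbytes_in / link_bps
    kernel = nbytes_in / gpu_bps
    copy_out = nbytes_out / link_bps
    return overhead_s + copy_in + kernel + copy_out


def offload_pays_off(nbytes_in, nbytes_out, cpu_bps, link_bps, gpu_bps, overhead_s=0.0):
    """True if the modelled GPU path is faster than the CPU path."""
    gpu = gpu_seconds(nbytes_in, nbytes_out, link_bps, gpu_bps, overhead_s)
    return gpu < cpu_seconds(nbytes_in, cpu_bps)


def cpu_bps_break_even(nbytes_in, nbytes_out, link_bps):
    """CPU throughput above which even an infinitely fast GPU kernel loses,
    because the two copies alone take longer than the CPU resize."""
    return link_bps * nbytes_in / (nbytes_in + nbytes_out)
```

For a 2x downscale the output is a quarter of the input, so cpu_bps_break_even(4, 1, link) is 0.8 * link: under this model, if the CPU resamples faster than about 80% of the transfer bandwidth, no GPU kernel can make the offload win. Overlapping copies with compute would soften this bound, which is why the model is only a rough guide.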
For encode, the process typically involves standard data compression such as Deflate, the LZ77-plus-Huffman scheme used by PNG. Compression algorithms rely heavily on context, because how each part of the data is encoded depends on the data that came before it, which makes them hard to parallelize even on the CPU. Therefore, don't expect the GPU to help with this.
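A small standard-library experiment illustrates the context problem. Python's zlib module implements Deflate; compressing a buffer as one stream lets later bytes refer back to earlier ones, while splitting it into independently compressed chunks (the obvious way to hand work to parallel workers) throws that shared history away. The data is synthetic and chosen to make the effect obvious; the size of the effect on real map images is not measured here.

```python
import random
import zlib


def whole_size(data, level=6):
    """Size after compressing the whole buffer as one Deflate stream."""
    return len(zlib.compress(data, level))


def chunked_size(data, chunk_size, level=6):
    """Total size when each chunk is compressed independently, as a naive
    split across parallel workers would do."""
    return sum(
        len(zlib.compress(data[i:i + chunk_size], level))
        for i in range(0, len(data), chunk_size)
    )


# One 1000-byte pseudo-random block (incompressible by itself) repeated 8 times,
# so all of the redundancy lies across chunk boundaries.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(1000))
data = block * 8

if __name__ == "__main__":
    print("one stream        :", whole_size(data))
    print("8 separate chunks :", chunked_size(data, 1000))
```

The single stream encodes the seven repeats as cheap back-references, whereas each separately compressed chunk has no history to refer to and stays roughly its original size.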
Hello,
First of all, thank you for your excellent work on this project. It has been very useful.
I would like to suggest adding GPU support to improve the map rendering speed. Utilizing GPU resources could significantly enhance performance, especially for large-scale map rendering tasks.
Thank you for considering this feature request.
Best regards, Minidoracat