cff29546 / pzmap2dzi

A command-line tool to convert Project Zomboid map data into Deep Zoom format
MIT License

[Feature Request] Improve Map Rendering Speed with GPU Support #15

Open Minidoracat opened 3 months ago

Minidoracat commented 3 months ago

Hello,

First of all, thank you for your excellent work on this project. It has been very useful.

I would like to suggest adding GPU support to improve the map rendering speed. Utilizing GPU resources could significantly enhance performance, especially for large-scale map rendering tasks.

Thank you for considering this feature request.

Best regards, Minidoracat

cff29546 commented 2 months ago

GPU acceleration is worth considering in most cases, but it is not necessary for this project. The map generation process is called 'render', but since the map is essentially a 2D image, generating it amounts to copying and pasting textures onto specific positions. There are no complex operations such as 3D projection or vertex computation, so the bottleneck does not lie in the rendering part. I have tried some experimental OpenCL code for texture blending on the GPU, but it did not significantly reduce the total time, because copying the textures and terabytes of resulting images in and out of GPU memory adds time of its own.
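To illustrate how little arithmetic the blending step needs per pixel, here is a minimal sketch of the standard "source over" compositing rule for straight (non-premultiplied) RGBA values. It is an illustration only, not the project's or Pillow's actual code: the function name `over` and the use of floats in the range 0.0 to 1.0 are assumptions made for readability.

```python
def over(src, dst):
    """Composite one RGBA pixel `src` over `dst` (hypothetical helper).

    Both pixels are (r, g, b, a) tuples of floats in [0.0, 1.0] with
    straight (non-premultiplied) alpha.
    """
    sr, sg, sb, sa = src
    dr, dg, db, da = dst

    # Resulting coverage: the source plus whatever of the destination shows through.
    out_a = sa + da * (1.0 - sa)
    if out_a == 0.0:
        # Both pixels are fully transparent, so the colour is irrelevant.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(sc, dc):
        # Weight each colour by its alpha, then undo the premultiplication.
        return (sc * sa + dc * da * (1.0 - sa)) / out_a

    return (blend(sr, dr), blend(sg, dg), blend(sb, db), out_a)


red_opaque = (1.0, 0.0, 0.0, 1.0)
red_half = (1.0, 0.0, 0.0, 0.5)
red_clear = (1.0, 0.0, 0.0, 0.0)
blue_opaque = (0.0, 0.0, 1.0, 1.0)

print(over(red_opaque, blue_opaque))  # source fully hides the destination
print(over(red_half, blue_opaque))    # even mix of red and blue
print(over(red_clear, blue_opaque))   # destination is unchanged
```

Each output pixel needs only a handful of multiplications and one division, which is consistent with the blending step being a small part of the total run time.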

When you set profile: true in the configuration file and use only 1 thread, you get profiling information showing the time cost of each function call. Most of the time is spent on resize and encode, that is, creating thumbnails for the smaller images and compressing images into jpg or png format. The next most expensive task is io, which is writing images to disk. The last notable task is alpha composite, which covers the actual rendering operations and takes only about 15 seconds out of 181 seconds.

Top part of a profiling result:

         46309536 function calls (46306953 primitive calls) in 181.467 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      636   45.755    0.072   45.755    0.072 {method 'resize' of 'ImagingCore' objects}
     1921   37.921    0.020   37.921    0.020 {method 'encode' of 'ImagingEncoder' objects}
     7742   20.328    0.003   20.329    0.003 {built-in method _io.open}
   465567   15.211    0.000   15.211    0.000 {built-in method PIL._imaging.alpha_composite}
     8360   12.211    0.001   12.211    0.001 {method 'decode' of 'ImagingDecoder' objects}
     3525    9.374    0.003    9.374    0.003 {method 'convert' of 'ImagingCore' objects}
  3646296    4.980    0.000   47.681    0.000 base.py:28(square)
    11016    3.795    0.000   52.113    0.005 pzdzi.py:366(render_tile)
     1127    3.702    0.003    3.702    0.003 {method 'encode_to_file' of 'ImagingEncoder' objects}
   465649    2.396    0.000    2.396    0.000 {method 'crop' of 'ImagingCore' objects}
     6895    2.378    0.000    2.378    0.000 {built-in method PIL._imaging.fill}
   470224    2.040    0.000    2.040    0.000 {method 'paste' of 'ImagingCore' objects}
     6259    1.511    0.000    1.511    0.000 {method 'getbbox' of 'ImagingCore' objects}
   470224    1.458    0.000   12.492    0.000 Image.py:1661(paste)
 14586310    1.383    0.000    1.383    0.000 {built-in method builtins.divmod}
   946700    1.225    0.000    1.687    0.000 Image.py:514(_new)
   465649    1.052    0.000    4.064    0.000 Image.py:1222(_crop)
  2357200    0.965    0.000    1.602    0.000 Image.py:820(load)
    ...
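
A table in this format can be produced with Python's standard cProfile and pstats modules. The sketch below shows one way to do it; how pzmap2dzi itself wires up the `profile: true` option is not shown in this thread, and `busy_work` and `profile_top` are hypothetical names standing in for a real render call.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical stand-in for a render step: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_top(func, *args, limit=5):
    """Profile one call to func and return the top `limit` rows as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" is time spent inside a function itself, excluding callees;
    # pstats labels this ordering "internal time" in the report header.
    stats.sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()


report = profile_top(busy_work, 200_000)
print(report)
```

Sorting by internal time rather than cumulative time is what makes leaf operations such as resize and encode rise to the top of the list.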

Considering this, you might wonder whether the resize or encode steps can be accelerated by using the GPU.

For 'resize', some researchers are working on generating video thumbnails faster. For example, this one. These solutions typically use the GPU for video decoding. In our case, however, the initial image is raw data obtained directly from rendering and doesn't require decoding. To use the GPU, we would need to copy the data in, perform the resize, and then copy the data out. Since the resize is a simple resampling, the time spent copying would likely exceed the time saved by resizing on the GPU.
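
The trade-off described above can be written as simple arithmetic. The sketch below is a rough model only: the function name `offload_saving` is hypothetical, and every number in the example is a made-up placeholder, not a measurement from pzmap2dzi or from any particular GPU.

```python
def offload_saving(cpu_seconds, gpu_seconds, bytes_in, bytes_out, bus_bytes_per_second):
    """Estimate seconds saved by moving one operation to the GPU.

    A negative result means the GPU path is slower overall, because the
    copy in and copy out cost more than the faster computation saves.
    """
    transfer_seconds = (bytes_in + bytes_out) / bus_bytes_per_second
    return cpu_seconds - (gpu_seconds + transfer_seconds)


MIB = 2 ** 20
GIB = 2 ** 30

# Placeholder case 1: a cheap CPU operation on a large image.
cheap_op = offload_saving(
    cpu_seconds=0.010,
    gpu_seconds=0.001,
    bytes_in=64 * MIB,
    bytes_out=16 * MIB,
    bus_bytes_per_second=8 * GIB,
)

# Placeholder case 2: the same transfer, but a far more expensive CPU operation.
costly_op = offload_saving(
    cpu_seconds=0.100,
    gpu_seconds=0.001,
    bytes_in=64 * MIB,
    bytes_out=16 * MIB,
    bus_bytes_per_second=8 * GIB,
)

print(cheap_op < 0)   # transfer dominates, offloading loses
print(costly_op > 0)  # computation dominates, offloading wins
```

The model makes the dependency explicit: offloading only helps when the computation saved is larger than the cost of moving the data both ways.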

For 'encode', the process typically involves standard data compression such as Deflate (used by png), which is built on LZ77. These compression algorithms rely heavily on context, since each piece of output depends on the data that came before it, which makes parallelization difficult even on the CPU. Therefore, don't expect the GPU to help here.
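
The context dependence can be seen with Python's standard zlib module, which implements Deflate. The sketch below compresses the same data once as a single stream and once as independent chunks, as a parallel encoder would have to. The helper names and the synthetic test data are assumptions chosen to make the effect obvious; real image data will show a smaller difference.

```python
import random
import zlib


def make_repetitive_data(block_size=8192, repeats=16, seed=0):
    """Build data whose only redundancy is one random block repeated many times."""
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    return block * repeats


def compressed_size_whole(data):
    """Compress everything as one stream, so later data can refer to earlier data."""
    return len(zlib.compress(data, 6))


def compressed_size_chunked(data, chunk_size):
    """Compress each chunk independently, as separate workers would."""
    return sum(
        len(zlib.compress(data[start:start + chunk_size], 6))
        for start in range(0, len(data), chunk_size)
    )


data = make_repetitive_data()
whole = compressed_size_whole(data)
chunked = compressed_size_chunked(data, 8192)

print("single stream:", whole, "bytes")
print("independent chunks:", chunked, "bytes")
```

Each independent chunk loses the history that the single stream uses to find repeats, so splitting the work costs compression ratio. That is the price of parallelizing this kind of encoder.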