code-google-com / srcdemo2

Automatically exported from code.google.com/p/srcdemo2
BSD 2-Clause "Simplified" License

Poor performance at very high resolutions and low blend rates #8

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Set the Video Output to PNG.
2. Set the game to a high resolution such as 2560x1440.
3. Render a demo with no blurring.

What is the expected output? What do you see instead?
SrcDemo2 runs very slowly, maxing out one core and producing ~1 frame per 
second on my machine. I assume the bottleneck is that only a single thread 
compresses the enormous images to PNG.

What version of the product are you using? On what operating system?
Build 2012-04-07. Windows 7 x64 SP1.

Please provide any additional information below.
I'd like to see a configuration option, Number of PNG/JPG Compression 
Threads, that defaults to the number of cores minus one. Perhaps there should 
also be a separate 'frame writer' thread that handles a queue of frames to be 
written, so that the compression threads aren't waiting on I/O.
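A minimal sketch of the suggested design, assuming frames arrive as java.awt.image.BufferedImage objects and are encoded with javax.imageio.ImageIO. The class FramePipeline, the frame_%05d.png naming and the hand-off queue capacity of 8 are hypothetical, not taken from SrcDemo2. Compression runs on a fixed pool (defaulting to cores minus one), and a single writer thread does all disk I/O.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch (not SrcDemo2's code): N compression threads feed one disk-writer thread.
final class FramePipeline {

    // An encoded frame waiting to be written to disk.
    static final class EncodedFrame {
        final Path path;
        final byte[] data;
        EncodedFrame(Path path, byte[] data) { this.path = path; this.data = data; }
    }

    // Sentinel telling the writer thread that no more frames will arrive.
    private static final EncodedFrame END = new EncodedFrame(null, null);

    // Default proposed in the issue: number of cores minus one, but at least one.
    static int defaultCompressionThreads() {
        return Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
    }

    // CPU-heavy step: PNG-encode into memory, no disk access.
    static byte[] encodePng(BufferedImage image) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            ImageIO.write(image, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void run(List<BufferedImage> frames, Path outDir, int compressionThreads)
            throws Exception {
        // Bounded hand-off queue so encoders cannot run arbitrarily far ahead of the disk.
        BlockingQueue<EncodedFrame> toWrite = new ArrayBlockingQueue<>(8);
        AtomicReference<IOException> writeError = new AtomicReference<>();

        // Single writer thread: the only thread that touches the disk.
        Thread writer = new Thread(() -> {
            try {
                for (EncodedFrame f = toWrite.take(); f != END; f = toWrite.take()) {
                    try {
                        Files.write(f.path, f.data);
                    } catch (IOException e) {
                        writeError.compareAndSet(null, e); // remember the first error, keep draining
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "frame-writer");
        writer.start();

        ExecutorService encoders = Executors.newFixedThreadPool(compressionThreads);
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < frames.size(); i++) {
            final int index = i;
            pending.add(encoders.submit(() -> {
                byte[] png = encodePng(frames.get(index));
                Path path = outDir.resolve(String.format("frame_%05d.png", index));
                toWrite.put(new EncodedFrame(path, png)); // blocks while the queue is full
                return null;
            }));
        }
        encoders.shutdown();
        encoders.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS); // every encode task is done
        toWrite.put(END);  // all successful frames are already queued, so END is last
        writer.join();
        for (Future<?> f : pending) f.get(); // rethrows any encoding failure
        if (writeError.get() != null) throw writeError.get();
    }
}
```

Keeping every write on one thread keeps disk access sequential, which is the point of the separate 'frame writer' suggestion, while the pool size is the knob a Number of PNG/JPG Compression Threads option would control.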

Original issue reported on code.google.com by lx45...@gmail.com on 18 Jul 2013 at 3:29

GoogleCodeExporter commented 9 years ago
I doubt PNG compression really is the bottleneck here. Can you check with 
another format? TGA without RLE compression is pretty much the same as a 
bitmap, so it is really light on CPU (definitely not light on I/O though, so 
the disk may become a bottleneck then).
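To illustrate why uncompressed TGA costs almost no CPU: a type-2 (uncompressed true-colour) TGA file is an 18-byte header followed by raw BGR pixels, so "encoding" is just copying bytes. This sketch follows the standard TGA layout; the class name TgaWriter is hypothetical and this is not SrcDemo2's own TGA code. At 2560x1440 and 24 bits per pixel it produces 2560 × 1440 × 3 + 18 = 11,059,218 bytes per frame, which is why the disk rather than the CPU becomes the limit.

```java
import java.awt.image.BufferedImage;

// Hypothetical sketch: uncompressed 24-bit TGA, i.e. a header plus raw pixels.
final class TgaWriter {
    static byte[] encode(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        byte[] out = new byte[18 + w * h * 3]; // header bytes not set below stay 0
        out[2] = 2;                                     // image type 2: uncompressed true-colour
        out[12] = (byte) w; out[13] = (byte) (w >>> 8); // width, little-endian
        out[14] = (byte) h; out[15] = (byte) (h >>> 8); // height, little-endian
        out[16] = 24;                                   // bits per pixel
        out[17] = 0x20;                                 // descriptor bit 5: rows stored top-to-bottom
        int p = 18;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);             // packed 0xAARRGGBB
                out[p++] = (byte) rgb;                  // blue
                out[p++] = (byte) (rgb >>> 8);          // green
                out[p++] = (byte) (rgb >>> 16);         // red
            }
        }
        return out;
    }
}
```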

It would also be interesting to see how this scales with resolution. Is 
performance fine until some particular resolution where it suddenly becomes 
really slow, or does it scale linearly with the number of pixels?

The PNG encoding is done by Java's own image library. I don't think it can be 
sped up much, but it can indeed be made multithreaded.
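One way to check how PNG encoding alone scales with resolution, independent of the game, is to time ImageIO's PNG encoder on in-memory images. This assumes SrcDemo2's encoding behaves like a plain ImageIO.write call; the class PngEncodeBench and the random-noise test image are hypothetical stand-ins, and noise compresses differently from real game frames, so absolute timings will not match a demo render.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;

// Hypothetical micro-benchmark: average seconds to PNG-encode one w x h frame in memory.
final class PngEncodeBench {
    static double secondsPerFrame(int w, int h, int runs) throws IOException {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Random rnd = new Random(42);
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                img.setRGB(x, y, rnd.nextInt(0x1000000)); // synthetic noise, not a real game frame
        ImageIO.write(img, "png", new ByteArrayOutputStream()); // warm-up run, not timed
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            ImageIO.write(img, "png", new ByteArrayOutputStream()); // encode only, no disk I/O
        }
        return (System.nanoTime() - start) / 1e9 / runs;
    }
}
```

Comparing, say, secondsPerFrame(1280, 720, 5) with secondsPerFrame(2560, 1440, 5) and dividing each resolution's pixel count by its time shows whether throughput stays roughly constant (linear scaling) or drops off past some size.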

Original comment by etie...@perot.me on 20 Jul 2013 at 12:26

GoogleCodeExporter commented 9 years ago
I did some more testing at 2560x1440; here are the results:

    Format    CPU %   Read MB/s   Write MB/s   FPS
    TGA       11      42.3        42.3         3
    TGA RLE   11      35          35           3.2
    PNG       14      10.6        ~6.2         1.08
    JPG 90%   17      31.7        1.8          2.75
    JPG 10%   17      32          0.25         3

On a side note, I did not expect Google Docs to format a copied spreadsheet 
into plain text that well.

As for resolution, I did some more testing with PNG:

    Resolution   FPS    CPU %   Pixels      Frame time (s)   Pixels/Frame time
    2560x1440    1.08   14      3,686,400   0.93             3,981,312
    2048x1152    1.6    14      2,359,296   0.63             3,774,874
    1920x1080    1.8    14      2,073,600   0.56             3,732,480
    1600x900     2.5    14      1,440,000   0.40             3,600,000
    1366x768     3.9    14      1,049,088   0.26             4,091,443
    1280x720     4      14        921,600   0.25             3,686,400

Frame time is the number of seconds to encode one frame (1/FPS), and Pixels is 
the number of pixels in one frame. Pixels/Frame time (P/Ft) is the pixel 
processing capacity; it appears to be fairly similar across resolutions except 
for the blip at 1366x768. Here's a chart of the P/Ft: http://i.imgur.com/sT5e5Up.png
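The last column can be reproduced from the others: since frame time is 1/FPS, dividing pixels by frame time is simply pixels × FPS. Worked for the 2560x1440 row:

```latex
t_{\text{frame}} = \frac{1}{\text{FPS}}, \qquad
\frac{P}{t_{\text{frame}}} = P \cdot \text{FPS}

P = 2560 \times 1440 = 3{,}686{,}400, \quad
t_{\text{frame}} = \frac{1}{1.08} \approx 0.93\ \text{s}, \quad
\frac{P}{t_{\text{frame}}} = 3{,}686{,}400 \times 1.08 = 3{,}981{,}312\ \text{pixels/s}
```

The same product gives every row (e.g. 921,600 × 4 = 3,686,400 at 1280x720). Because it stays between about 3.6 and 4.1 million pixels per second across all tested resolutions, time per frame grows roughly linearly with pixel count rather than falling off a cliff at some resolution.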

Also, during my testing I noticed something interesting. SrcDemo2 appears to 
have a 5-image buffer that accepts new images from the engine before it has 
finished writing the current one (based on the 'Last frame processed' and 'Last 
frame saved' fields). Take a look at this performance graph from processing a 
short demo at 1440p: http://i.imgur.com/1qnr6jQ.png Notice the huge spike in 
read I/O (blue) right at the beginning as SrcDemo2 fills the buffer. Also notice 
that after the demo ends, SrcDemo2 keeps using 14% CPU and writing frames (pink) 
for another ~7 seconds.

Now, I haven't looked at the code in any detail, but the steady CPU usage and 
write I/O after a demo ends look like a compression bottleneck to me. I tried 
using JVisualVM to profile the CPU usage and confirm my suspicions, but for some 
reason it doesn't work the way I've come to expect, and I'm not sure how to add 
the necessary JMX arguments to the Java command line on Windows.

If you want me to do any profiling, let me know how to specify JVM args on 
Windows and I'll see what I can do.

Original comment by lx45...@gmail.com on 20 Jul 2013 at 6:26

GoogleCodeExporter commented 9 years ago
That's some pretty good investigative work there, and indeed there is a buffer 
of 4 (not 5) images, as seen here: 
https://code.google.com/p/srcdemo2/source/browse/src/net/srcdemo/video/image/ImageSaver.java#12

The fact that you're getting very different FPS depending on the file format 
proves that it is indeed an image compression bottleneck. It was never the 
bottleneck for me, which is why it's a plain single-threaded loop. Usually (for 
me) the buffer is empty all the time, except for frames which the game can 
render instantaneously, such as the beginning/end frames of the demo, where 
sometimes the entire screen is black and there are no objects to render.

Anyway, fixing this should be a matter of spawning multiple ImageSaver threads 
rather than just one, and sharing the queue between all of them by making it 
static. It should only take minutes, but unfortunately I'm in the middle of 
relocating to a new country right now and have neither the time nor the 
necessary machine to test this, so it may be a while before this is fixed.
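A sketch of that fix under stated assumptions: SharedQueueSaver below is a hypothetical stand-in for ImageSaver (the real class's fields and methods are not reproduced here), frames are java.awt.image.BufferedImage objects, and PNG encoding goes through ImageIO. The queue keeps the capacity of 4 mentioned above and is static, so every saver thread takes from the same queue.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch, not SrcDemo2's ImageSaver: several saver threads share one static queue.
final class SharedQueueSaver extends Thread {
    static final class Frame {
        final BufferedImage image;
        final File file;
        Frame(BufferedImage image, File file) { this.image = image; this.file = file; }
    }

    // Static, so every saver thread takes from the same bounded buffer of 4 frames.
    static final BlockingQueue<Frame> QUEUE = new ArrayBlockingQueue<>(4);
    // Sentinel: a saver thread exits when it takes one of these.
    static final Frame STOP = new Frame(null, null);

    SharedQueueSaver(int id) { super("image-saver-" + id); }

    @Override
    public void run() {
        try {
            for (Frame f = QUEUE.take(); f != STOP; f = QUEUE.take()) {
                try {
                    ImageIO.write(f.image, "png", f.file); // encode and write this frame
                } catch (IOException e) {
                    e.printStackTrace(); // a real saver would surface this to the user
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts n savers; the producer then calls QUEUE.put(frame) for each rendered frame.
    static SharedQueueSaver[] startSavers(int n) {
        SharedQueueSaver[] savers = new SharedQueueSaver[n];
        for (int i = 0; i < n; i++) {
            savers[i] = new SharedQueueSaver(i);
            savers[i].start();
        }
        return savers;
    }

    // After the last frame: one STOP per saver, then wait for the queue to drain.
    static void stopSavers(SharedQueueSaver[] savers) throws InterruptedException {
        for (int i = 0; i < savers.length; i++) QUEUE.put(STOP);
        for (SharedQueueSaver s : savers) s.join();
    }
}
```

Since several savers can finish frames out of order, each frame has to carry its own numbered file name, and a 'Last frame saved' counter would presumably need to track the highest contiguous saved frame rather than simply the most recently finished one.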

Original comment by etie...@perot.me on 20 Jul 2013 at 6:52