Logicalshift / flo_draw

2D rendering libraries for Rust and FlowBetween
Apache License 2.0

Performance issues? #3

Closed actuday6418 closed 2 years ago

actuday6418 commented 2 years ago

Hi, first of all, thank you for this great library! I'm trying to render a video stream, but am having trouble getting a usable frame rate. I decided to test the bounce_sprites example for FPS stats and consistently got 60 fps, but the rendered window looks choppy, more like 5 fps. I can send you a screen recording if that would be helpful.

I'm an absolute novice, but it feels like frames are being rendered but not drawn to the window or something like that.

actuday6418 commented 2 years ago

Here's the complete code I ran for testing FPS:

use flo_canvas::*;
use flo_draw::*;

use rand::*;

use std::sync::Mutex;
use std::thread;
use std::time::Duration;

struct Ball {
    sprite_id: SpriteId,
    radius: f64,
    x: f64,
    y: f64,

    dx: f64,
    dy: f64,
}

impl Ball {
    ///
    /// Generates a new ball
    ///
    pub fn random(sprite_id: SpriteId, canvas: &DrawingTarget) -> Ball {
        // Decide on how the ball is rendered
        let col = Color::Hsluv(
            random::<f32>() * 360.0,
            random::<f32>() * 100.0,
            random::<f32>() * 75.0 + 25.0,
            1.0,
        );
        let radius = random::<f64>() * 16.0 + 16.0;

        // Declare the sprite
        canvas.draw(|gc| {
            gc.sprite(sprite_id);
            gc.clear_sprite();

            gc.new_path();
            gc.circle(0.0, 0.0, radius as f32);
            gc.fill_color(col);
            gc.fill();
        });

        Ball {
            sprite_id,
            radius,
            x: random::<f64>() * 1000.0,
            y: random::<f64>() * 1000.0 + 64.0,
            dx: random::<f64>() * 8.0 - 4.0,
            dy: random::<f64>() * 8.0 - 4.0,
        }
    }

    ///
    /// Moves this ball on one frame
    ///
    pub fn update(&mut self) {
        // Collide with the edges of the screen
        if self.x + self.dx + self.radius > 1000.0 && self.dx > 0.0 {
            self.dx = -self.dx;
        }
        if self.y + self.dy + self.radius > 1000.0 && self.dy > 0.0 {
            self.dy = -self.dy;
        }
        if self.x + self.dx - self.radius < 0.0 && self.dx < 0.0 {
            self.dx = -self.dx;
        }
        if self.y + self.dy - self.radius < 0.0 && self.dy < 0.0 {
            self.dy = -self.dy;
        }

        // Gravity
        if self.y >= self.radius {
            self.dy -= 0.2;
        }

        // Move this ball in whatever direction it's going
        self.x += self.dx;
        self.y += self.dy;
    }
}

///
/// Bouncing ball example that uses sprites to improve performance
///
/// bounce.rs renders the paths every frame, so each circle has to be re-tessellated every time. This uses
/// sprites so that the paths are only tessellated once, which reduces the CPU requirements considerably.
///
pub fn main() {
    // 'with_2d_graphics' is used to support operating systems that can't run event loops anywhere other than the main thread
    with_2d_graphics(|| {
        // Create a window with a canvas to draw on
        let canvas = create_drawing_window("Bouncing sprites");

        // Clear the canvas to set a background colour
        canvas.draw(|gc| {
            gc.clear_canvas(Color::Rgba(0.6, 0.7, 0.8, 1.0));
        });

        // Generate some random balls
        let mut balls = (0..2)
            .into_iter()
            .map(|idx| Ball::random(SpriteId(idx), &canvas))
            .collect::<Vec<_>>();

        let fps = std::sync::Arc::new(Mutex::new(0u64));
        let fps2 = fps.clone();
        std::thread::spawn(move || loop {
            thread::sleep(Duration::from_secs(1));
            let fps = fps2.lock().unwrap();
            let a = *fps;
            std::mem::drop(fps);
            println!("fps: {}", a);
        });

        // Animate them
        loop {
            let now = std::time::Instant::now();
            // Update the balls for this frame
            for ball in balls.iter_mut() {
                ball.update();
            }

            // Render the frame on layer 0
            canvas.draw(|gc| {
                gc.layer(LayerId(0));
                gc.clear_layer();
                gc.canvas_height(1000.0);
                gc.center_region(0.0, 0.0, 1000.0, 1000.0);

                for ball in balls.iter() {
                    // Render the ball's sprite at its location
                    gc.sprite_transform(SpriteTransform::Identity);
                    gc.sprite_transform(SpriteTransform::Translate(ball.x as f32, ball.y as f32));
                    gc.draw_sprite(ball.sprite_id);
                }
            });

            // Wait for the next frame
            thread::sleep(Duration::from_nanos(1_000_000_000 / 60));
            *fps.lock().unwrap() = (1000f32 * (1f32 / now.elapsed().as_millis() as f32)) as u64;
        }
    });
}
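
A note on the measurement above: the value written to `fps` is derived from a single iteration's elapsed time, truncated to whole milliseconds, so it jitters between a few discrete values. A minimal alternative sketch, assuming only the standard library, is to count iterations in a shared atomic and let the reporter thread read and reset it once a second. `FrameCounter` is a hypothetical helper, not part of flo_draw, and like the original it only measures how often the main loop submits frames, not what reaches the screen.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: counts loop iterations and can be shared across threads.
#[derive(Clone, Default)]
struct FrameCounter(Arc<AtomicU64>);

impl FrameCounter {
    /// Call once per iteration of the animation loop.
    fn tick(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    /// Returns the number of ticks since the last call and resets the count to zero.
    fn take(&self) -> u64 {
        self.0.swap(0, Ordering::Relaxed)
    }
}

fn main() {
    let counter = FrameCounter::default();
    let reporter = counter.clone();

    // Reporter thread: once a second, print how many iterations completed.
    thread::spawn(move || loop {
        thread::sleep(Duration::from_secs(1));
        println!("loop iterations/s: {}", reporter.take());
    });

    // Stand-in for the animation loop (about two seconds instead of forever).
    for _ in 0..120 {
        counter.tick();
        thread::sleep(Duration::from_nanos(1_000_000_000 / 60));
    }
}
```

In the original program, `counter.tick()` would go where the `fps` mutex is updated at the end of each loop iteration.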

Logicalshift commented 2 years ago

Hm, no, that's definitely not right. I think I'll need to get some more information about your setup before I can figure out what's going on.

flo_draw actually runs the renderer in a separate thread, so if it is running slowly you might not be able to tell from your main thread (it also knows how to skip frames in order to catch up).
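
To illustrate why a main-thread counter can read 60 fps while the window looks far slower, here is a small simulation in virtual time. It is only a sketch of the general idea (a producer submitting frames at a fixed rate and a renderer that always jumps to the newest frame), not flo_draw's actual scheduling; the function `simulate` and its parameters are made up for this example, and `submit_ms` is assumed to be non-zero.

```rust
/// Simulates, in virtual time, a main loop that submits a frame every
/// `submit_ms` and a renderer that needs `render_ms` per frame and always
/// jumps to the newest submitted frame, skipping any it missed.
/// Returns (frames submitted, frames presented) over `total_ms`.
fn simulate(submit_ms: u64, render_ms: u64, total_ms: u64) -> (u64, u64) {
    let submitted = total_ms / submit_ms; // frame k is submitted at k * submit_ms
    let mut presented = 0;
    let mut last_rendered: Option<u64> = None;
    let mut now = 0; // virtual clock of the renderer thread

    while submitted > 0 {
        let newest = (now / submit_ms).min(submitted - 1);
        if last_rendered == Some(newest) {
            // Nothing new to draw: wait for the next submission, if there is one.
            if newest + 1 >= submitted {
                break;
            }
            now = (newest + 1) * submit_ms;
            continue;
        }
        // Render the newest frame; it only counts if it finishes in time.
        now += render_ms;
        if now > total_ms {
            break;
        }
        presented += 1;
        last_rendered = Some(newest);
    }
    (submitted, presented)
}

fn main() {
    // Main loop ticking every 16 ms for one second.
    let (submitted, shown) = simulate(16, 200, 1000);
    println!("slow renderer: {} submitted, {} presented", submitted, shown);

    let (submitted, shown) = simulate(16, 2, 1000);
    println!("fast renderer: {} submitted, {} presented", submitted, shown);
}
```

With a 200 ms render cost the main loop still counts 62 submissions in the simulated second, but only 5 frames are presented; with a 2 ms render cost all 62 are presented.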

Firstly, which version of flo_draw are you using? v0.3 and lower use only OpenGL for rendering, while the newer v0.4, which I'm working on, defaults to WGPU with OpenGL as an option.

Which system are you running on? I tend to develop on OS X but I do have some Linux and Windows systems to test on as well.

Do any of the other demos work OK or are they all slow? I have had a perf issue that only affects sprites in the past.

Have you tried compiling with the --release flag? It does make a big difference to performance, though bounce_sprites is a small enough demo that it shouldn't be necessary unless the number of sprites is increased to a much larger value.

If you are using v0.3, have you tried v0.4? It's not on crates.io so you'll need to compile it from here. This will default to the WGPU renderer instead of the OpenGL one.

If you are on v0.4, have you tried the `render-opengl` feature flag to switch to OpenGL rendering?

I've also just added a profile feature flag to v0.4 which will output some useful profiling information to help make it easier to track these issues down. Here's what it displays for me for bounce_sprites for the two choices of renderer:

= WGPU ==== FRAME 226 @ 4.61s === 48.0 fps === 7.91ms = 13.64ms idle ===

    22722 primitives

   RenderToFrameBuffer  |     4621µs |       2 | ################
   ShowFrameBuffer      |      939µs |       1 | ###
   SelectRenderTarget   |      692µs |       2 | ##
   Clear                |      509µs |       1 | #
   DrawIndexedTriangles |      176µs |     256 | 
   SetTransform         |      160µs |     517 | 
   CreateRenderTarget   |       76µs |       2 | 
   DrawFrameBuffer      |       38µs |       1 | 
   CreateVertex2DBuffer |       30µs |       1 | 
   FreeTexture          |        3µs |       2 | 
   DrawTriangles        |        2µs |       1 | 
   BlendMode            |        2µs |       4 | 
   FreeRenderTarget     |        1µs |       2 | 
   UseShader            |        0µs |       2 | 

    |                             
    |#############################
    |#############################
    |#############################
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    +-----------------------------

= OPENGL ==== FRAME 335 @ 6.79s === 49.0 fps === 1.70ms = 21.11ms idle ===

    22830 primitives

   SetTransform         |      605µs |     517 | ################
   DrawIndexedTriangles |      440µs |     256 | ###########
   DrawTriangles        |       48µs |       1 | #
   Clear                |       44µs |       1 | #
   DrawFrameBuffer      |       25µs |       1 | 
   CreateRenderTarget   |       12µs |       2 | 
   CreateVertex2DBuffer |        8µs |       1 | 
   UseShader            |        5µs |       2 | 
   FreeTexture          |        4µs |       2 | 
   BlendMode            |        1µs |       4 | 
   FreeRenderTarget     |        1µs |       2 | 
   SelectRenderTarget   |        1µs |       2 | 
   RenderToFrameBuffer  |        1µs |       2 | 
   ShowFrameBuffer      |        0µs |       1 | 

    |                             
    |             |               
    |||  ||   | | | | |  |   |   |
    |||| |||||| ||| ||| ||||||||||
    ||||||||||| ||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    +-----------------------------

I'd be quite curious to see what it says for you.

actuday6418 commented 2 years ago

Here's some information:

OS: Arch Linux
GPU: Intel UHD Graphics 620
Host: Acer Aspire A515-51 V1.17
flo_draw version: 0.4.0

Yes, I have been compiling under the --release flag.

Using the OpenGL feature flag fixes the performance issue completely, so thank you for that! Here is the profile output:

= OPENGL ==== FRAME 554 @ 9.20s === 60.2 fps === 2.54ms = 14.87ms idle ===

    22446 primitives

   DrawIndexedTriangles |      899µs |     256 | ################
   SetTransform         |      507µs |     517 | #########
   CreateRenderTarget   |      368µs |       2 | ######
   DrawFrameBuffer      |      137µs |       1 | ##
   CreateVertex2DBuffer |       70µs |       1 | #
   DrawTriangles        |       52µs |       1 |
   Clear                |       25µs |       1 |
   FreeTexture          |       21µs |       2 |
   UseShader            |       12µs |       2 |
   RenderToFrameBuffer  |       10µs |       2 |
   BlendMode            |        8µs |       4 |
   FreeRenderTarget     |        8µs |       2 |
   SelectRenderTarget   |        7µs |       2 |
   ShowFrameBuffer      |        0µs |       1 |

    |
    |#   ####         ####     ###
    ||###|||| ||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||
    ||||||||||||||||||||||||||||||

With regard to WGPU, I tried bounce, bounce_sprites, texture_spin and texture_sprites. One interesting thing I noticed is that the drawing becomes fluid if the window is made sufficiently small. However, intel_gpu_top does not report 100% utilisation when the window is maximised, and the GPU utilisation does not clearly change when the window size is reduced (Render/3D engine utilisation stays in the 20-40 range).

= WGPU ==== FRAME 976 @ 21.99s === 45.4 fps === 71.80ms = 0.41ms idle ===

    6 primitives

   CreateRenderTarget   |    41783µs |       2 | ################
   SelectRenderTarget   |    29185µs |       1 | ###########
   Clear                |      254µs |       1 |
   RenderToFrameBuffer  |      154µs |       1 |
   CreateVertex2DBuffer |       20µs |       1 |
   DrawTriangles        |        2µs |       1 |
   BlendMode            |        0µs |       2 |
   SetTransform         |        0µs |       2 |
   UseShader            |        0µs |       1 |

    |
    |    #                       #
    |    #                       #
    |    #     #   #           # #
    |  # #     #   #           # #
    |  # #     #   #           # #
    |  # ##    ##  #  #  # ##  # #
    |  # ##    ##  #  #  # ##  # #
    |  # ##    ##  #  #  # ##  # #
    |  ####   ###  #  #  ####  # #
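
As a rough way to read the WGPU table above, the rows can be parsed and summed. The sketch below assumes only the row layout visible in the pasted output (name, time in µs, call count, bar); `ProfileRow`, `parse_row` and `total_micros` are hypothetical helpers and not part of flo_draw. For this frame the rows add up to about 71.40ms, close to the 71.80ms shown in the header, and CreateRenderTarget plus SelectRenderTarget account for nearly all of it.

```rust
/// One row of the profile table: operation name, total time in µs, call count.
#[derive(Debug, PartialEq)]
struct ProfileRow {
    name: String,
    micros: u64,
    count: u64,
}

/// Parses a row such as "CreateRenderTarget   |    41783µs |       2 | ####".
/// Returns None for anything that is not a table row (headers, chart lines).
fn parse_row(line: &str) -> Option<ProfileRow> {
    let mut cols = line.split('|').map(str::trim);
    let name = cols.next()?.to_string();
    let micros = cols.next()?.strip_suffix("µs")?.trim().parse::<u64>().ok()?;
    let count = cols.next()?.parse::<u64>().ok()?;
    if name.is_empty() {
        return None;
    }
    Some(ProfileRow { name, micros, count })
}

/// Sums the time column over all parsed rows.
fn total_micros(rows: &[ProfileRow]) -> u64 {
    rows.iter().map(|r| r.micros).sum()
}

fn main() {
    // Rows copied from the WGPU profile in this comment.
    let profile = "\
   CreateRenderTarget   |    41783µs |       2 | ################
   SelectRenderTarget   |    29185µs |       1 | ###########
   Clear                |      254µs |       1 |
   RenderToFrameBuffer  |      154µs |       1 |
   CreateVertex2DBuffer |       20µs |       1 |
   DrawTriangles        |        2µs |       1 |
   BlendMode            |        0µs |       2 |
   SetTransform         |        0µs |       2 |
   UseShader            |        0µs |       1 |";

    let rows: Vec<ProfileRow> = profile.lines().filter_map(parse_row).collect();
    let total = total_micros(&rows);

    // Print each operation's share of the summed time.
    for row in &rows {
        let share = 100.0 * row.micros as f64 / total as f64;
        println!("{:<22} x{:<4} {:>5.1}%", row.name, row.count, share);
    }
    println!("total: {:.2}ms", total as f64 / 1000.0);
}
```
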

actuday6418 commented 2 years ago

The issue with WGPU still exists, but I'm closing this because OpenGL works.