cowboy8625 / rusty-rain

A cross platform matrix rain made with Rust.
https://rusty-rain.xyz
Apache License 2.0
359 stars 22 forks

Reduce CPU load #21

Open mihaigalos opened 2 years ago

mihaigalos commented 2 years ago

Hi, Awesome stuff here!

Did you consider an option for reducing the CPU load? I experimented a bit with removing the rng calls and noticed performance improvements. How about creating a vector of possible values (everything which is currently random) and looping through it?

The user doesn't really care if a parameter is really random or not, they won't notice. They do care about CPU consumption, however. 😅
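
A minimal sketch of the "vector of possible values" idea, assuming a hypothetical `ValueCycle` helper that is not part of rusty-rain. The table is filled once at startup and then cycled through. A tiny linear congruential generator stands in for the real RNG only so the sketch has no dependencies:

```rust
/// A fixed table of pre-generated values that is cycled through instead of
/// calling the RNG every time a drop resets. (Hypothetical helper, not part
/// of rusty-rain.)
pub struct ValueCycle {
    values: Vec<usize>,
    next: usize,
}

impl ValueCycle {
    /// Fill the table once with `len` values in `low..high`.
    /// The LCG below is a stand-in for a real RNG call made at startup.
    pub fn new(seed: u64, len: usize, low: usize, high: usize) -> Self {
        assert!(len > 0 && low < high, "need a non-empty table and range");
        let span = (high - low) as u64;
        let mut state = seed;
        let values = (0..len)
            .map(|_| {
                state = state
                    .wrapping_mul(6364136223846793005)
                    .wrapping_add(1442695040888963407);
                // Use the high bits, which are better mixed in an LCG.
                low + ((state >> 33) % span) as usize
            })
            .collect();
        Self { values, next: 0 }
    }

    /// Return the next pre-generated value, wrapping around at the end.
    pub fn next_value(&mut self) -> usize {
        let v = self.values[self.next];
        self.next = (self.next + 1) % self.values.len();
        v
    }
}
```

With something like this, a drop reset could call `lengths.next_value()` where the code currently calls `rng.gen_range(..)`. Whether that saves a measurable amount of CPU would have to be profiled.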

Here's what I've tried, let me know if you can reproduce:

diff --git a/src/update.rs b/src/update.rs
index 45e68eb..e39b3e7 100644
--- a/src/update.rs
+++ b/src/update.rs
@@ -18,15 +18,14 @@ pub fn reset<F>(create_color: F, rain: &mut Rain, us: &UserSettings)
 where
     F: Fn(style::Color, style::Color, u8) -> Vec<style::Color>,
 {
-    let mut rng = thread_rng();
     let h16 = rain.height;
     let hsize = rain.height as usize;
     let now = Instant::now();
     for i in rain.queue.iter() {
         if rain.locations[*i] > hsize + rain.length[*i] {
-            rain.charaters[*i] = gen::create_drop_chars(h16, &us.group);
+            rain.charaters[*i] = vec!['a', 'b', 'c'];
             rain.locations[*i] = 0;
-            rain.length[*i] = rng.gen_range(4..hsize - 10);
+            rain.length[*i] = 10;
             rain.colors[*i] = create_color(
                 us.rain_color.into(),
                 us.head_color.into(),
@@ -34,7 +33,7 @@ where
             );
             rain.time[*i] = (
                 now,
-                Duration::from_millis(rng.gen_range(us.speed.0..us.speed.1)),
+                Duration::from_millis(10),
             );
         }
     }
cowboy8625 commented 2 years ago

Thanks for the interest in the project!

I have not done any CPU usage tests in a while, but the last time I did, it showed rusty-rain taking up 1-5% on a 200-character-wide terminal.

I seem to have misplaced the document describing what I was testing with, since that was over a year and a few machines ago. How are you testing this?

Regardless, I do see your point, and this would probably be a nice change.

mihaigalos commented 2 years ago

How are you testing this?

Just a plain cargo run --, not sure how relevant that is. I see 6-9% CPU usage on my old i5 (details below). Maybe I'm biased, but I find that a bit too much for something printing chars to the terminal.

CPU stats:

```
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Core(TM) i5-4570T CPU @ 2.90GHz
Stepping: 3
CPU MHz: 1200.000
CPU max MHz: 2900.0000
CPU min MHz: 800.0000
BogoMIPS: 5786.64
Virtualization: VT-x
L1d cache: 64 KiB
L1i cache: 64 KiB
L2 cache: 512 KiB
L3 cache: 4 MiB
NUMA node0 CPU(s): 0-3
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm arat pln pts md_clear flush_l1d
```
mihaigalos commented 2 years ago

I was curious, so I profiled the release binary. I don't see anything very strange, certainly nothing related to rng: [image] [attachment: callgrind_traces.tar.gz]

mihaigalos commented 2 years ago

I see you were already on the right track here, draw is taking 93% of all CPU cycles: [image]

cowboy8625 commented 2 years ago

Yeah, I have been meaning to work on the IO part more.

cowboy8625 commented 2 years ago

I would have to do more research to be sure, but if I remember correctly, when queueing up write calls for the terminal, flush is called prematurely if the queue exceeds a certain size. I believe there is a way to extend the buffer size, but I am not sure this would help even if I knew how to do that.

cowboy8625 commented 2 years ago

I see you were already on the right track here, draw is taking 93% of all CPU cycles:

I used flame to profile last time but ultimately came to the same conclusion.

mihaigalos commented 2 years ago

I would have to do more research to be sure, but if I remember correctly, when queueing up write calls for the terminal, flush is called prematurely if the queue exceeds a certain size.

Yes:

The terminal will flush automatically if the buffer is full.
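
One way to avoid mid-frame flushes, sketched with only the standard library: send all output through a `std::io::BufWriter` whose capacity is larger than a full frame, and flush once per frame. The function name, the cell tuple layout, and the 1 MiB capacity are assumptions for illustration. Whether this helps with crossterm's queueing would need measuring.

```rust
use std::io::{self, BufWriter, Write};

/// Write every cell of a frame through one large buffer and flush once.
/// `cells` holds (column, row, character) triples; the names are hypothetical.
pub fn draw_frame<W: Write>(out: W, cells: &[(u16, u16, char)]) -> io::Result<()> {
    // 1 MiB is an arbitrary size chosen to be larger than a full frame,
    // so the buffer should not fill up and flush in the middle of a frame.
    let mut buf = BufWriter::with_capacity(1 << 20, out);
    for &(col, row, ch) in cells {
        // ANSI "cursor position" sequence; rows and columns are 1-based.
        write!(buf, "\x1b[{};{}H{}", row + 1, col + 1, ch)?;
    }
    // One explicit flush per frame.
    buf.flush()
}
```

In a real program the writer would be something like `io::stdout().lock()`. A `Vec<u8>` works equally well for testing what gets emitted.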

mihaigalos commented 2 years ago

Potentially relevant: this.

mihaigalos commented 2 years ago

Would you consider using a different crate for outputting to the screen? termatrix, which uses termion, is nowhere near as cool as your take, but it uses <2% CPU (a 4-5x improvement!).

cowboy8625 commented 2 years ago

I don't believe termion is cross-platform, but even if it is, I do not want to swap out crates. There are other ways of getting performance out of crossterm.

Rather than printing each character individually, which can be a lot of calls if you're using the shading flag, printing each row or the whole screen at once could improve performance.
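
A rough sketch of the "whole screen at once" approach, using raw ANSI escape codes rather than crossterm so it stays dependency-free. The `Cell` type and `render_frame` function are hypothetical names. The frame is built as a single string, and a color code is emitted only when the color changes, which should matter most with shading enabled:

```rust
use std::fmt::Write as _;

/// One screen cell: a character plus a 256-colour palette index.
/// (Hypothetical type for this sketch.)
#[derive(Clone, Copy)]
pub struct Cell {
    pub ch: char,
    pub color: u8,
}

/// Build the escape sequence for a whole frame as one string.
pub fn render_frame(rows: &[Vec<Cell>]) -> String {
    let mut frame = String::from("\x1b[H"); // move the cursor to the top-left
    let mut current: Option<u8> = None;
    for (i, row) in rows.iter().enumerate() {
        if i > 0 {
            frame.push_str("\r\n");
        }
        for cell in row {
            // Emit a colour change only when the colour differs from the
            // previous cell, which keeps shaded output compact.
            if current != Some(cell.color) {
                // Writing into a String cannot fail.
                write!(frame, "\x1b[38;5;{}m", cell.color).unwrap();
                current = Some(cell.color);
            }
            frame.push(cell.ch);
        }
    }
    frame.push_str("\x1b[0m"); // reset attributes
    frame
}
```

The resulting string can then be written to the terminal with a single write call followed by one flush.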

cowboy8625 commented 2 years ago

I will work on reducing RNG calls as well. It's probably not a massive contributor to the performance issue, but every little bit helps.

mihaigalos commented 2 years ago

Rather than printing each character individually, which can be a lot of calls if you're using the shading flag, printing each row or the whole screen at once could improve performance.

That would be great. I noticed the X server is also very busy redrawing the screen (>50% CPU), so perhaps redrawing everything at once would reduce events to it as well! [image]

cowboy8625 commented 2 years ago

I can't get my CPU up that high. How big is your screen?

[image]

cowboy8625 commented 2 years ago

Making the screen pretty large does spike the CPU up to 17%.
[image]

mihaigalos commented 2 years ago

I can't get my CPU up that high. How big is your screen?

A very generic 1920x1080 with the following terminal settings:

~ » tput cols
213
-------------------
~ » tput lines
57

I guess the behavior is more pronounced in my case because the CPU is older. But I actually noticed it on a Linux VM on Windows, where it is even more pronounced.

cowboy8625 commented 2 years ago

But I actually noticed it on a Linux VM on Windows, where it is even more pronounced.

Yeah, the native Windows and Mac terminals do not handle escape codes well, but using the Alacritty terminal helps a lot with this.

I guess the behavior is more pronounced in my case because the CPU is older.

Yeah, that makes sense, and I'm sure this will help. Some time today I should be done with a basic implementation of it, but not all flags will work. It will give a good indication of whether printing the whole screen helps. (If it doesn't, I will be shocked.)

cowboy8625 commented 2 years ago

Sorry for the delay. Work has taken over this week. I will have something up on Saturday, or sooner if I get the time.

cowboy8625 commented 2 years ago

So I have been working on the new draw improvements. As usual, this has turned out to be a bit more complicated than I originally thought. Formatting the screen for characters that are wider than a single space can throw a wrench in things. LOL. I have ideas on fixing it but just thought I'd post what I have so far.

One odd bug I managed to create: when adding color to the rain, I get a weird solid row of characters flashing at the top. Just uncomment the Some here if you want to take a look.
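
To illustrate the wide-character problem: a character such as full-width katakana occupies two terminal cells, so the number of rain columns that fit on screen depends on the character set. The sketch below uses a hand-rolled and deliberately rough width table. Both function names are hypothetical, and the `unicode-width` crate would be the more complete option in real code.

```rust
/// Rough display width of a character in terminal cells.
/// This is an approximation covering only some East Asian wide ranges;
/// the `unicode-width` crate is the more complete option.
pub fn approx_width(c: char) -> usize {
    match c as u32 {
        0x1100..=0x115F      // Hangul Jamo initial consonants
        | 0x2E80..=0xA4CF    // CJK radicals through Yi (includes kana)
        | 0xAC00..=0xD7A3    // Hangul syllables
        | 0xF900..=0xFAFF    // CJK compatibility ideographs
        | 0xFE30..=0xFE4F    // CJK compatibility forms
        | 0xFF00..=0xFF60    // full-width forms
        | 0xFFE0..=0xFFE6 => 2, // full-width signs
        _ => 1,
    }
}

/// Number of rain columns that fit in `term_cols` cells when every column
/// is as wide as the widest character in `chars`.
pub fn column_count(term_cols: usize, chars: &[char]) -> usize {
    let widest = chars.iter().map(|&c| approx_width(c)).max().unwrap_or(1);
    term_cols / widest
}
```

For example, a 213-cell-wide terminal fits 213 columns of ASCII characters but only 106 columns of full-width katakana.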

Just keep in mind this is highly unfinished work.

mihaigalos commented 2 years ago

Looks great. I cannot reproduce the bug; can you perhaps try in a Docker container? Let me know if you need help. Here's what I see (-s doesn't work yet):

cargo run -- -c jap
[image]

cowboy8625 commented 2 years ago

Yeah pretty much no flags work yet.

mihaigalos commented 2 years ago

Hi @cowboy8625, anything I can do to help here?

cowboy8625 commented 2 years ago

No, sorry man, it's been a crazy couple of months. Just got married, and other life things have been taking all my free time. I'll work on it this weekend for sure.

mihaigalos commented 2 years ago

Wow, congratulations! I wish you all the best. 👍

cowboy8625 commented 2 years ago

Thanks!

cowboy8625 commented 2 years ago

Body and head colors work now. Some small bugs still to address but slowly getting there when I have time.