X3n0m0rph59 / precached

Precached - A Linux process monitor and pre-caching daemon
https://x3n0m0rph59.gitlab.io/precached/
GNU General Public License v3.0
13 stars 0 forks source link

high cpu usage 2 #14

Closed idnovic closed 5 years ago

idnovic commented 5 years ago

I found that precached uses 9-20% cpu atm. The cpu usage was not as high before.

What command can I run to find out what precached is calculating? I stopped the process atm but I want to investigate it.

idnovic commented 5 years ago

Better idea: Above question still stands but also how to I reset precached completely? Maybe something got in the way since I run the dev version since a few days now.

X3n0m0rph59 commented 5 years ago

I found that precached uses 9-20% cpu atm. The cpu usage was not as high before.

Maybe it is optimizing trace logs in the background? Can you reproduce this with log_level set to debug and show some logs?

What command can I run to find out what precached is calculating?

You can use perf top to investigate what functions are the main culprit. Since you are running the release build of precached (without debug symbols) you may find perf trace better suited. It will show you what system calls are executed by precached.

$ sudo perf top -p $(pgrep precached)
$ sudo perf trace -p $(pgrep precached)

how to I reset precached completely?

To completely reset the state of precached you may run these commands:

$ sudo systemctl stop precached
$ sudo iotracectl clear 
$ sudo precachedctl plugins hot-applications optimize
idnovic commented 5 years ago

Bildschirmfoto vom 2019-07-07 16-27-24 log.txt.zip

the screenshot is the first command and the log is just 10 seconds of the second command

X3n0m0rph59 commented 5 years ago

Thank you for investigating this! The screenshot looks interesting indeed! Looks like an issue in a dependency of precached (sys_info crate). Let's see whether I can reproduce this...

idnovic commented 5 years ago

I deleted now the state of precached as you guided above. Still need to do a restart

idnovic commented 5 years ago

So it seems that you need to revert the systemd private tmp settings. After I cleared precached and restarted no preloading was done. I reverted the config and now the preloading works.

I do not understand. Why precached can not access its own tmp dir?

idnovic commented 5 years ago

It seems that it was able to access the old files maybe even write to them but without existing files it did not work.

X3n0m0rph59 commented 5 years ago

So it seems that you need to revert the systemd private tmp settings. After I cleared precached and restarted no preloading was done. I reverted the config and now the preloading works.

OK, I reverted the change!

I do not understand. Why precached can not access its own tmp dir?

I will investigate this, trying to re-enable all systemd protection features that are applicable.

idnovic commented 5 years ago

Bildschirmfoto vom 2019-07-09 13-58-51 log.txt.zip

Happend again. Newest precached version and I set it to use the precached user.

I can tell now that it has nothing to do with /app/*/

idnovic commented 5 years ago

I forgot that precached runs as its own user now. I need to use sudo with precachedtop. I removed the last sentence of the above comment about precachedtop not working.

idnovic commented 5 years ago

Something new happened. Precached uses now 25% CPU directly after power up. Yesterday I stopped the precached service because of the high CPU use. Today precached seems to hick up at the same point.

Does this mean a single trace log is responsible?

Bildschirmfoto vom 2019-07-10 10-59-18 log.txt.zip

idnovic commented 5 years ago

Bildschirmfoto vom 2019-07-10 21-57-26

This looks like repeat and repeat. I currently can not use precached at all. CPU usage is after start up at 25% and not going anywhere.

I will reset it again later. Edit: After I created log files.

reproduce this with log_level set to debug

How can I set the log level to debug?

X3n0m0rph59 commented 5 years ago

Sorry, for the inconvenience, I will prepare a new copr build of precached today!

To set the log level to debug or trace, please edit the file log4rs.yaml

sudoedit /etc/precached/log4rs.yaml

Please find line 30, and change it to:

level: debug or level: trace for even more log output.

X3n0m0rph59 commented 5 years ago

I just pushed precached-1.7.1-0, the Copr build should be finished soon!

Please note that this build ships with a pretty invasive change to the memory accounting. To fully support ZRAM, swap is now included into the calculations (added to the installed RAM size). As a consequence, one needs to re-tune the parameters set in precached.conf. On my development laptop with 4GiB (3.71 GiB usable) of RAM and 4.72 GiB of swap space (ZRAM and swap partition) I am now using these values:

# Memory management settings
available_mem_critical_threshold=45
available_mem_upper_threshold=40
available_mem_lower_threshold=25

Precached now performs the following calculations:

total_memory = ram_size + swap_size
mem_used = (ram_size - ram_used) + swap_used
percentage = mem_used * 100 / total_memory

So if you have 4 GiB of RAM and approx. 4GiB of swap you just need to halve your available_mem_* percentage settings, to account for the changes in precached-1.7.1.

idnovic commented 5 years ago

I want to try the log file thing in the next hour. Where do I find the log file after I set the setting to debug?

Edit, found it.

 # logfile:
  #   kind: file
  #   path: "/var/log/precached.log"
  #   encoder:
  #     pattern: "{d(%Y-%m-%d %H:%M:%S %Z)} [{l}]: {M}:{L} - {m}{n}"
X3n0m0rph59 commented 5 years ago

You will find it in the systemd journal. You may use this command to dump it to a file:

$ sudo journalctl -b -u precached > precached-debug-$(date --iso).log

idnovic commented 5 years ago

I will use journald. Good point. Thank you.

idnovic commented 5 years ago

Only works with precached.service instead of just precached for me but you will have your log in a moment. 26mb.

idnovic commented 5 years ago

precached-debug-log-2019-07-11.log.zip

I will reset precached now and update it to the newest version including necessary config changes.

idnovic commented 5 years ago

I do not know if this is related but I have the following set

vm.swappiness=85
vm.vfs_cache_pressure=75
vm.dirty_background_ratio = 3
vm.dirty_ratio = 7
vm.overcommit_memory = 2
vm.overcommit_ratio=75
vm.dirty_background_bytes=16777216
vm.dirty_bytes=16777216

I wanted to start gnome-twitch (twitch.tv player) during my current test and I was not able to. Error was: FATAL: Could not allocate gigacage memory with maxAlignment = 34359738368, totalSize = 103079215104. Workaround GIGACAGE_ENABLED=no environment variable.

It is possible that precached faced the same error. The responsible sysctl settings are:

vm.overcommit_ratio = 
vm.overcommit_memory = 

I found a bug report for atril, the document viewer of mate desktop. Same error but it happened because atril wanted to mmap 128GB of memory (the system had 16GB).

The above settings for sysctl are a trial for my memory limited laptop. I think you have a better understanding if precached needs memory overcommit and if my above settings are interfering.

X3n0m0rph59 commented 5 years ago

Regarding your virtual memory subsystem configuration:

vm.swappiness=85
vm.vfs_cache_pressure=75
vm.dirty_background_ratio = 3
vm.dirty_ratio = 7
vm.overcommit_memory = 2
vm.overcommit_ratio=75
vm.dirty_background_bytes=16777216
vm.dirty_bytes=16777216

This looks reasonable except for:

vm.overcommit_memory = 2   # <- will lead to application crashes

I don't think that this will work out as intended! You are risking lots of software to fail. Overcommit is basically required by every process that calls fork(). If the process has a reasonably large vmsz (size of virtual memory allocated address space), it will crash without Linux being able to overcommit memory. Additionally, most memory allocators have a large amount of overhead, in that they consume huge amounts of address space (not memory though). The Java VM may serve as an example here.

Precached therefore needs overcommit memory enabled too. The precached process' vm size, directly after re-booting is currently larger than my installed RAM size.

idnovic commented 5 years ago

Okay I will change this to vm.overcommit_memory = 0

An explanation for the bug could be that precached did not unload precached files for me before 1.7.1 and grew bigger and bigger(like java). And at a certain point it was not able to allocate enough memory. But I can not see a related output of this. Precached did not seem to know what wrong is.

Precached allocated up to 1.7GB ram before. Never over 2GB. So it did work with more than 1GB.

Maybe add a memory information profiler for precached. This could help with error reports. Just collect important information at startup and put it inside the log file of precached.

idnovic commented 5 years ago
CommitLimit = (total_RAM - total_huge_TLB) *
                         overcommit_ratio / 100 + total_swap

So this is how linux calculates the over commit in mode 2. I will try to increase this first before I try to change it back to 0. In mode 0 the oom killer gets used. I wanted to avoid that.

In mode 0, calls of mmap(2) with MAP_NORESERVE are not
       checked, and the default check is very weak, leading to the
       risk of getting a process "OOM-killed".

This does not sound very useful. Maybe when my memory is full I should not be able to launch more stuff instead. I am not sure. I do not have programs running I want to get killed.

idnovic commented 5 years ago

vm.overcommit_ratio=75 So far I found out that I need to set this to at least 100 while I have mode 2 set up. Currently, I am virtually shrinking my available memory. Also GIGACAGE_ENABLED seems security related so I am not going to use the work around.

idnovic commented 5 years ago

I changed it back to mode 0 (default). vm.overcommit_ratio does not apply in mode 0. Did you find anything useful in the log files I uploaded?

X3n0m0rph59 commented 5 years ago

I changed it back to mode 0 (default)

Yes, that's the only reasonable choice for desktop systems.

Did you find anything useful in the log files I uploaded?

I did not find anything useful, so it is unclear what exactly happened during the load spikes.

idnovic commented 5 years ago

I believe you can close this issue now. I do not have problems with the version 1.7.1