Not seeing any stacktrace in the log may indicate OOM issues. I've just added os.freemem() to the memoryStatInterval feature. You may want to try the latest master and enable the feature by setting an interval (in ms) for debug.memoryStatInterval in the config file.
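For reference, a minimal sketch of what such an interval-based memory logger can look like in Node.js, assuming a config entry along the lines of debug.memoryStatInterval: 30000. This is an illustration only, not the actual Valetudo implementation; the function names are made up for the example.

```js
const os = require("os");

// Illustration only (not the actual Valetudo code): log process and system
// memory on a fixed interval, similar to what a memoryStatInterval setting
// would enable.
function startMemoryStats(intervalMs) {
    const toMiB = (bytes) => `${(bytes / 1024 / 1024).toFixed(3)} MiB`;

    return setInterval(() => {
        const usage = process.memoryUsage();

        console.log("[INFO] Memory Stats", {
            rss: toMiB(usage.rss),
            heapTotal: toMiB(usage.heapTotal),
            heapUsed: toMiB(usage.heapUsed),
            external: toMiB(usage.external),
            arrayBuffers: toMiB(usage.arrayBuffers),
            freeSystemMemory: toMiB(os.freemem())
        });
    }, intervalMs);
}

// e.g. every 30 seconds (the interval is given in ms)
startMemoryStats(30 * 1000);
```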
Hi, I seem to have a similar issue. I'll contribute my logs here:
[2021-03-12T16:33:03.622Z] [INFO] Cloud message timed out. Assuming that we're not connected anymore
[2021-03-12T16:33:20.896Z] [WARN] Token is okay, however we're unable to reach the vacuum { retries: 10, method: 'get_consumable', args: [] }
[2021-03-12T16:33:21.197Z] [INFO] Cloud connected
[2021-03-12T16:33:43.744Z] [INFO] Cloud message timed out. Assuming that we're not connected anymore
[2021-03-12T16:34:01.368Z] [INFO] Cloud connected
[2021-03-12T16:35:23.450Z] [INFO] Cloud message timed out. Assuming that we're not connected anymore
[2021-03-12T16:35:23.460Z] [ERROR] Failed to poll Attributes MiioTimeoutError: request timed out:{"method":"get_consumable","params":[],"id":7156}
at Timeout.timeout [as _onTimeout] (/snapshot/Valetudo/lib/miio/MiioSocket.js:185:32)
at listOnTimeout (internal/timers.js:549:17)
at processTimers (internal/timers.js:492:7)
[2021-03-12T16:35:41.874Z] [INFO] Cloud connected
[2021-03-12T16:39:24.411Z] [INFO] Cloud message timed out. Assuming that we're not connected anymore
[2021-03-12T16:39:25.747Z] [ERROR] Failed to poll Attributes MiioTimeoutError: request timed out:{"method":"get_consumable","params":[],"id":7384}
at Timeout.timeout [as _onTimeout] (/snapshot/Valetudo/lib/miio/MiioSocket.js:185:32)
at listOnTimeout (internal/timers.js:549:17)
at processTimers (internal/timers.js:492:7)
[2021-03-12T16:39:32.659Z] [WARN] Token is okay, however we're unable to reach the vacuum { retries: 10, method: 'get_status', args: {} }
[2021-03-12T16:39:42.543Z] [INFO] Cloud connected
[2021-03-12T16:42:29.288Z] [INFO] Cloud message timed out. Assuming that we're not connected anymore
[2021-03-12T16:42:33.125Z] [WARN] Token is okay, however we're unable to reach the vacuum { retries: 10, method: 'get_consumable', args: [] }
[2021-03-12T16:42:38.073Z] [INFO] << cloud: ignoring response for non-pending request {"id":7474,"result":["map_upload_handler"]}
[2021-03-12T16:42:38.197Z] [INFO] Cloud connected
[2021-03-12T16:43:01.959Z] [INFO] Cloud message timed out. Assuming that we're not connected anymore
[2021-03-12T16:43:03.779Z] [INFO] << cloud: ignoring response for non-pending request {"id":7479,"result":[{"msg_ver":3,"msg_seq":284,"state":5,"battery":78,"clean_time":960,"clean_area":22650000,"error_code":0,"map_present":1,"in_cleaning":1,"in_returning":0,"in_fresh_state":0,"lab_status":1,"water_box_status":0,"fan_power":102,"dnd_enabled":0,"map_status":3,"lock_status":0}]}
[2021-03-12T16:43:03.996Z] [INFO] Cloud connected
[2021-03-12T16:44:24.904Z] [INFO] Cloud message timed out. Assuming that we're not connected anymore
[2021-03-12T16:44:50.964Z] [INFO] Cloud connected
[2021-03-12T16:47:06.056Z] [INFO] Loading configuration file: /mnt/data/valetudo/valetudo_config.json
[2021-03-12T16:47:06.087Z] [INFO] Set Logfile to /tmp/valetudo.log
[2021-03-12T16:47:06.095Z] [INFO] Autodetected RoborockS5ValetudoRobot
[2021-03-12T16:47:06.266Z] [INFO] Starting Valetudo 2021.02.0
[2021-03-12T16:47:06.267Z] [INFO] Configuration file: /mnt/data/valetudo/valetudo_config.json
[2021-03-12T16:47:06.268Z] [INFO] Logfile: /tmp/valetudo.log
[2021-03-12T16:47:06.272Z] [INFO] Robot: Beijing Roborock Technology Co., Ltd. S5 (RoborockS5ValetudoRobot)
[2021-03-12T16:47:06.273Z] [INFO] JS Runtime Version v14.4.0
[2021-03-12T16:47:06.275Z] [INFO] Max Heap Size: 33.5 MiB
[2021-03-12T16:47:06.276Z] [INFO] Node Flags: --max-old-space-size=32
[2021-03-12T16:47:06.287Z] [INFO] DeviceId xxxxxx
[2021-03-12T16:47:06.288Z] [INFO] IP 127.0.0.1
[2021-03-12T16:47:06.289Z] [INFO] CloudSecret xxxxxxxxxxxx
[2021-03-12T16:47:06.290Z] [INFO] LocalSecret xxxxxxxxxxxxxxx
[2021-03-12T16:47:06.292Z] [INFO] Firmware Version: 3.5.8_002020
[2021-03-12T16:47:06.961Z] [INFO] Dummycloud is spoofing 127.0.0.1:8053 on 127.0.0.1:8053
[2021-03-12T16:47:06.970Z] [INFO] Webserver running on port 80
[2021-03-12T16:47:06.991Z] [INFO] Map Upload Server running on port 8079
[2021-03-12T16:46:58.005Z] [INFO] Successfully set the robot time via NTP to 2021-03-12T16:46:58.890Z
[2021-03-12T16:46:58.368Z] [INFO] Connected successfully to mqtt server
[2021-03-12T16:47:02.497Z] [INFO] Cloud connected
My model is an S5. It happened while I was running a cleanup, if that helps.
> Not seeing any stacktrace in the log may indicate OOM issues. I've just added os.freemem() to the memoryStatInterval feature. You may want to try the latest master and enable the feature by setting an interval (in ms) for debug.memoryStatInterval in the config file.
Unfortunately, it has not happened again since then... Anyway, I have enabled the feature and will report back once I find something.
5787afa adds some additional forced garbage collection to the lowmem build, which might also be helpful here. The lowmem variant is the one built via `npm run build_armv7_lowmem`.
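For context, forced garbage collection in Node.js requires the process to be started with the --expose-gc flag, which makes global.gc available. A minimal sketch of the idea (illustration only, not the actual commit):

```js
// Sketch only: node must run with --expose-gc for global.gc to exist.
function forceGarbageCollection() {
    if (typeof global.gc !== "function") {
        return; // not running with --expose-gc
    }

    global.gc(); // forces a full (major) collection
    console.log(`[DEBUG] Garbage collection forced. rss: ${process.memoryUsage().rss}`);
}
```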
The same thing happens to me. The S5E crashes after a while. I am not running the lowmem version yet, but seeing that free memory went down to 2 MB, it might be a good idea. After a while, the SSH shell crashed as well.
However, I could not find any oom_killer entries in the messages or syslog files. vm.min_free_kbytes is set to 2038, which would align with the ~2 MB of memory left.
The lowmem 2021.03.0 release or the latest master lowmem build? The latter should work much better
> The lowmem 2021.03.0 release or the latest master lowmem build? The latter should work much better

Neat, I now tried the current master build. Works like a charm. Free memory hovers around 50 MB now.
@sfspeiser I'd assume the debug log now shows quite a few "Garbage collection forced" log lines?
@Hypfer yes, every few seconds during a large room cleaning.
Hey, I can also confirm that the lowmem build is much more stable. Thanks for that :+1:
Ok, apparently the issue is not solved for me yet. After a few room cleanings the robot crashed again. It looks like there is some memory leak, as the garbage collection cannot reclaim memory. This is the last memory stats entry before the entire robot froze:
[2021-04-11T10:36:12.955Z] [INFO] Memory Stats {
rss: '147.199 MiB',
heapTotal: '8.523 MiB',
heapUsed: '7.131 MiB',
external: '111.550 MiB',
arrayBuffers: '109.997 MiB',
freeSystemMemory: '2.184 MiB'
}
[2021-04-11T10:36:13.518Z] [DEBUG] Garbage collection forced. rss: 154800128
The memory usage increased with each room cleaning and did not go back down after a cleaning was finished. After three cleanings the robot ran out of memory. The robot is running the lowmem build from commit 5c8495ce2eff247a667e87b2649c98ee13bddf28.
I am now thinking of running Valetudo on my local x86 server... that would at least circumvent the issue of the robot crashing.
It seems like we're pretty much pushing the limits of Node.js here. Documentation is getting quite sparse.
global.gc() seems to accept additional parameters:
https://github.com/v8/v8/commit/aa7c6e22f963ffcd49898521890cde8d78e4fcc5
The commit message contains somewhat more information on that. Apparently, it defaults to a major GC (whatever that means), so overriding it to do a minor one instead doesn't make much sense, I suppose. Interestingly, when googling, one of the few results that exist suggested the exact opposite.
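Roughly, a major GC is a full mark-sweep(-compact) pass over the entire heap, while a minor GC only scavenges the young generation. Judging from the linked commit, gc() appears to take an options object; whether the V8 version bundled with the Node.js runtime on the robot actually supports that form is an assumption here, not something verified:

```js
// Sketch only; requires --expose-gc. The options-object form is inferred from
// the linked V8 commit and may not be supported by every Node/V8 build.
if (typeof global.gc === "function") {
    global.gc();                  // default: major GC over the whole heap
    global.gc({ type: "minor" }); // young-generation scavenge only, if supported
}
```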
(Un)fortunately, I'm not able to reproduce your issue on my robots.
Anyways, the only idea left would be to resort to doing manual memory management 👀 One place where buffers are constantly created and disposed again is this one: https://github.com/Hypfer/Valetudo/blob/5c8495ce2eff247a667e87b2649c98ee13bddf28/lib/robots/MiioValetudoRobot.js#L70-L99
While calling Buffer.concat on all chunks is the first Google result for handling file uploads with Express, in our situation it might make sense to work with a single buffer and write data to the offsets. Express would probably still create more buffers internally, but we could reduce that amount a bit. I'm of course just guessing here; it might not make any difference at all.
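A rough sketch of that idea (illustration only, with made-up names, not the code linked above): allocate one buffer from the Content-Length header up front and copy each incoming chunk into it at the current offset, instead of concatenating everything at the end.

```js
// Sketch only (not the actual Valetudo code).
function collectUpload(req, callback) {
    const expected = parseInt(req.headers["content-length"], 10);

    if (!Number.isInteger(expected) || expected <= 0) {
        callback(new Error("Missing or invalid Content-Length header"));
        return;
    }

    const body = Buffer.allocUnsafe(expected);
    let offset = 0;

    req.on("data", (chunk) => {
        if (offset + chunk.length > expected) {
            req.destroy(new Error("Body larger than Content-Length"));
            return;
        }

        chunk.copy(body, offset); // write into the preallocated buffer
        offset += chunk.length;
    });

    req.on("end", () => callback(null, body.subarray(0, offset)));
    req.on("error", (err) => callback(err));
}
```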
271bac7ae0d6d7a6b3fb3d014847c2fcf56556cb
There we go. Another thing to look into would be mqtt map publishing
@sfspeiser Do you have mqtt map data stuff enabled? Also, was the webinterface open during those three cleanups?
> There we go. Another thing to look into would be mqtt map publishing
> @sfspeiser Do you have mqtt map data stuff enabled? Also, was the webinterface open during those three cleanups?

Thanks for your commit; sadly, it did not change anything.
The web interface was open during the runs (on my PC and my phone as well), but even after closing it, the memory is not reclaimed. I have now disabled MQTT altogether and still have the same issue. I have a feeling it might be an issue with the map. Tomorrow I will reset it and let it map my flat again. I can't imagine why else this error is not reproducible...
98b2757aec0af94392038577eead7749ef659e17 and 0d7d8364b98e950d186d961062b81fbfff5e920a may also improve lowmem performance, especially when dealing with large maps.
However, that is just a guess since I can't reproduce the issue locally, so further testing would be much appreciated.
Closing due to inactivity. Please open a new one if this is still an issue
Describe the bug
Valetudo crashes (exits) randomly and needs to be restarted
To Reproduce
I do not know yet, unfortunately
Vacuum Model
Roborock S5 Max
Valetudo Version
2021.02.0
Expected behavior
Run until killed