mikehaller opened this issue 1 year ago
Is this with dbcfeeder idle or while pushing some data? If the latter, via a logfile or via a vcan?
Just starting the container. I assume it's running the canreplay loop?
dbcfeeder is a bit heavy, but I would not expect that much (I just tested it "natively", without a container, and got about 20% load on a single core, but that was on a laptop).
Still need to check the container.
Are you using a Pi? If not, can you give some hints about what kind of compute power you have?
@erikbosch As you have recently worked on this, do you have any insights?
Maybe we need to profile this again to see where some gains are possible. Given the way the Python CAN library and the Python DBC tooling work, I'd always expect this to be somewhat heavy, but we should make sure we are not messing up somewhere (the queueing has been an issue in the past).
And let's focus on the DBC feeder first; I'd guess HVAC might be ripe for a serious rewrite/update anyway.
Some thoughts. First, I recently did some refactoring for the changed DBC config handling, which could affect CPU usage somewhat, but as far as I know it should not cause any significant change. Secondly, when using a CAN dump file there is a (likely) inefficient phase at the beginning where all messages from the dump are loaded into memory, which occupies quite a lot of CPU for a few seconds. But after that, when running (not in a container) under cProfile with the default dump file, the feeder seemed to spend most of its time waiting. That may of course look different on a different platform.
With a quick analysis I could not find any obvious bottlenecks. If needed, one could spend more time profiling and compare how CPU time is affected by, e.g., changing the configuration for how often data shall be sent to the databroker.
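For reference, a minimal sketch of how a cumulative-time profile like the one below can be captured and read back, assuming dbcfeeder.py is run directly from the dbc2val directory (the output file name t.prof is illustrative):

# Capture: python3 -m cProfile -o t.prof dbcfeeder.py
# Read the stats back, sorted by cumulative time, similar to the listing below:
import pstats

stats = pstats.Stats("t.prof")
stats.sort_stats("cumulative").print_stats(40)  # show the top 40 entries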
erik@debian3:~/kuksa.val.feeders/dbc2val$ more t.txt
8286341 function calls (8204530 primitive calls) in 235.560 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
429/1 0.004 0.000 235.715 235.715 {built-in method builtins.exec}
1 0.000 0.000 235.715 235.715 <string>:1(<module>)
1 0.000 0.000 235.715 235.715 runpy.py:199(run_module)
1 0.000 0.000 235.714 235.714 runpy.py:63(_run_code)
1 0.000 0.000 235.713 235.713 dbcfeeder.py:1(<module>)
1 0.000 0.000 235.146 235.146 dbcfeeder.py:359(main)
1 0.000 0.000 235.142 235.142 dbcfeeder.py:119(start)
1 0.287 0.287 231.658 231.658 dbcfeeder.py:246(_run)
10503 0.121 0.000 202.873 0.019 queue.py:154(get)
5559/5558 0.069 0.000 202.609 0.036 threading.py:288(wait)
32738/32736 202.515 0.006 202.521 0.006 {method 'acquire' of '_thread.lock' objects}
10502 0.228 0.000 26.052 0.002 databroker.py:71(update_datapoint)
10502 0.248 0.000 25.729 0.002 __init__.py:743(set)
10534 0.061 0.000 24.015 0.002 _channel.py:937(__call__)
10534 0.142 0.000 23.889 0.002 _channel.py:919(_blocking)
10534 19.717 0.002 19.989 0.002 {method 'next_event' of 'grpc._cython.cygrpc.SegregatedCall' objects}
1 0.301 0.301 2.695 2.695 canplayer.py:42(__init__)
10534 2.485 0.000 2.589 0.000 {method 'segregated_call' of 'grpc._cython.cygrpc.Channel' objects}
191832 1.770 0.000 2.340 0.000 canutils.py:50(__iter__)
65 0.003 0.000 1.467 0.023 __init__.py:1(<module>)
10502 0.202 0.000 1.231 0.000 __init__.py:508(_prepare_set_request)
2 1.203 0.601 1.203 0.601 {built-in method time.sleep}
10502 0.126 0.000 0.974 0.000 __init__.py:424(to_message)
10502 0.118 0.000 0.935 0.000 dbc2vssmapper.py:116(transform_value)
10502 0.177 0.000 0.804 0.000 __init__.py:384(to_message)
1 0.000 0.000 0.771 0.771 dbcreader.py:33(__init__)
I remember Python queues are a bit finicky and seem to busy-loop under certain conditions. Did you try whether removing the timeout here https://github.com/eclipse/kuksa.val.feeders/blob/2bcfabdc9a507772a59ce4ee9ff4c21582cdeab7/dbc2val/dbcfeeder.py#L276
changes anything? If the queue get is blocking it should be fine; if you have a non-blocking queue, the trick was to add your own tiny sleep (1 ms) in case of a negative (empty) response.
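A rough sketch of the two patterns being discussed, with illustrative names (this is not the actual dbcfeeder code):

import queue
import time

def handle(item):
    print(item)  # placeholder for the real processing

# Blocking get with a timeout: the wait itself does not busy-loop; the timeout
# only controls how often the outer loop gets a chance to run (e.g. to check a
# stop flag).
def consume_blocking(q: queue.Queue):
    while True:
        try:
            item = q.get(timeout=1)
        except queue.Empty:
            continue
        handle(item)

# Non-blocking get: without the tiny sleep this spins at 100% CPU whenever the
# queue is empty.
def consume_non_blocking(q: queue.Queue):
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            time.sleep(0.001)  # ~1 ms back-off instead of busy-looping
            continue
        handle(item)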
Changing the queue did not do anything; the call is blocking, so a timeout of 1 second only means that you return to the outer loop more often.
Anyhow, I did some more profiling with yappi to get more per-thread information. It seems that the tx-worker (which sends around 250 CAN signals per second on virtual CAN when using a candump) is responsible for quite a lot of the load. That part should not be a concern for dbcfeeder when using a "real" CAN source. Some CPU could possibly be saved by refactoring frequent debug prints (avoiding eager f-string formatting) or by smarter time throttling (filtering on the raw CAN id already before DBC conversion; today throttling is performed after DBC conversion based on individual VSS signals). Another possible improvement could be to not use individual gRPC calls for each update, but rather send them in batches (a rough sketch of that idea follows after the yappi output below); exactly how much that could save is, however, somewhat unclear.
Clock type: CPU
Ordered by: totaltime, desc
name ncall tsub ttot tavg
..canplayer.py:42 CANplayer.__init__ 1 0.387402 5.252725 5.252725
..s.py:50 CanutilsLogReader.__iter__ 191.. 2.960808 4.750074 0.000025
..ib/python3.10/threading.py:942 run 4/3 0.000042 4.398443 1.099611
..canplayer.py:75 CANplayer.txWorker 1 0.505677 3.931559 3.931559
..er.py:71 Provider.update_datapoint 2418 0.094510 2.664002 0.001102
..grpc/__init__.py:743 VSSClient.set 2418 0.099178 2.525617 0.001045
..ces/virtual.py:105 VirtualBus.send 10010 0.222000 2.517890 0.000252
..ages/can/bus.py:80 VirtualBus.recv 10014 0.210398 1.991252 0.000199
..ib/python3.10/copy.py:128 deepcopy 875.. 0.704782 1.799281 0.000021
.. _UnaryUnaryMultiCallable.__call__ 2450 0.024782 1.778202 0.000726
../python3.10/queue.py:154 Queue.get 12432 0.339168 1.768578 0.000142
.._UnaryUnaryMultiCallable._blocking 2450 0.061932 1.734419 0.000708
..dbcreader.py:33 DBCReader.__init__ 1 0.000023 1.675066 1.675066
..database/__init__.py:104 load_file 1 0.000021 1.666321 1.666321
..ools/database/__init__.py:261 load 1 0.000007 1.665539 1.665539
..tabase/__init__.py:295 load_string 1 0.000006 1.664382 1.664382
../__init__.py:345 load_can_database 1 0.000019 1.664376 1.664376
..ase.py:210 Database.add_dbc_string 1 0.001186 1.664348 1.664348
..al.py:94 VirtualBus._recv_internal 10014 0.096321 1.615935 0.000161
..an/formats/dbc.py:1960 load_string 1 0.002640 1.378636 1.378636
..essage.py:215 Message.__deepcopy__ 10010 0.150132 1.181531 0.000118
..ase.py:451 Database.decode_message 10010 0.140969 1.044516 0.000104
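As a rough sketch of the batching idea mentioned above, assuming kuksa-client's VSSClient.set_current_values (which takes a dict of VSS path to Datapoint); queue and variable names are illustrative, not the actual dbcfeeder code:

import queue

from kuksa_client.grpc import Datapoint, VSSClient

def flush_batch(client: VSSClient, vss_queue: queue.Queue):
    # Block for the first update (raises queue.Empty if nothing arrives within
    # the timeout), then drain whatever else is already queued.
    path, value = vss_queue.get(timeout=1)
    batch = {path: Datapoint(value)}
    while True:
        try:
            path, value = vss_queue.get_nowait()
        except queue.Empty:
            break
        batch[path] = Datapoint(value)  # a newer value overwrites an older one
    # One gRPC round trip for the whole batch instead of one call per datapoint.
    client.set_current_values(batch)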
@SebastianSchildt - I think we need to decide how to continue with this one. Using a dump with about 250 CAN messages every second will certainly cost some CPU if we are to feed and read 250 messages per second on a virtual CAN bus. There are some improvements we could possibly make, but the question is whether they would give significant savings, especially for the use case where we use a dump service, and how prioritized this is.
It could be a Leda-specific issue; it is not normal to have such CPU usage in the HVAC service. Is it cycling through the TRANSIENT_FAILURE state?
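One way to check for that is to log the channel's connectivity transitions with grpc's channel.subscribe; the broker address below is illustrative:

import grpc

channel = grpc.insecure_channel("127.0.0.1:55555")

def on_state_change(state: grpc.ChannelConnectivity):
    # A repeating CONNECTING/TRANSIENT_FAILURE pattern here would indicate the
    # service is stuck in a reconnect loop rather than idling.
    print(f"channel state: {state}")

channel.subscribe(on_state_change, try_to_connect=True)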
@mikehaller - do you want to keep this one open? We have not done anything for almost a year. I suggest closing it and later creating new issues in the corresponding repos if needed.
Running the containerized versions of the example applications DBC Feeder and HVAC Service takes up a lot of CPU resources continuously.
For the DBC Feeder, this should definitely be investigated and optimized. Of course, running it in a loop may use resources, but it seems to be running at 100% all the time, which is way too much.
The HVAC Service is also taking up lots of CPU resources. It's not clear why this uses CPU at all, since it's just waiting for requests or signals and doesn't do anything on its own.
top snippet:
The containers have been installed on Eclipse Kanto Container Management like this:
The runtime logs of the containers do not show any warnings or errors, it seems to be running as usual:
In comparison, some CPU usage snapshots (top, rounded up)