It's strange. Even though time() is a little less accurate, I'm not sure that the problem is there. I have to do some tests. Are other users impacted or have you noticed this same behavior?
Otherwise the solution would be as follows:
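import socket
from struct import unpack
# SO_TIMESTAMP = 29
SO_TIMESTAMPNS = 35
s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
s.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMPNS, 1)
data, ancdata, msg_flags, address = s.recvmsg(1024, 1024)
# ancdata is a list of (cmsg_level, cmsg_type, cmsg_data) tuples
cmsg_level, cmsg_type, cmsg_data = ancdata[0]
# cmsg_data holds a struct timespec: two native longs (tv_sec, tv_nsec)
tv_sec, tv_nsec = unpack('ll', cmsg_data)
# tv_nsec is in nanoseconds, hence the 1e-9 factor
timestamp = tv_sec + tv_nsec * 1e-9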
However, this solution only works on Linux, excluding macOS and Windows. So I would prefer to find another solution.
https://www.kernel.org/doc/Documentation/networking/timestamping.txt https://man7.org/linux/man-pages/man7/socket.7.html https://docs.python.org/3/library/socket.html#socket.socket.recvmsg
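To keep the library portable, the kernel timestamp could be treated as optional. The following is only a sketch under that assumption: `receive_time` is a hypothetical helper (not part of icmplib) that returns the SO_TIMESTAMPNS value when the ancillary data contains one and otherwise falls back to time(), which is what would happen on macOS and Windows.

```python
import socket
import struct
import time

# Linux value quoted earlier in this thread.
SO_TIMESTAMPNS = 35


def receive_time(ancdata, fallback=time.time):
    """Return the kernel receive timestamp in seconds, or fallback() if absent.

    `ancdata` is the list of (cmsg_level, cmsg_type, cmsg_data) tuples
    returned by socket.recvmsg(). This helper is hypothetical.
    """
    size = struct.calcsize('ll')  # struct timespec: tv_sec, tv_nsec
    for cmsg_level, cmsg_type, cmsg_data in ancdata:
        if (cmsg_level == socket.SOL_SOCKET
                and cmsg_type == SO_TIMESTAMPNS
                and len(cmsg_data) >= size):
            tv_sec, tv_nsec = struct.unpack('ll', cmsg_data[:size])
            return tv_sec + tv_nsec * 1e-9
    # No kernel timestamp was delivered: use the user-space clock instead.
    return fallback()
```

With this shape, the receive path would call `receive_time(ancdata)` right after `recvmsg()` and never needs to know which platform it is running on.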
Right now we only have the one linked report, and I have noticed the timings are a bit different vs ping.
OK, I will see what I can do :) On the other hand, if indeed the problem comes from the time function, it will probably be necessary to wait for version 2.0, which will bring a new architecture https://github.com/ValentinBELYN/icmplib/issues/6#issuecomment-694517448.
Hi @bdraco,
I performed several tests to locate the source of the problem. Here are the results I got.
Before starting, I would like to detail the conditions of the tests. I performed 4 different tests. I used two separate virtual machines (Ubuntu 20.04, Python 3.8) with 1 and 2 cores respectively.
I performed a first test on the first machine by executing a simple ping with icmplib (100 pings with an interval of 100 ms). I noted the results obtained with the time function currently used and the time taken on the sockets (https://github.com/ValentinBELYN/icmplib/issues/15#issuecomment-695332733). I then used the command stress -c 1 to stress the CPU and check whether there were more differences between the time function and the socket time.
Finally, I repeated the same operations on the second machine but with the command stress -c 2 to stress the CPU. As I thought, the results were the same as for the previous tests. Therefore, I will not detail them.
Here is an overview of the results:
As we can see, stressing the CPU has no impact on the time function (so the problem does not come from the time function). On the other hand, we can see that this function returns a value less accurate than that returned by the socket. In the worst case, there is a difference of 800 ms.
However, we also see another behavior of the time function. The difference between the two times increases linearly until it reaches 800 ms. After that, this difference/inaccuracy drops back to zero. Given that the send method also calls the time function (it is not possible to get the time at the socket level), the difference/inaccuracy between sending and receiving is almost zero. In other words, if the time function adds an inaccuracy of 400 ms during sending for example, it also adds it on reception, which cancels it out.
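The cancellation argument can be checked with a few lines of arithmetic. This is only an illustration: `measured_rtt` and the 7 ms round trip are made-up values, and only the 400 ms error comes from the example above.

```python
def measured_rtt(send_time, receive_time, send_error=0.0, receive_error=0.0):
    """Round-trip time in ms as seen by a clock that is off by the given errors."""
    return (receive_time + receive_error) - (send_time + send_error)


# Hypothetical packet: sent at t = 0 ms, reply received at t = 7 ms.
true_rtt = measured_rtt(0.0, 7.0)

# Same 400 ms error on both readings: the error cancels out.
same_error = measured_rtt(0.0, 7.0, send_error=400.0, receive_error=400.0)

# Error only on the receive side: it ends up in the result.
different_error = measured_rtt(0.0, 7.0, send_error=0.0, receive_error=400.0)

print(true_rtt, same_error, different_error)
```

So the inaccuracy only matters if it changes between the send and the receive of the same packet, which the linear drift described above makes unlikely over a few milliseconds.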
So, the problem comes from elsewhere and I may have an idea. When you ping several times with a low interval or in parallel, the operating system or the routers seem to prioritize the ICMP flow and consequently reduce the round-trip times.
You can test this behavior with the following Python code:
from icmplib import ping
print(ping('1.1.1.1', count=10).avg_rtt)
print(ping('1.1.1.1', count=10, interval=0.1).avg_rtt)
This also happens with your system's ping command:
ping 1.1.1.1
ping -i 0.1 1.1.1.1
Since it is possible to perform checks / ping at different times with Home Assistant, this can create variations in round-trip times (depending on the number of simultaneous checks). I don't understand, however, why this phenomenon was not happening before on your side, before migrating to icmplib.
@andriej (https://github.com/home-assistant/core/issues/40222)
@shirou93 @Misiu @blair287 (https://github.com/home-assistant/core/issues/40232)
Can you give @ValentinBELYN 's steps above a try and post the results?
Sure. The results of both of these commands differ from the HA graph. They give a constant 1-2 ms, while the graphs go up to 15-20 ms.
I need more information to continue my investigations. I just ran some other tests with Wireshark and the times seem to match as well.
@bdraco Otherwise I just updated the library to fix the problem found in issue #21.
I need more information to continue my investigations. I just ran some other tests with Wireshark and the times seem to match as well.
Unfortunately I can't replicate the issue on my local network, so it's going to be up to @andriej @shirou93 @Misiu or @blair287 to provide the detail needed.
@bdraco Otherwise I just updated the library to fix the problem found in issue #21.
Perfect, TYVM. I'll get Home Assistant updated.
OK, thanks for the information :)
After updating to 0.116.2 I am no longer seeing the ping errors. I will test further to check that the issue is gone. Maybe they were related to the other fix?
Thank you for your feedback 👍
It's possible. I don't think it's related to recent fixes to this library. Maybe it was related to a problem in Home Assistant? Keep me informed if you still have any errors.
@nerdosity Since you filed https://github.com/home-assistant/core/issues/42468 , can you try the testing that ValentinBELYN described above?
Hi.
bash-5.0# python
Python 3.8.5 (default, Sep 10 2020, 14:23:57)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from icmplib import ping
>>>
>>> print(ping('fast.com', count=10).avg_rtt)
19.439
>>> print(ping('fast.com', count=10, interval=0.1).avg_rtt)
18.501
bash-5.0# ping fast.com
PING fast.com (23.1.71.125): 56 data bytes
64 bytes from 23.1.71.125: seq=0 ttl=59 time=6.907 ms
64 bytes from 23.1.71.125: seq=1 ttl=59 time=6.751 ms
64 bytes from 23.1.71.125: seq=2 ttl=59 time=7.149 ms
64 bytes from 23.1.71.125: seq=3 ttl=59 time=6.816 ms
64 bytes from 23.1.71.125: seq=4 ttl=59 time=6.736 ms
64 bytes from 23.1.71.125: seq=5 ttl=59 time=7.415 ms
64 bytes from 23.1.71.125: seq=6 ttl=59 time=7.061 ms
64 bytes from 23.1.71.125: seq=7 ttl=59 time=7.003 ms
64 bytes from 23.1.71.125: seq=8 ttl=59 time=7.236 ms
64 bytes from 23.1.71.125: seq=9 ttl=59 time=7.020 ms
64 bytes from 23.1.71.125: seq=10 ttl=59 time=7.108 ms
64 bytes from 23.1.71.125: seq=11 ttl=59 time=7.055 ms
^C
--- fast.com ping statistics ---
12 packets transmitted, 12 packets received, 0% packet loss
round-trip min/avg/max = 6.736/7.021/7.415 ms
bash-5.0# ping -i 0.1 fast.com
PING fast.com (23.1.71.125): 56 data bytes
64 bytes from 23.1.71.125: seq=0 ttl=59 time=7.058 ms
64 bytes from 23.1.71.125: seq=1 ttl=59 time=6.667 ms
64 bytes from 23.1.71.125: seq=2 ttl=59 time=6.590 ms
64 bytes from 23.1.71.125: seq=3 ttl=59 time=6.824 ms
64 bytes from 23.1.71.125: seq=4 ttl=59 time=6.997 ms
64 bytes from 23.1.71.125: seq=5 ttl=59 time=7.551 ms
64 bytes from 23.1.71.125: seq=6 ttl=59 time=6.918 ms
64 bytes from 23.1.71.125: seq=7 ttl=59 time=25.922 ms
64 bytes from 23.1.71.125: seq=8 ttl=59 time=6.864 ms
64 bytes from 23.1.71.125: seq=9 ttl=59 time=7.050 ms
-----------CUT---------------
64 bytes from 23.1.71.125: seq=64 ttl=59 time=6.715 ms
64 bytes from 23.1.71.125: seq=65 ttl=59 time=6.854 ms
^C
--- fast.com ping statistics ---
66 packets transmitted, 66 packets received, 0% packet loss
round-trip min/avg/max = 6.558/7.405/25.922 ms
We have: 19ms / 18ms in python. 7ms in bash.
And 34ms in HASS. The latter one seems totally off.
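To compare the ping command with avg_rtt from icmplib on the same footing, the per-packet times can be pulled out of the ping output. A minimal sketch, assuming the output format shown above ("time=6.907 ms"); `summarize_ping_output` is a hypothetical helper and the sample is the first three replies from the run above.

```python
import re
from statistics import mean

# Matches the "time=6.907 ms" field of each reply line.
TIME_PATTERN = re.compile(r'time=([\d.]+) ms')


def summarize_ping_output(output):
    """Return (min, avg, max) of the round-trip times found in ping output, in ms."""
    rtts = [float(value) for value in TIME_PATTERN.findall(output)]
    if not rtts:
        raise ValueError('no round-trip times found in the output')
    return min(rtts), mean(rtts), max(rtts)


sample = """\
64 bytes from 23.1.71.125: seq=0 ttl=59 time=6.907 ms
64 bytes from 23.1.71.125: seq=1 ttl=59 time=6.751 ms
64 bytes from 23.1.71.125: seq=2 ttl=59 time=7.149 ms
"""

rtt_min, rtt_avg, rtt_max = summarize_ping_output(sample)
print(f'{rtt_min:.3f}/{rtt_avg:.3f}/{rtt_max:.3f} ms')
```

The average can then be compared directly with the value printed by ping(...).avg_rtt for the same host and interval.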
Hi @nerdosity 👋
Thank you for taking the time to test and post your results! I am quite surprised by the results you get with my library and the ping command. Can you try to run these two commands again while doing a tcpdump or a Wireshark capture at the same time? The objective is to compare the timestamps and the contents of the ICMP packets.
Thanks in advance 👍
Hi, this morning I upgraded Home Assistant to the latest version (0.117), and the bug seems to be squashed.
It's weird. The tests you performed in Python with icmplib were unrelated to Home Assistant. Maybe Home Assistant has updated your version of icmplib. However, I haven't made a major change recently. If you encounter the same problem later, feel free to update this issue :)
I don't know, I made the tests inside the docker prompt. Now the sensor seems much more stable.
Hi! 👋
@nerdosity Have you encountered this problem again with the latest version of Home Assistant? @bdraco @andriej Can we close this issue? The problem encountered by @andriej does not seem to come from this library.
Not sure if it's still an issue. I asked in the Home Assistant issue as well.
I have not seen any other reports.
For me it is working now.
Great news! Therefore I am closing this issue for the moment. If there is again a problem of this kind, we can open a new one.
Thanks for giving me feedback 👍
recvmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.208.1")}, msg_namelen=128->16, msg_iov=[{iov_base="E\0\0T\220\242\0\0@\1\310\256\300\250\320\1\300\250\320\5\0\0a\251 D\0\1\230\ff_"..., iov_len=192}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SCM_TIMESTAMP, cmsg_data={tv_sec=1600523416, tv_usec=512848}}], msg_controllen=32, msg_flags=0}, 0) = 84
It looks like the ping binary looks at the cmsg_type and retrieves the timestamp from cmsg_data to give a more accurate time. icmplib currently gets time using time(), which is subject to fluctuations due to CPU load.
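For reference, the cmsg_data in the strace output is a struct timeval (seconds and microseconds). The sketch below rebuilds that payload from the values shown above and converts it to a readable UTC time, which is handy when comparing against a packet capture. `timeval_to_datetime` is a hypothetical helper, and the native 'll' layout is an assumption that holds on Linux.

```python
import struct
from datetime import datetime, timedelta, timezone


def timeval_to_datetime(cmsg_data):
    """Decode a struct timeval (tv_sec, tv_usec as native longs) into a UTC datetime."""
    size = struct.calcsize('ll')
    tv_sec, tv_usec = struct.unpack('ll', cmsg_data[:size])
    return (datetime.fromtimestamp(tv_sec, tz=timezone.utc)
            + timedelta(microseconds=tv_usec))


# Payload rebuilt from the strace line: tv_sec=1600523416, tv_usec=512848.
cmsg_data = struct.pack('ll', 1600523416, 512848)
received_at = timeval_to_datetime(cmsg_data)
print(received_at.isoformat())
```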