Closed 15367060916 closed 2 years ago
I tried changing the phc2sys command to automatic (-a) mode, and the result is the same as above. My gPTP.cfg parameters are shown below. sender (master): receiver (slave): The phc2sys commands for both the sender and the receiver are: My isochron version has also been updated to the latest. @vladimiroltean @roednix @liupoer
The isochron sender and receiver commands need "--transport-specific 1" argument too when using gPTP.cfg.
Following your suggestion, I added "--transport-specific 1" to the isochron sender and receiver commands, but it still failed. How can I solve this? @vladimiroltean
I'm sorry, I tested just now and there was a bug. Could you please pull the master branch again and try it now?
I'm sorry I didn't try it in time yesterday. I have just updated to the latest isochron and tested again, but there is still an error. The screenshots are as follows: sender: receiver: What's the problem? @vladimiroltean
Is ptp4l started on the receiver?
I could reproduce this bug with the "enp7s0" interface name, but not with shorter names like "eth0" which is what I typically use. Sorry. I've fixed this and pushed a new commit on the master branch. I really appreciate your reports, since my test cases were apparently too limited and did not catch situations like these.
Thank you very much. I updated to the latest isochron and tested again just now; ptp4l and phc2sys synchronization are running normally on both the sender and the receiver. The sync status monitoring problem now seems to be solved, but isochron still cannot send packets while ptp4l and phc2sys are synchronized (I can't see any packets from the sender in wireshark). the sender report information: In fact, after adding "-o" to the isochron command, packets are sent normally, the sender reports the transmit data, and the sender's packets can be captured in wireshark on the sender side. sender: wireshark: But with "-o", the sender does not monitor the local and remote ptp4l and phc2sys processes for synchronization status, and proceeds to send test packets regardless. I want to know how to send packets successfully while ptp4l and phc2sys synchronization is being monitored, without adding the "-o" parameter. I have to run TAS-related tests later, so I need to send packets under time synchronization. @vladimiroltean
Use the "--sync-threshold" argument on the isochron sender and specify the desired synchronization accuracy. The default is 0 ns - isochron will not settle for anything less than perfect sync :) Also, I notice that the offset between the system clock and the PHC on the sender system isn't great. Are you using phc2sys on both systems? The same "phc2sys -a -rr --transportSpecific 1 --step_threshold 0.0002" command should work in both places.
Thank you very much for your help. I have solved it now. But after further testing, I still have some questions. 1. VLAN frame sending and receiving: VLAN frames can be sent by the sender, but are not received at the receiver. Thanks to your help, sending and receiving ordinary Ethernet frames now works. But when I add "--vid 256" to the sender command to send VLAN frames, both the wireshark capture on the sender side and the sender's output report show that the VLAN frames leave the sender. However, the report on the receiver side and the output of ./isochron report show that all the frames are lost. The screenshot is as follows: By the way, the receiver command considers "-v" to be an invalid parameter, although the isochron rcv documentation lists it. How can I send and receive VLAN frames successfully? @vladimiroltean
2. Unacknowledged timestamps: the smaller the cycletime value, the more hardware timestamps are lost; the larger the value, the less likely a timestamp is lost. For example, when I set the cycletime value to 100ms and send 1000 data frames, all the hardware timestamps are obtained successfully. But when I set the cycletime value to 100us and send only 10 data frames, about 5 hardware timestamps may not be available. So if I want a smaller packet interval, I may not get the hardware timestamps. Can this problem be solved? Since you updated the branch, the new "isochron report" function is very powerful, and its most eye-catching feature is that it can obtain the hardware timestamps on both the sending and the receiving side. If it could capture hardware timestamps at smaller sending intervals, it would be perfect. @vladimiroltean
3. The sender can specify the time to send by changing the basetime parameter. I noticed that the example in the "isochron send" documentation uses the "tc qdisc" command. What does it do?
4.How to use the taskset function, If only need to add "taskset-c
5. Whether the value of the "-R" parameter on the sender and receiver side will affect the synchronization performance, and whether a higher value is always better. @vladimiroltean Thank you very much for your help.
-v/--vid is not a valid option for the receiver, just for the sender. I will remove it from the documentation of the receiver.
Maybe the receiver interface has VLAN filtering enabled? In that case you may want to disable it:
# ethtool -k enp7s0 | grep rx-vlan-filter
rx-vlan-filter: on
# ethtool -K enp7s0 rx-vlan-filter off
when I set the cycletime value to 100us and send only 10 data frames, there may be 5 hardware timestamps that may not be available
This is most likely a kernel driver and/or hardware problem (although if it works at lower rates, it's more likely driver related). If I don't have the hardware to reproduce, you may need to report the issue on netdev@vger.kernel.org and get more help from that driver's maintainers.
I noticed that the example in the "isochron send" documentation uses the "tc qdisc" command. What does it do?
See https://man7.org/linux/man-pages/man8/tc-taprio.8.html. Queuing disciplines are the Linux kernel's mechanism for configuring the packet scheduling algorithm on TX. There exists a queuing discipline that describes the 802.1Qbv time aware shaper. With the Linux kernel design, the base implementation is in software, and if a certain function can be offloaded to specialized hardware, it is. With "flags 2", this is exactly what happens: you request the schedule to be offloaded to the NIC. This is useful because isochron will queue a packet to the kernel, but the jitter will still be relatively high, due to variable software processing latencies. But if the MAC is configured for a hardware-based schedule, that jitter is eliminated since a packet is only delivered when its time slot is open. And isochron makes sure to queue a packet to the kernel right before the associated time slot opens. So this is why it's important to specify the base-time both to isochron and to tc-taprio.
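The base-time relationship described above comes down to simple arithmetic: every cycle starts at base-time plus a whole number of cycle-times, and the application has to wake up slightly before that instant. A minimal Python sketch, assuming all times are nanoseconds on the same clock; the names `next_cycle_start`, `wakeup_time` and `advance_ns` are illustrative and not taken from isochron:

```python
def next_cycle_start(now_ns: int, base_time_ns: int, cycle_time_ns: int) -> int:
    """Return the first cycle start (base_time + N * cycle_time) that is >= now."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle_time_ns must be positive")
    if now_ns <= base_time_ns:
        # The schedule has not started yet: the first cycle is at base-time.
        return base_time_ns
    elapsed = now_ns - base_time_ns
    cycles = -(-elapsed // cycle_time_ns)  # ceiling division on integers
    return base_time_ns + cycles * cycle_time_ns


def wakeup_time(slot_start_ns: int, advance_ns: int) -> int:
    """Time at which to queue the packet so it is ready when the slot opens."""
    return slot_start_ns - advance_ns


# Example: base-time 1 s, cycle-time 100 us, "now" is 250 us after base-time.
slot = next_cycle_start(1_000_250_000, 1_000_000_000, 100_000)
print(slot, wakeup_time(slot, 20_000))
```

If the sender and the taprio schedule were given different base-times, the computed slot starts would no longer line up with the moments the hardware gate opens, which is the reason both tools need the same value.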
If VLAN filtering cannot be disabled, the other option is to create a VLAN interface, and use "isochron rcv" on that:
ip link add link enp7s0 name enp7s0.256 type vlan id 256 && ip link set enp7s0.256 up
The only problem is that isochron currently requires that PTP runs on the same interface as the receiver itself, otherwise it will fail to determine sync quality. So PTP and isochron must be in the same VLAN, practically. If needed, I could probably lift that limitation, though.
As a workaround, if you need VLAN-based tagging, I think you can send using VID 0. The receiver should not drop packets with VID 0 even if VLAN filtering is enabled.
The other workaround may be to just create the VLAN interface using the command mentioned above, but not use it - instead keep using "isochron rcv -i enp7s0". The VLAN interface is just used to add the VID 256 in the RX filter of enp7s0, nothing more.
I don't know why "taskset 0" doesn't fail for you, it fails for me, which is expected because the taskset argument is a CPU mask, so 0 means no CPU:
taskset 0 ls
taskset: failed to set pid 125540's affinity: Invalid argument
The idea with CPU affinity is that for maximum determinism, you should isolate the code that sends packets to its own dedicated CPU. Since isochron now creates a dedicated POSIX thread for the packet sender code, there is a "--cpu-mask" argument for "isochron send", and that is what gets taken into consideration for that sender thread. The "taskset" specifies the CPU affinity of the entire isochron process, but the isochron process performs other background checks as well: synchronization monitoring, printing, TX timestamp collection, etc. We don't want these background tasks to interfere with the real-time sender thread, so we make the isochron process affine to one CPU, and just its sender thread affine to another CPU.
Whether the value of the "-R" parameter on the sender and receiver side will affect the synchronization performance,
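Since both taskset and "--cpu-mask" take a bitmask rather than a CPU number, it can help to compute the mask explicitly. A small Python sketch (the helper names `cpu_mask` and `cpus_in_mask` are made up for illustration) showing why a mask of 0 selects no CPU at all and why 0x8 means CPU 3:

```python
def cpu_mask(cpus):
    """Build an affinity bitmask from zero-based CPU indices: bit N selects CPU N."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


def cpus_in_mask(mask):
    """List the zero-based CPU indices selected by an affinity bitmask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


print(hex(cpu_mask([3])))        # mask for the sender thread on CPU 3
print(hex(cpu_mask([0, 1, 2])))  # mask for everything else on CPUs 0-2
print(cpus_in_mask(0))           # an empty list: mask 0 selects no CPU
```

With such a split, the process-wide mask would go to taskset and the single-CPU mask to "--cpu-mask", matching the separation between background tasks and the sender thread described above.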
The "--num-readings" argument is present in phc2sys too. If you don't need to change the value from its default in phc2sys, you don't need to worry about it in isochron either.
Thank you very much!
I've added support for isochron to work using a VLAN device (like enp7s0.256) while ptp4l still works on a physical device (like enp7s0). This should solve the VLAN filtering issues you are seeing. Would you mind testing this? Thanks!
I updated to the latest branch and retested, and now I have a different problem, as shown in the figure:
All the packets can be captured by wireshark at both the receiving end and the sending end, but the sender's timestamps cannot be obtained. If I send 1000 data frames, 999 timestamps are unacknowledged; if I send 100 data frames, 99 timestamps are unacknowledged. ptp4l and phc2sys synchronization is normal. What is the problem? @vladimiroltean
What kernel version do you use?
sender: receiver: Is it because the 4.15 kernel version is too old? @vladimiroltean
Sorry, I don't know what's wrong. I don't think kernel 4.15 is too old. I thought maybe SOF_TIMESTAMPING_OPT_ID is not supported, but it was added in kernel 3.17. Could you please run again with this exploratory debugging patch? What it does is it ignores the duplicate timestamps, and also prints their type (SCM_TSTAMP_SCHED, SCM_TSTAMP_SND, others).
From be37fc2872f05a99a4995768f41caf380fb1cb46 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Fri, 4 Mar 2022 15:07:52 +0200
Subject: [PATCH] isochron: debug duplicate TX timestamps
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
isochron/send.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/isochron/send.c b/isochron/send.c
index a46ff7ca9b56..4c1b18b98b5c 100644
--- a/isochron/send.c
+++ b/isochron/send.c
@@ -361,9 +361,9 @@ static int prog_poll_txtstamps(struct prog_data *prog, int timeout)
if (isochron_pkt_fully_timestamped(send_pkt)) {
fprintf(stderr,
- "received duplicate timestamp for packet key %u already fully timestamped\n",
- tstamp.tskey);
- return -EINVAL;
+ "received duplicate timestamp type %u for packet key %u already fully timestamped\n",
+ tstamp.tstype, tstamp.tskey);
+ return 0;
}
swts = __cpu_to_be64(utc_to_tai(timespec_to_ns(&tstamp.sw),
--
2.25.1
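The patch above prints the timestamp type as a bare number. A small Python lookup can turn it back into a name. This is a sketch that assumes tstamp.tstype carries the kernel's value unmodified and that the numbering follows the enum order in the kernel's linux/errqueue.h UAPI header (SND, SCHED, ACK); check the header of the kernel actually in use. The function name `tstype_name` is hypothetical:

```python
# Assumed numbering, following the enum order in linux/errqueue.h.
SCM_TSTAMP_NAMES = {
    0: "SCM_TSTAMP_SND",    # packet handed to the NIC driver, or hardware timestamp
    1: "SCM_TSTAMP_SCHED",  # packet entered the packet scheduler (qdisc)
    2: "SCM_TSTAMP_ACK",    # data acknowledged by the peer (TCP only)
}


def tstype_name(tstype: int) -> str:
    """Translate the numeric type printed by the debug patch into a name."""
    return SCM_TSTAMP_NAMES.get(tstype, f"unknown ({tstype})")


for value in (0, 1, 7):
    print(value, tstype_name(value))
```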
I switched to kernel 5.4 and tested again, and the following error occurred: Why is this? ptp synchronization and phc2sys are both fine. @vladimiroltean
Can you ping 192.168.1.20? Is the isochron receiver started there?
Sorry, maybe that's the problem. I will try tomorrow. It's already 1:00 in the morning here. Thank you.
Thanks for your help. I have now completed the packet test on two directly connected test hosts, and VLAN frames are no longer dropped. Next I will run an 802.1Qbv test on the NXP LS1028A-RDB or NXP LS1021A-RDB. Before that, I ran several packet sending tests on the directly connected hosts and found two main problems, which I think would prevent me from doing 802.1Qbv testing. The detailed problem description is as follows:
1. Isochron has a large jitter when sending packets, with a maximum jitter of more than 100 us and an average jitter of tens of microseconds. The following chart data is drawn from the sender hardware timestamps in the isochron.csv output by isochron report during my measurements. I sent packets with cycletime = 100us, 1ms, 10ms and 100ms:
(1) Cycletime = 100us, number = 100: maximum value = 170.398 us, minimum value = 29.592 us, maximum jitter = 70.4085046 us, average jitter = 3.101965372 us
(2) Cycletime = 1ms, number = 1000: maximum value = 1117.875 us, minimum value = 880.603 us, maximum jitter = 119.436699 us, average jitter = 23.99408454 us
(3) Cycletime = 10ms, number = 100: maximum value = 10.188927 ms, minimum value = 9.807108 ms, maximum jitter = 194.46309 us, average jitter = 57.07146556 us
(4) Cycletime = 100ms, number = 100: maximum value = 100.215969 ms, minimum value = 99.757763 ms, maximum jitter = 242.2808 us, average jitter = 44.1196 us
Overall, the average jitter is tens of microseconds, and the maximum jitter even exceeds 200 microseconds. This is not good enough for the finer time granularity required for Qbv testing. What packet jitter can isochron theoretically achieve? If the packet jitter you measure yourself is usually much smaller than mine, what hardware do you experiment with, and how can I solve this problem? The Ethernet card I used in my experiment is an Intel I210.
2. The problem of unacknowledged hardware timestamps is more serious than expected, especially when the cycletime value is set below 1 ms. For example, if I set cycletime=100us, number=10, there will be 4 or 6 TX timestamps unacknowledged; with cycletime=100us, number=100, more than 10 will be unacknowledged; with cycletime=100us, number=1000, dozens to hundreds will be unacknowledged. The more packets I send, the worse the situation. If the cycletime value is set to 1 ms or above, timestamp acquisition is better, and the number of lost timestamps is generally in the single digits. If the Qbv test is carried out, too much timestamp loss will also affect the delay analysis.
3. My current setup should be able to meet the basic Qbv test at millisecond time granularity, but cannot complete tests at microsecond accuracy. In the PDF file of openil 1.11, the isochron sender will also output the summary report information, which is in the red wireframe below. But now the isochron has no summary report information. Did you delete it in the newly updated branch? Also, would it be more convenient to use the open source project openil-community for Qbv testing? I noticed that its README file shows delay report images. @vladimiroltean
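The interval and jitter figures in problem 1 above can be reproduced from a list of sender hardware timestamps with a few lines of Python. This is only a sketch: it assumes jitter means the absolute deviation of each inter-packet interval from the configured cycle time (the post does not state the exact definition), and the function name `interval_stats` is made up for illustration:

```python
def interval_stats(tx_ns, cycle_time_ns):
    """Summarize inter-packet intervals for a list of TX timestamps in nanoseconds."""
    if len(tx_ns) < 2:
        raise ValueError("need at least two timestamps")
    # Interval between each pair of consecutive packets.
    intervals = [later - earlier for earlier, later in zip(tx_ns, tx_ns[1:])]
    # Assumed jitter definition: distance of each interval from the nominal cycle.
    deviations = [abs(interval - cycle_time_ns) for interval in intervals]
    return {
        "max_interval_ns": max(intervals),
        "min_interval_ns": min(intervals),
        "max_jitter_ns": max(deviations),
        "avg_jitter_ns": sum(deviations) / len(deviations),
    }


# Four made-up timestamps for a nominal 100 us cycle.
print(interval_stats([0, 100_000, 210_000, 290_000], 100_000))
```

Packets whose TX timestamp was never acknowledged would have to be dropped from the list first, otherwise a missing sample shows up as one artificially long interval.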
chrt --fifo 90
--cpu-mask 0x8, assuming you want to reserve CPU 3 for the isochron sender
--sched-fifo --sched-priority 90 to alter the scheduling parameters only for the sender thread
Now that isochron has the correct arguments, you will probably need to experiment with the configuration of a PREEMPT_RT kernel. Here are some rough indications for how to do this, assuming you have a 4-core CPU and you want to reserve CPU 3:
Make sure the kernel is compiled with CONFIG_HIGH_RES_TIMERS=y.
Set a constant CPU frequency/disable dynamic frequency scaling. It's best to compile the kernel with CONFIG_CPU_FREQ=n, although the governor should also be configurable at runtime.
$ cat /sys/devices/system/cpu/cpu3/cpufreq/scaling_available_governors
conservative powersave ondemand userspace performance schedutil
$ cat /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
ondemand
$ echo -n performance > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
$ cat /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
performance
Or:
$ for cpu in cpu0 cpu1 cpu2 cpu3; do cat /sys/bus/cpu/devices/${cpu}/cpufreq/scaling_max_freq > /sys/bus/cpu/devices/${cpu}/cpufreq/scaling_min_freq; done
In either case, to check that the CPUs are running constantly at max frequency:
# reset stat counters
$ echo 1 > /sys/devices/system/cpu/cpu3/cpufreq/stats/reset
$ apt install cpufrequtils
# check CPU1 frequency, always 1.8 GHz
$ cpufreq-info | grep stats
cpufreq stats: 1.80 GHz:100.00%, 1.40 GHz:0.00%, 900 MHz:0.00%, 700 MHz:0.00%
cpufreq stats: 1.80 GHz:100.00%, 1.40 GHz:0.00%, 900 MHz:0.00%, 700 MHz:0.00%
cpufreq stats: 1.80 GHz:100.00%, 1.40 GHz:0.00%, 900 MHz:0.00%, 700 MHz:0.00%
cpufreq stats: 1.80 GHz:100.00%, 1.40 GHz:0.00%, 900 MHz:0.00%, 700 MHz:0.00%
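If this check is repeated often, the "cpufreq stats" lines can be verified by a script instead of by eye. A hedged Python sketch, assuming the output format is exactly the one shown above; the function names `parse_cpufreq_stats` and `pinned_at_max` are illustrative:

```python
import re

# Multipliers for the frequency units appearing in the stats line.
UNIT_HZ = {"kHz": 1e3, "MHz": 1e6, "GHz": 1e9}


def parse_cpufreq_stats(line):
    """Map each frequency (in Hz) to the percentage of time spent at it."""
    stats = {}
    for value, unit, percent in re.findall(r"([\d.]+) ([kMG]Hz):([\d.]+)%", line):
        stats[float(value) * UNIT_HZ[unit]] = float(percent)
    return stats


def pinned_at_max(line):
    """True if the CPU spent 100% of the time at its highest listed frequency."""
    stats = parse_cpufreq_stats(line)
    return bool(stats) and stats[max(stats)] == 100.0


sample = "cpufreq stats: 1.80 GHz:100.00%, 1.40 GHz:0.00%, 900 MHz:0.00%, 700 MHz:0.00%"
print(pinned_at_max(sample))
```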
Enable CONFIG_NO_HZ_FULL=y to get a chance to configure some CPUs as fully tickless (no scheduler timer interrupt). The CPU which will be tickless is the CPU reserved for isochron.
Add nohz_full=3 to the kernel command line (the one that can be seen in cat /proc/cmdline). This will actually make CPU 3 tickless.
Add isolcpus=3 to the kernel command line. This will prevent the scheduler from automatically scheduling any process to CPU 3.
For some more extreme fun, you may need to defer some extra kernel threads from CPU 3. Grossly oversimplifying, RCU (Read Copy Update) is a kernel internal mechanism for keeping shadow copies of variables, and it has some garbage collector kernel threads. The rcu_nocbs argument says "move the per-cpu RCU callbacks of the CPUs in this list to some dedicated kernel threads, called rcuox/N". These kernel threads will then be scheduled by the scheduler on the cores which are available for scheduling, i.e. the ones which are not in isolcpus. In other words, with rcu_nocbs=3 isolcpus=3 in the kernel command line, the effect is that CPU 3 will have its RCU callback kernel threads rcuop/3, rcuog/3, all running on CPUs 0-2. Which is good because we have offloaded some more work from CPU 3.
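After rebooting, it is easy to forget one of the three parameters. The following Python sketch checks a kernel command line string for them. It is only an illustration: the function names are made up, and the CPU list parser handles plain numbers and "a-b" ranges only, silently skipping anything else (such as flag words or stride syntax):

```python
def parse_cpu_list(spec):
    """Parse a CPU list such as "3" or "0-2,5" into a set of CPU indices."""
    cpus = set()
    for part in spec.split(","):
        low, dash, high = part.partition("-")
        if not low.isdigit():
            continue  # skip non-numeric entries, e.g. flag words
        if dash:
            if not high.isdigit():
                continue  # skip unsupported forms such as strides
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(low))
    return cpus


def isolation_params(cmdline):
    """Extract the isolcpus, nohz_full and rcu_nocbs CPU sets from a command line."""
    found = {}
    for token in cmdline.split():
        key, equals, value = token.partition("=")
        if equals and key in ("isolcpus", "nohz_full", "rcu_nocbs"):
            found[key] = parse_cpu_list(value)
    return found


example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 isolcpus=3 nohz_full=3 rcu_nocbs=3 quiet"
print(isolation_params(example))
```

On a live system the input would be the contents of /proc/cmdline; the example string here is invented.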
With this basic configuration in place, you should be able to also enable CONFIG_FTRACE=y in the kernel and do more in-depth tuning.
The basic strategy is to run isochron using a command like this:
$ trace-cmd record -e irq -e net -e syscalls -e sched isochron send --tracemark ...
# this will generate a trace.dat file
$ kernelshark # this will open the trace.dat file in a GUI
In kernelshark you can see what else delays the isochron process on CPU 3. If there is any other kernel thread there which I haven't mentioned (like for example cpuhp/3 for hotplugging), you can move it away using taskset.
In addition to isochron testing, you can/should also test using the cyclictest program. There are tons of options to this program as well, so I won't go into details.
I am not an expert on PREEMPT_RT or kernel configuration, these are just some pointers which I've found helpful for my own testing.
Some additional resources: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html https://www.kernel.org/doc/Documentation/kernel-per-CPU-kthreads.txt https://lemariva.com/blog/2019/09/raspberry-pi-4b-preempt-rt-kernel-419y-performance-test https://github.com/hsgwa/ros2_timer_latency_measurement/blob/master/setup.md
./scripts/get_maintainer.pl drivers/net/ethernet/intel/igb/e1000_i210.c
Jesse Brandeburg <jesse.brandeburg@intel.com> (supporter:INTEL ETHERNET DRIVERS)
Tony Nguyen <anthony.l.nguyen@intel.com> (supporter:INTEL ETHERNET DRIVERS)
"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING DRIVERS)
Jakub Kicinski <kuba@kernel.org> (maintainer:NETWORKING DRIVERS)
intel-wired-lan@lists.osuosl.org (moderated list:INTEL ETHERNET DRIVERS)
netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
linux-kernel@vger.kernel.org (open list)
3.
the isochron sender will also output the summary report information, which is in the red wireframe below. But now the isochron has no summary report information. Did you delete it in the newly updated branch?
You now need to run isochron report --summary to see it.
Whether it is more convenient to use the open source project openil-community for Qbv testing, I noticed that the README file can output delay report images.
I haven't updated the openil-community repository in a while. It is likely that whatever operating system you use, you will need to do some tuning. I don't necessarily suggest that you use a particular kernel or rootfs.
Thank you very much, I'll try your suggestions in my experiments.
I'd like to ask about tc-qdisc: can the Linux kernel directly recognize the 802.1Q VLAN priority when the tc-qdisc command is used? (I suspect it cannot.) In the user's manual, packet priority 4 is identified directly, and since the --vid parameter is not specified, --priority 4 represents SO_PRIORITY 4 rather than VLAN PCP 4 in that case. If I add the --vid parameter, the tc-qdisc command fails because the priority is then the 802.1Q priority rather than SO_PRIORITY. Is my understanding correct? If tc-qdisc could classify packets based on the 802.1Q VLAN priority, then this hardware method could in theory be used to eliminate jitter when sending VLAN packets. @vladimiroltean
I'm sorry, I don't understand what you're asking.
The --priority argument of isochron is used for the SO_PRIORITY API. Inside the kernel, this translates into the skb->priority field (a socket buffer is the data structure associated with a packet). If --vid is used, then the priority is also used to populate the VLAN PCP field. But the kernel doesn't populate skb->priority based on VLAN PCP on transmission, unless it is explicitly told to do so.
A way to do that would be to create and use a VLAN interface using
ip link add link eth0 name eth0.100 type vlan id 100 ingress-qos-map 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7 egress-qos-map 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
Another way would be to not create a VLAN interface, but apply prioritization filters based on VLAN PCP on the egress of the base interface:
tc qdisc add dev eth0 clsact
tc filter add dev eth0 egress protocol 802.1Q flower cvlan_prio 7 action skbedit priority 7
With the latter being more flexible, because you can do QoS classification based on any packet header fields using tc-flower, not just VLAN PCP.
But none of this matters, because isochron sets the packet priority using SO_PRIORITY and based on this, the packet goes through the priority to traffic class map of tc-taprio to select a traffic class. Then the gate applies to the translated traffic class.
You said that the tc-qdisc command fails, can you show how it fails?
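The two-step lookup described above (priority to traffic class, then traffic class to gate) can be modelled in a few lines of Python. This is a simplified sketch, not kernel code: it assumes a 16-entry priority map indexed by the low 4 bits of the priority, and a gate mask in which bit N stands for traffic class N. The identity-style map and the mask values are invented for the example:

```python
def traffic_class(priority, prio_to_tc_map):
    """Look up the traffic class for a packet priority via a 16-entry map."""
    if len(prio_to_tc_map) != 16:
        raise ValueError("the priority map must have 16 entries")
    return prio_to_tc_map[priority & 0xF]  # assumption: only the low 4 bits are used


def gate_open(tc, gate_mask):
    """True if the gate mask of the current schedule entry opens this traffic class."""
    return bool(gate_mask & (1 << tc))


# Hypothetical 8-class setup: priorities 0-7 map to classes 0-7, the rest to class 0.
example_map = [0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0]
tc = traffic_class(4, example_map)
print(tc, gate_open(tc, 0x10), gate_open(tc, 0x01))
```

In this model a packet sent with priority 4 lands in traffic class 4 regardless of whether a VLAN tag is added, because the lookup only involves the socket priority.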
Sorry, I don't know what's wrong. I don't think kernel 4.15 is too old. I thought maybe SOF_TIMESTAMPING_OPT_ID is not supported, but it was added in kernel 3.17. Could you please run again with this exploratory debugging patch? What it does is it ignores the duplicate timestamps, and also prints their type (SCM_TSTAMP_SCHED, SCM_TSTAMP_SND, others).
I'd like to give an update on this. It turns out that kernels 4.19 and earlier do indeed have buggy support for SOF_TIMESTAMPING_OPT_ID when used for PF_PACKET sockets, like isochron uses. https://github.com/vladimiroltean/tsn-scripts/issues/11#issuecomment-1078442611
The SOF_TIMESTAMPING_OPT_ID support was fixed in the latest stable 4.14 and 4.19 versions, so I am now closing this ticket: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.14.y&id=add668be8f5e53f4471a075edaa70a7cb85fd036 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.14.y&id=a96c57a72f477b42ab238fad3c2c1f8e8c091256 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.19.y&id=cd7295d0bea3f56a3f024f1b22d50a0f3fc727f1
Hi, I am very happy to use isochron. I was able to use it a few months ago, but this time I encountered some problems. My experimental topology is a simple direct connection between two servers, with gPTP time synchronization and phc2sys synchronization enabled on both the receiver and the sender. The synchronization commands and report logs are as follows. send: receiver: I think synchronization should be fine. But when I ran the isochron command, I encountered some errors that prevented the data frames from being sent successfully, as follows: send: receiver: It seems that the request to detect synchronization information failed. How should I solve it? Thank you very much. By the way, my Ubuntu systems and kernels are 20.04 LTS with 5.11.0-41-generic and 18.04 LTS with 5.4.0-91-generic, respectively. @vladimiroltean @roednix @liupoer