Open SPYFF opened 2 years ago
For example, with default settings the following command loses almost 50% of the HW timestamps.
isochron send -i enp3s0 -s 64 -c 0.00005 --client 10.0.0.20 --num-frames 1000 -F isochron.dat --sync-threshold 2000
If I increase the rcvbuf of the data socket (where we have the error queue too), this is reduced to 5-10 missing timestamps. However, no matter what I do, I cannot reduce the missing timestamps to zero.
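For readers who want to try the same receive-buffer experiment outside isochron, here is a minimal Python sketch. The helper name `enlarge_rcvbuf` and the 4 MiB request are illustrative assumptions, not anything isochron itself does. On Linux the kernel doubles the requested value for bookkeeping and caps it at `net.core.rmem_max`, so it is worth reading the value back instead of trusting the request.

```python
import socket


def enlarge_rcvbuf(sock, requested_bytes):
    """Request a larger SO_RCVBUF and return what the kernel actually granted.

    Hypothetical helper for illustration only. The granted size can differ
    from the request: Linux doubles it and clamps it to net.core.rmem_max.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


# Example use on a plain UDP socket (the 4 MiB request is an arbitrary choice):
#   with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
#       print(enlarge_rcvbuf(s, 4 * 1024 * 1024))
```

If the value read back is much smaller than the request, the system-wide `net.core.rmem_max` limit is probably what needs raising first.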
OK, I think I found the problem: basically I am too dumb for Linux. Sorry for the noise, it was also an igc-specific problem, I guess.
With bpftrace I found that if isochron and the igc's PTP TX timestamp worker (igc_ptp_tx_work) are scheduled onto different CPU cores, I lose some or many timestamps. With large cycle times this was not an issue, but with smaller cycle times it is. Running the measurement with --cpu-mask=$core_of_igc_kworker, I got many more HW TX timestamps without losing a single one (1000 frames in my case; running it with larger numbers like 5000 or more is still not OK, however).
If you have any hints or tweaks to minimize the chance of lost HW timestamps I would be glad to hear them, but I think this is safe to close because it is not related to isochron.
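As a rough alternative to bpftrace for finding $core_of_igc_kworker, the CPU a task last ran on can be read from /proc. The sketch below assumes the PID of the relevant kworker is already known; the function names are made up for illustration. It returns a snapshot only, so the value can be stale by the time the measurement starts.

```python
def last_cpu_from_stat(stat_line):
    """Extract the 'processor' field (field 39 in proc(5)) from a stat line."""
    # The comm field is wrapped in parentheses and may contain spaces or ')',
    # so split at the LAST ')' before tokenizing the remaining fields.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 39 sits at index 39 - 3.
    return int(after_comm[39 - 3])


def last_cpu_of_pid(pid):
    """Return the CPU number that the given PID last executed on (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return last_cpu_from_stat(f.read())
```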
Hi Ferenc, Sorry for the late response, I just came back from vacation. It sounds like the problem is caused at least partially by the igc driver's inability to perform TX timestamping for more than 1 packet at a time: https://elixir.bootlin.com/linux/v5.18.11/source/drivers/net/ethernet/intel/igc/igc_main.c#L1451 I see there's a "tx_hwtstamp_skipped" ethtool -S counter, could you check if that is what is incrementing? I sadly don't have the necessary hardware for this.
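One way to check whether that counter moves during a run is to read it before and after the measurement. The following Python sketch shells out to `ethtool -S` and parses its "name: value" lines; the helper names are illustrative assumptions, and the interface name in the usage comment is taken from the command at the top of this issue.

```python
import subprocess


def parse_ethtool_stats(text):
    """Parse `ethtool -S` output into a {counter_name: int} dictionary."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and anything without a numeric value.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def read_counter(interface, counter="tx_hwtstamp_skipped"):
    """Return the current value of one counter, or None if it is not reported."""
    out = subprocess.run(
        ["ethtool", "-S", interface],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ethtool_stats(out).get(counter)


# Example use around a measurement (requires ethtool and the real NIC):
#   before = read_counter("enp3s0")
#   ... run isochron ...
#   after = read_counter("enp3s0")
#   print("skipped during run:", after - before)
```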
Hi!
Thanks for the help, the tx_hwtstamp_skipped counter is indeed incrementing (but only when I run isochron on the same core as the igc kworker). Do you think it might be worth mentioning this issue on the netdev list for the Intel devs, or do you see some quick fix that I can apply here?
Sorry again for the delay, I don't see a quick fix, I think it's a problem if the driver decides to drop TX timestamping requests willy-nilly, and it should be reported on the netdev list and see what can be done. At the very least, the driver could queue the packet until the current one is no longer being timestamped. This could be done by anyone with the hardware, since it's just some extra logic, not so much adding support for a different set of timer registers.
Vinicius replied on netdev to the issue with a patchset using all four registers for timestamping.
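To illustrate why the number of timestamp registers matters at short cycle times, here is a toy Python model. It is not the igc driver's logic: the function name, the fixed `hold_ns` (how long a register stays occupied before the worker frees it), and the drop-when-full policy are all simplifying assumptions made for this sketch.

```python
def count_skipped(num_frames, cycle_ns, hold_ns, slots=1):
    """Count timestamp requests dropped because every slot is still busy.

    Toy model: one frame is sent every cycle_ns, and a slot that accepts a
    request stays occupied for hold_ns (a hypothetical, constant latency).
    """
    busy_until = [0] * slots
    skipped = 0
    for i in range(num_frames):
        now = i * cycle_ns
        for s in range(slots):
            if busy_until[s] <= now:
                busy_until[s] = now + hold_ns  # claim the free slot
                break
        else:
            skipped += 1  # no free slot: the request is dropped
    return skipped


# 50 us cycle (as in -c 0.00005) with an assumed 80 us hold time:
print(count_skipped(1000, 50_000, 80_000, slots=1))  # prints 500
print(count_skipped(1000, 50_000, 80_000, slots=4))  # prints 0
```

In this model, requests are only dropped when the hold time exceeds the cycle time times the number of slots, which is consistent with the losses showing up at small cycle times only. The real hold time depends on scheduling and is not known here.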
And does it solve the problem?
I compiled the kernel, but the testbed is occupied for a while, so I haven't had the opportunity to test it yet. I'll be back soon.
Hi! Just a generic question, not necessarily about isochron. When I run the software with smaller and smaller cycle times, like two- or one-digit microseconds, more and more missing timestamps are reported. Is there something I can tune, or is this expected behavior? Is it possible to increase the buffer size of the MSG_ERRQUEUE, or is that irrelevant to this problem? I use an Intel i225 (igc driver), by the way.
The isochron report shows strange zero timestamps for those seqids:
For larger cycle times, I get all of the timestamps (rx: sw, hw, tx: sw, hw)