go-ping / ping

ICMP Ping library for Go
MIT License
1.31k stars 344 forks source link

Add support for handling lost packets #206

Closed FlorianLoch closed 1 year ago

FlorianLoch commented 2 years ago

This MR adds functionality allowing to check whether ping packets are answered (in time). A regularly run check removes packets from the set of expected replies in case the timeout has been reached, increasing a PacketsLost counter and invoking an OnLost handler.

So far lost packets are simply computed by taking the delta of sent and received packets - which actually is not very precise. E.g., computing this delta immediately after sending a packet will lead to a loss of > 0, although the packet should probably just be considered still awaited instead of lost. Additionally, it was not possible to act based on the event of a package being (potentially) lost (by now being answered in time).

In order to implement this, the central awaitingSequences map has been flattened and now maps to a timestamp indicating the point in time a packet has been sent. Flattening of the map brings the additional benefit of avoiding a continuous, monotonic growth of the map (which AFAICT happens so far because former UUID keys are not removed from the root map). Additionally, that allowed removing the (also monotonically growing) list of previously used UUIDs. IMHO this simplifies the codebase quite a bit as some edge cases are avoided.

A test covering the added functionality has been added, some small cleanups have been done aside (all isolated in their own commit) and the example ping app has been updated to showcase the new feature.

I will be glad to discuss these changes with you and adjust the MR accordingly if required.

SuperQ commented 2 years ago

Yea, there are several ways to implement this, and several open PRs.

I was actually starting work to replace the awaitingSequences with a custom tracker class. Basically instead of having an inline map, have something like this:

-               if _, inflight := p.awaitingSequences[*pktUUID][pkt.Seq]; !inflight {
+               if !p.tracker.HasPacket(*pktUUID, pkt.Seq) {

I'm still working on it, and it doesn't work yet, but here's a gist of what I've done so far.

https://gist.github.com/SuperQ/6a74834ff4c4855f1446c2ffeb117c84

FlorianLoch commented 2 years ago

I have to admit, I thought I would be easier to just start off again from current master than starting from one of the other MRs, basically hijacking it. Not necessarily claiming, that this is the superior approach ;)

I also admit it's perhaps not the best approach to open another MR for a topic that is already tackled by others, but I really needed this functionally to be added in order to use it in another project.

Regarding your gist: I like the idea of moving the logic to it's own entity, your custom type. But I am not sure, this is actually necessary in this case. Having a timer for every packet send might lead to quite some goroutines idling around - but that should not be an issue I think. But it adds quite some complexity, which you suggest to handle with locking, which of course is fine. The solution suggested in this MR instead runs all operations accessing the map in the same goroutine, therefore not requiring any synchronisation. The downside is, the check's precision depends on the interval it's run - which I considered to be good enough for my use case. But I might be not good enough in others. ;) Though, the interval could be reduced in these - also it would never get to the precision your approach is capable of.

Additionally, I really suggest flattening the map. It actually makes things easier, requires less allocations and avoids growing data structures more easily.

FlorianLoch commented 2 years ago

Any suggestions how to proceed with this? I am not saying it has to be this approach. In general, the fact that there are multiple MRs for this functionality kinda shows the demand for it ;)

SuperQ commented 2 years ago

I think the main issue is that the timer is not going to work as intended.

If you have a 1s interval and a 5s timeout, it's only going to check for lost packets every 5s. So in practice, it will not accurately check for packet loss at the per-packet timeout requested.

FlorianLoch commented 2 years ago

If you have a 1s interval and a 5s timeout, it's only going to check for lost packets every 5s. So in practice, it will not accurately check for packet loss at the per-packet timeout requested.

I checked my code again as it has been quite a while since I pushed it and I would like to clarify two things:

  1. There is timeout describing how long the pinger will actually perform its periodic pings and there is packageTimeout which describes the duration after which a sent-out ping is considered lost. I guess you refer to the latter one.
  2. If the per-package timeout is set to 5s (by setting packageTimeout to 5s) and the interval is, e.g., 1s then it can happen that it takes up to 10s (as it can always take up to 2 * packageTimeout) until a lost package will be accounted lost - as described by you.

I agree, that this is not ideal - it's an "intended" trade-off between precision detecting lost packages and too frequent checking without introducing another parameter to configure the interval or even having a separate timer for each packet.

Thinking about it, I guess a possible mitigation could be to use checkInterval = min(packageTimeout, interval) as lost-package-checking-interval? That would lower the error in the scenario sketched by you.

Still, it might still take up to 2 * checkInterval until the packet is considered lost. Having my context and the set defaults in mind I considered this a fair compromise - but I admit it might not suite all scenarios - especially ones with packageTimeout being significantly higher than interval (although I am not sure how common a ping timeout of several seconds actually is). So adding an explicit parameter defaulting to above checkInterval might be a good way to go? The only real alternative I see would be having an array of timers.

SuperQ commented 1 year ago

See https://github.com/go-ping/ping/pull/226