RIPE-NCC / ripe-atlas-probe-measurements

RIPE Atlas probe measurement source code
Other
41 stars 18 forks source link

Undocumented error field in traceroute hop reply #14

Open jmeggitt opened 1 year ago

jmeggitt commented 1 year ago

Issue

RIPE Atlas probes will occasionally emit hop replies containing an "error" field (Appearing in 0.019% of traceroute measurements I checked). This behavior is not documented for any firmware version on https://atlas.ripe.net/docs/apis/result-format/. Upon some investigation, there are a few places in the measurement code where this field can be produced. According to git blame, this has and continues to be part of the probe behavior for at least 10 years now.

At the moment I am somewhat unsure if this is an issue with the documentation or probe code. I can think of a few arguments for each side.

Effected Measurement Examples

I dumped every measurement from one of the RIPE Atlas hourly traceroute dump files (traceroute-2022-10-14T0400.bz2) that contained this field. This data is stored as newline delimited JSON and can be found in this gist. This appears to be extremely rare and only occurred in 1690 of the 8,912,306 traceroute measurements from that data file (0.019%).

Examples

Here are a couple measurements I arbitrarily chose to show off what they look like in the context of the data.

{
  "af": 6,
  "dst_addr": "2001:500:2::c",
  "dst_name": "2001:500:2::c",
  "endtime": 1665721544,
  "from": "2001:bc8:62c:2545::1",
  "fw": 5020,
  "lts": 11,
  "msm_id": 6011,
  "msm_name": "Traceroute",
  "mver": "2.2.0",
  "paris_id": 2,
  "prb_id": 1000410,
  "proto": "UDP",
  "result": [
    {
      "hop": 1,
      "result": [
        {
          "error": "sendto failed: Operation not permitted"
        }
      ]
    }
  ],
  "size": 40,
  "src_addr": "2001:bc8:62c:2545::1",
  "timestamp": 1665721544,
  "type": "traceroute"
}
{
  "af": 4,
  "dst_addr": "46.101.130.201",
  "dst_name": "46.101.130.201",
  "endtime": 1665722912,
  "from": "170.39.226.151",
  "fw": 5040,
  "group_id": 29556742,
  "lts": 1,
  "msm_id": 29556742,
  "msm_name": "Traceroute",
  "mver": "2.4.1",
  "paris_id": 14,
  "prb_id": 6927,
  "proto": "ICMP",
  "result": [
    {"hop": 1,"result": [{"x": "*"},{"x": "*"},{"x": "*"}]},
    {
      "hop": 2,
      "result": [
        {
          "error": "sendto failed: Network is unreachable"
        }
      ]
    }
  ],
  "size": 48,
  "src_addr": "170.39.226.151",
  "timestamp": 1665722900,
  "type": "traceroute"
}
{
  "af": 6,
  "dst_addr": "2a00:74c0:0:2::20",
  "dst_name": "2a00:74c0:0:2::20",
  "endtime": 1665722173,
  "from": "2a05:f6c7:3853:0:eade:27ff:fe69:dd4e",
  "fw": 5070,
  "group_id": 25639804,
  "lts": 38,
  "msm_id": 25639804,
  "msm_name": "Traceroute",
  "mver": "2.6.1",
  "paris_id": 7,
  "prb_id": 22203,
  "proto": "ICMP",
  "result": [
    {"hop":1,"result":[{"from":"2a05:f6c7:3853:0:1e74:dff:fec3:e2f8","rtt":0.948,"size":96,"ttl":255},{"from":"2a05:f6c7:3853:0:1e74:dff:fec3:e2f8","rtt":0.828,"size":96,"ttl":255},{"from":"2a05:f6c7:3853:0:1e74:dff:fec3:e2f8","rtt":0.759,"size":96,"ttl":255}]},
    {"hop":2,"result":[{"from":"2a05:f6c0:1::18","rtt":3.658,"size":96,"ttl":63},{"from":"2a05:f6c0:1::18","rtt":4.791,"size":96,"ttl":63},{"from":"2a05:f6c0:1::18","rtt":3.415,"size":96,"ttl":63}]},
    {"hop":3,"result":[{"from":"2a05:f6c0:2:23::1","rtt":5.209,"size":96,"ttl":62},{"from":"2a05:f6c0:2:23::1","rtt":4.848,"size":96,"ttl":62},{"from":"2a05:f6c0:2:23::1","rtt":5.137,"size":96,"ttl":62}]},
    {"hop":4,"result":[{"from":"2001:6c8:81:100::1a1","rtt":4.867,"size":96,"ttl":252},{"from":"2001:6c8:81:100::1a1","rtt":18.54,"size":96,"ttl":252},{"from":"2001:6c8:81:100::1a1","rtt":4.106,"size":96,"ttl":252}]}, 
    {"hop":5,"result":[{"from":"2001:6c8:40::1e","rtt":4.761,"size":96,"ttl":250},{"from":"2001:6c8:40::1e","rtt":4.698,"size":96,"ttl":250},{"from":"2001:6c8:40::1e","rtt":5.275,"size":96,"ttl":250}]},
    {
      "hop": 6,
      "result": [
        {
          "x": "*"
        },
        {
          "error": "sendto failed: Network is unreachable"
        }
      ]
    }
  ],
  "size": 48,
  "src_addr": "2a05:f6c7:3853:0:eade:27ff:fe69:dd4e",
  "timestamp": 1665722169,
  "type": "traceroute"
}

Values

As far as I have seen, this field is always a string when it appears. Here are the number of occurrences for all the different values I found when dumping the effected measurements.

count   value
      9 "bind failed: Address already in use"
     11 "bind failed: Address not available"
     47 "bind failed: Cannot assign requested address"
    334 "bind failed: Invalid argument"
    804 "sendto failed: Network is unreachable"
    104 "sendto failed: Network unreachable"
    364 "sendto failed: Operation not permitted"
     17 "sendto failed: Permission denied"

Tests

I found this field when I was in the process of writing some code to parse measurements in Rust with serde. To verify that I had written my tests correctly I had it attempt to parse every measurement in one of the hourly data dumps of traceroute measurements. Thanks to Rust and serde, this code is able to detect and verify the following aspects when parsing the input data. However the error field was the only* notable inconsistency it found with the API documentation.

* Excluding the bug where empty objects could be emitted alongside traceroute hop replies. ** When making error and result mutually exclusive in traceroute hops, I decided to give a pass to hop sometimes not appearing during an error.