dioptra-io / caracal

:cat: A fast ICMP/UDP IPv4/v6 Paris traceroute and ping engine.
https://dioptra-io.github.io/caracal/
MIT License

Other pings can appear in the results #52

Open mpiraux opened 1 year ago

mpiraux commented 1 year ago

Hello, I've been running caracal on a machine that also had a RIPE Atlas software probe running, and found the RIPE pings appearing in the output CSV. I guess the tool simply logs whatever is exchanged on the captured interface.

maxmouchet commented 1 year ago

Hi,

It does capture all incoming ICMP; however, the integrity check feature should drop ICMP replies that do not match probes sent by caracal.

Can you share the command line you've been using to run caracal, as well as the kind of measurements (protocol, IPv4 or IPv6)?

mpiraux commented 1 year ago

Here is a CSV from one of these measurements, google.com.csv. It's a series of pings towards google.com, resolved in IPv4 and IPv6. In the log, you can see destination IP 2001:67c:2e8:3::c100:a4 appear, which is not the Google server (2a00:1450:400e:803::200) but a RIPE anchor to which the RIPE probe software running on the machine was sending pings. There are 20 or so occurrences of this in the file.

I'm using pycaracal with something very similar to the examples. Since the PRNG is seeded, I can regenerate the probe specifications I fed to caracal.

#!/usr/bin/env python3

import sys, os, socket
from pycaracal import Probe, prober
import random

if len(sys.argv) < 3:
    print(f"Usage: {sys.argv[0]} hostname csv_output")
    # Exit instead of printing -1 and continuing with missing arguments
    sys.exit(1)

PORT_LO = 50000
PORT_HI = 60000
RAYS = 32
PACKET_PER_PROBE = 4
PROBE_PER_RAY = 50 // PACKET_PER_PROBE
TTL = 127
random.seed("caracal_rays.py")

hostname = sys.argv[1]
try:
    v4_addr = socket.getaddrinfo(hostname, None, family=socket.AF_INET, proto=socket.SOCK_RAW)[0]
    v6_addr = socket.getaddrinfo(hostname, None, family=socket.AF_INET6, proto=socket.SOCK_RAW)[0]
except (IndexError, socket.gaierror):
    # getaddrinfo raises socket.gaierror when a family does not resolve
    print(f"Domain {hostname} does not resolve to both families")
    sys.exit(1)

srcports = [random.randrange(PORT_LO, PORT_HI) for _ in range(RAYS)]

probes = [Probe(v4_addr[4][0], srcport, PORT_HI, TTL, "icmp") for _ in range(PROBE_PER_RAY) for srcport in srcports] + \
         [Probe(v6_addr[4][0], srcport, PORT_HI, TTL, "icmp6") for _ in range(PROBE_PER_RAY) for srcport in srcports]

random.shuffle(probes)

config = prober.Config()
config.set_n_packets(PACKET_PER_PROBE)
config.set_output_file_csv(sys.argv[2])
config.set_sniffer_wait_time(10)
config.set_probing_rate(4)
config.set_batch_size(1)
print(prober.probe(config, probes))

maxmouchet commented 1 year ago

Ok thanks, I see! The issue is that we validate neither echo replies nor IPv6 replies.

The way validation (or integrity checking) currently works is:

  1. Generate a random identifier (caracal_id) on start (can also be specified with --caracal-id)
  2. For each probe, compute the caracal checksum as the IP checksum of (caracal_id, ipv4_last_byte, flow_id, ttl)
  3. Encode this checksum in the IPv4 ID field and send the probe
  4. When we get a time exceeded reply, extract the checksum, ipv4_last_byte, flow_id and ttl from the quoted IP packet, recompute the checksum from those fields, and compare it with the extracted one.
  5. If the checksum doesn't match, discard the reply.
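
To make the steps above concrete, here is a minimal Python sketch of the same idea. It is not caracal's actual implementation: the field layout passed to `struct.pack`, and the names `internet_checksum`, `caracal_checksum` and `is_valid`, are assumptions chosen for illustration. Only the general scheme (a 16-bit Internet checksum over the probe fields, stored in the IPv4 ID and recomputed from the quoted packet) comes from the description above.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement sum of 16-bit words, complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def caracal_checksum(caracal_id: int, dst_last_byte: int, flow_id: int, ttl: int) -> int:
    # Hypothetical packing of the four fields; caracal's real layout may differ.
    packed = struct.pack("!IBHB", caracal_id, dst_last_byte, flow_id, ttl)
    return internet_checksum(packed)


def is_valid(caracal_id: int, quoted_ip_id: int, dst_last_byte: int,
             flow_id: int, ttl: int) -> bool:
    """Accept a reply only if the quoted IPv4 ID matches the recomputed checksum."""
    return quoted_ip_id == caracal_checksum(caracal_id, dst_last_byte, flow_id, ttl)
```

A reply quoting a packet sent by another tool will carry an unrelated IPv4 ID, so it fails this comparison except for a 1-in-65536 collision.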

This doesn't work for ICMP Echo Replies. Since the original probe packet is not quoted, we cannot retrieve the checksum field (IP ID). The same issue applies for IPv6 where there is no ID field in the IP header.

This wasn't really an issue for us since we only cared about routers, and not replies from the destination.

I haven't given it more thought, but maybe the checksum can be encoded in the ICMP ID field, which we should be able to retrieve from the Echo Reply. Since caracal embeds all its state in the probe packet, we're abusing every header field and there's not much room left for additional information.
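
A rough Python sketch of what the ICMP ID idea could look like, purely as an illustration. It assumes the tag is derived only from `caracal_id` and the destination address, since those are the only inputs still known when an Echo Reply arrives (the reply's source address is the probed destination). Whether the ID field is actually free in caracal's probe encoding is not settled here, and `echo_id` and `is_valid_echo_reply` are hypothetical names, not caracal functions.

```python
import ipaddress
import struct

ECHO_REPLY_TYPES = {0, 129}  # ICMPv4 Echo Reply, ICMPv6 Echo Reply


def fold16(data: bytes) -> int:
    """One's-complement 16-bit checksum (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_id(caracal_id: int, dst_addr: str) -> int:
    """Hypothetical 16-bit tag to place in the ICMP ID field of a probe."""
    packed = struct.pack("!I", caracal_id) + ipaddress.ip_address(dst_addr).packed
    return fold16(packed)


def is_valid_echo_reply(caracal_id: int, reply_src_addr: str, icmp: bytes) -> bool:
    """Check that an Echo Reply carries the tag expected for its source address."""
    if len(icmp) < 8:
        return False
    # ICMP echo header: type, code, checksum, identifier, sequence number
    icmp_type, _code, _cksum, ident, _seq = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type not in ECHO_REPLY_TYPES:
        return False
    return ident == echo_id(caracal_id, reply_src_addr)
```

With only 16 bits, an unrelated ping still passes about once in 65536 replies, so this would reduce stray results rather than eliminate them.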

A workaround is (obviously :-)) to probe from a machine with no other ping processes.
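
When probing from a clean machine is not possible, another option is to post-filter the output CSV and keep only the rows whose destination is one of the addresses that were actually probed. The sketch below assumes the CSV has a header row and that the destination is in a column named `probe_dst_addr`; that column name is an assumption to check against the real file, and `normalize` and `filter_rows` are hypothetical helpers, not part of pycaracal.

```python
import csv
import io
import ipaddress


def normalize(addr: str):
    """Parse an address, unwrapping IPv4-mapped IPv6 (::ffff:a.b.c.d) to IPv4."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        return ip.ipv4_mapped
    return ip


def filter_rows(rows, targets, column="probe_dst_addr"):
    """Keep only the rows whose `column` is one of the probed target addresses."""
    wanted = {normalize(t) for t in targets}
    return [row for row in rows if normalize(row[column]) in wanted]


# Usage on made-up sample data; a real file would be opened with open(path, newline="").
sample = (
    "probe_dst_addr,rtt\n"
    "2a00:1450:400e:803::200,120\n"
    "2001:67c:2e8:3::c100:a4,95\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
kept = filter_rows(rows, ["2a00:1450:400e:803::200"])
```

This does not help when the other ping process targets the same destination as the measurement, since those rows are indistinguishable by address alone.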

Probe checksum in https://github.com/dioptra-io/caracal/blob/main/src/probe.cpp:

uint16_t Probe::checksum(uint32_t caracal_id) const noexcept {
  // TODO: IPv6 support? Or just encode the last 32 bits for IPv6?
  return Checksum::caracal_checksum(caracal_id, dst_addr.s6_addr32[3], src_port,
                                    ttl);
}

Reply checksum in https://github.com/dioptra-io/caracal/blob/main/src/reply.cpp:

uint16_t Reply::checksum(uint32_t caracal_id) const {
  // TODO: IPv6 support? Or just encode the last 32 bits for IPv6?
  return Checksum::caracal_checksum(caracal_id, probe_dst_addr.s6_addr32[3],
                                    probe_src_port, probe_ttl);
}

bool Reply::is_valid(uint32_t caracal_id) const {
  // Currently, we only validate IPv4 ICMP time exceeded and destination
  // unreachable messages. We cannot validate echo replies as they do not
  // contain the probe_id field contained in the source IP header.
  // TODO: IPv6 support?
  if (reply_protocol == IPPROTO_ICMP &&
      (reply_icmp_type == 3 || reply_icmp_type == 11)) {
    return probe_id == checksum(caracal_id);
  }
  return true;
}

SaiedKazemi commented 11 months ago

@mpiraux @maxmouchet Are there any updates to this issue since February? Can we close it?