Feature: Use tcp for traceroute

niklastreml commented 4 months ago

Is there an existing feature request for this?

[X] I have searched the existing issues

Problem Description

The current traceroute check uses UDP. This is not ideal because UDP is connectionless: the check can only tell that a packet reached its destination if that destination sends back an ICMP packet, and ICMP tends to be blocked by firewalls.

Solution Description

We can get around this by reimplementing the traceroute functionality on top of TCP instead of UDP. Essentially, we send a TCP SYN packet to start a handshake with a server on one of its open ports. A SYN-ACK (or RST) reply then tells us the packet reached the target, without relying on ICMP from the destination. The underlying logic of the check stays the same; we're only changing the protocol.

Who can address the issue?

Anyone who wants to read some RFCs and feels like playing around with Berkeley sockets.

Additional Context

https://en.wikipedia.org/wiki/Berkeley_sockets
https://www.ietf.org/rfc/rfc793.txt#:~:text=Transmission%20Control%20Protocol-,3.%20%20FUNCTIONAL%20SPECIFICATION,-3.1.%20%20Header%20Format
https://github.com/mct/tcptraceroute/blob/master/probe.c#L80

lvlcn-t commented 1 month ago

I've started implementing this in feat/tcp-traceroute. Unfortunately, a pure TCP traceroute can't give us much useful information because of the nature of the TCP protocol. We can only tell how many hops it takes before we get a response from the target (a successful connection or an RST packet). Anything beyond that is not possible, because the routers in between only send back an ICMP packet when the TTL has expired (traceroute uses this to discover the hops).

I also implemented the other forms of traceroute in the branch (udp/icmp & raw sockets/icmp) so the user can choose the method they want to use. As you may know, both of these need either the CAP_NET_RAW capability or root privileges, which is not ideal.

I will continue to work on this, but I wanted to share my progress so far. Additionally, I'm planning to adjust the user configuration options as follows:

traceroute:
  protocol: tcp
  interval: 10s
  timeout: 400s
  maxHops: 64
  retry:
    count: 5
    delay: 1s
  targets:
    - addr: example.com
      port: 80
    - addr: google.com
      port: 80

This way the user can choose the protocol they want to use. I'm also planning to adjust the API response to the following schema:

// result represents the result of a complete traceroute to a single target
type result struct {
    // Duration represents the total duration of the traceroute
    Duration float64
    // Hops represents the hops to the target
    Hops []traceroute.Hop
}

// Hop represents the result of a single hop in the traceroute
type Hop struct {
    // Tracepoint represents the hop number
    Tracepoint int
    // IP represents the IP address of the hop
    IP net.IP
    // Error represents the error that occurred during the hop
    Error string
    // Duration represents the time it took to reach the hop
    Duration float64
    // ReachedTarget indicates whether the target was reached with this hop
    ReachedTarget bool
}
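For illustration, here is roughly what that schema would serialize to with Go's `encoding/json`, assuming no custom JSON tags are added. The types are repeated so the sketch is self-contained, `traceroute.Hop` is flattened to a local `Hop`, and the addresses and durations are made-up example values (the unit of `Duration` is not specified above).

```go
package main

import (
	"encoding/json"
	"net"
)

// Hop mirrors the proposed per-hop schema.
type Hop struct {
	Tracepoint    int
	IP            net.IP
	Error         string
	Duration      float64
	ReachedTarget bool
}

// result mirrors the proposed per-target schema.
type result struct {
	Duration float64
	Hops     []Hop
}

// encodeExample marshals a made-up two-hop result to JSON.
// net.IP implements encoding.TextMarshaler, so it is emitted as a string.
func encodeExample() (string, error) {
	r := result{
		Duration: 0.25,
		Hops: []Hop{
			{Tracepoint: 1, IP: net.ParseIP("192.0.2.1"), Duration: 0.01},
			{Tracepoint: 2, IP: net.ParseIP("192.0.2.2"), Duration: 0.02, ReachedTarget: true},
		},
	}
	b, err := json.Marshal(r)
	return string(b), err
}
```

Without tags the keys keep their Go field names (`Tracepoint`, `IP`, ...), and an empty `Error` is still emitted as `""`.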

I also tried to use as much concurrency as possible to speed up the traceroute, since it takes a long time to complete. I will continue to work on it and update you on my progress.
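One way the concurrency could look, as a sketch only and not the code in the branch: probe every TTL in its own goroutine, then cut the result off at the first hop that reached the target. `hop`, `probeFunc` and `traceConcurrently` are hypothetical names, and the actual probe is passed in as a function.

```go
package main

import "sync"

// hop is a simplified stand-in for the Hop type in the proposed schema.
type hop struct {
	TTL     int
	Reached bool
}

// probeFunc (hypothetical) sends one probe with the given TTL.
type probeFunc func(ttl int) hop

// traceConcurrently probes TTLs 1..maxHops in parallel and returns the
// hops up to and including the first one that reached the target.
func traceConcurrently(maxHops int, probe probeFunc) []hop {
	results := make([]hop, maxHops)
	var wg sync.WaitGroup
	for ttl := 1; ttl <= maxHops; ttl++ {
		wg.Add(1)
		go func(ttl int) {
			defer wg.Done()
			// Each goroutine writes to its own index, so no mutex is needed.
			results[ttl-1] = probe(ttl)
		}(ttl)
	}
	wg.Wait()
	for i, h := range results {
		if h.Reached {
			return results[:i+1]
		}
	}
	return results
}
```

The trade-off is that probes beyond the target are sent even though their results are thrown away, in exchange for the total time being bounded by a single timeout rather than one timeout per hop.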

niklastreml commented 1 month ago

Sounds good so far. I've done a fair bit of research on this and there just isn't a way to get useful data without capabilities. We should just let the user choose whether they need that extra info or not.

To actually get the data from ICMP, you'll probably need to parse the time exceeded response in some way. Since ICMP doesn't know about ports, that response includes the IP header and the first 64 bits of the payload of the packet that triggered it. This is just about enough to fit in the TCP source and destination ports and the sequence number. If you can get your hands on the source port, you should be able to map the ICMP response back to the correct request.
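A rough sketch of that parsing step in Go, assuming IPv4 and a buffer that starts at the ICMP header (type, code, checksum, 4 unused bytes), followed by the original IP header and the first 8 bytes of the original TCP header. `probeID` and `parseTimeExceeded` are made-up names for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// probeID holds the fields of the original TCP SYN that are echoed back
// inside an ICMP time exceeded message.
type probeID struct {
	SrcPort uint16
	DstPort uint16
	Seq     uint32
}

// parseTimeExceeded extracts the TCP ports and sequence number of the
// original probe from an ICMPv4 time exceeded message.
func parseTimeExceeded(msg []byte) (probeID, error) {
	const icmpHeaderLen = 8
	if len(msg) < icmpHeaderLen+20+8 {
		return probeID{}, errors.New("message too short")
	}
	if msg[0] != 11 { // ICMP type 11 = time exceeded
		return probeID{}, errors.New("not a time exceeded message")
	}
	inner := msg[icmpHeaderLen:] // original IP header + 64 bits of payload
	if inner[0]>>4 != 4 {
		return probeID{}, errors.New("inner packet is not IPv4")
	}
	ihl := int(inner[0]&0x0f) * 4 // header length is given in 32-bit words
	if ihl < 20 || len(inner) < ihl+8 {
		return probeID{}, errors.New("truncated inner packet")
	}
	if inner[9] != 6 { // IP protocol 6 = TCP
		return probeID{}, errors.New("inner packet is not TCP")
	}
	tcp := inner[ihl:]
	return probeID{
		SrcPort: binary.BigEndian.Uint16(tcp[0:2]),
		DstPort: binary.BigEndian.Uint16(tcp[2:4]),
		Seq:     binary.BigEndian.Uint32(tcp[4:8]),
	}, nil
}
```

If each probe is sent from a distinct source port (or with a distinct sequence number), the returned `probeID` can serve as the key for looking up which TTL the reply belongs to.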

caas-team / sparrow