firecracker-microvm / firecracker-go-sdk

An SDK in Go for the Firecracker microVM API
Apache License 2.0

A problem with resolving container route appears at an exact commit in the SDK #311

Closed. radekg closed this issue 3 years ago.

radekg commented 3 years ago

Hi there. Thank you for the work on this SDK. I'm pretty new to Firecracker, but I was able to get up to speed in a reasonable time.

I have run into a problem with certain functionality, and a few minutes ago I managed to identify the commit that breaks it. It's a weird one, though.

I'm trying to adapt this HashiCorp Nomad plugin: https://github.com/cneira/firecracker-task-driver/tree/master/driver to work with the most recent version of the SDK and I'm using the following CNI network configuration:

{
    "name": "vault",
    "cniVersion": "0.4.0",
    "plugins": [
        {
            "type": "ptp",
            "ipMasq": true,
            "ipam": {
                "type": "host-local",
                "subnet": "192.168.127.0/24",
                "resolvConf": "/etc/resolv.conf"
            }
        },
        {
            "type": "firewall"
        },
        {
            "type": "tc-redirect-tap"
        }
    ]
}
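As a quick sanity check on a configuration like the one above, here is a minimal Go sketch that parses the conflist and verifies that tc-redirect-tap is present and not first in the chain, since that plugin adapts an interface created by an earlier plugin (ptp here). The confList type and the checkChain helper are hypothetical names made up for this illustration; they are not part of the SDK or of CNI.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// confList models only the fields this check needs (hypothetical type).
type confList struct {
	Name    string `json:"name"`
	Plugins []struct {
		Type string `json:"type"`
	} `json:"plugins"`
}

// checkChain returns an error when tc-redirect-tap is missing or is the
// first plugin, i.e. when nothing earlier in the chain creates an interface.
func checkChain(raw []byte) error {
	var c confList
	if err := json.Unmarshal(raw, &c); err != nil {
		return fmt.Errorf("parse conflist: %w", err)
	}
	for i, p := range c.Plugins {
		if p.Type != "tc-redirect-tap" {
			continue
		}
		if i == 0 {
			return fmt.Errorf("network %q: tc-redirect-tap is first in the chain", c.Name)
		}
		return nil
	}
	return fmt.Errorf("network %q: no tc-redirect-tap plugin", c.Name)
}

func main() {
	// Abbreviated copy of the configuration from this post.
	raw := []byte(`{
		"name": "vault",
		"cniVersion": "0.4.0",
		"plugins": [
			{"type": "ptp"},
			{"type": "firewall"},
			{"type": "tc-redirect-tap"}
		]
	}`)
	if err := checkChain(raw); err != nil {
		fmt.Println("problem:", err)
		return
	}
	fmt.Println("plugin chain order looks plausible")
}
```

This only checks the ordering of the chain. It says nothing about which network namespace the plugins end up running against.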

The Nomad plugin, as it is in the original repo with the v0.17.x incompatible dependency, works fine. I am able to launch the VMM and can reach it from the host via curl (it's a Vault server, so I call it on TCP port 8200). I originally naively bumped the go mod dependency to SDK v0.22.0, but after building the plugin I would immediately get:

$ curl -vvvv http://192.168.127.16:8200
* Rebuilt URL to: http://192.168.127.16:8200/
*   Trying 192.168.127.16...
* TCP_NODELAY set
* Immediate connect fail for 192.168.127.16: Network is unreachable
* Closing connection 0
curl: (7) Couldn't connect to server

I then downgraded the go mod dependency to v0.20.0 and everything worked as it did with v0.17.x incompatible. Changing the dependency to v0.21.0 introduced the problem again. After testing every single commit between v0.20.0 and v0.21.0, I found that things work at 9e24ecd and break at 2fd80c0.

I pull each commit via go get github.com/firecracker-microvm/firecracker-go-sdk@<sha>, and I can reproduce this repeatedly:

go get github.com/firecracker-microvm/firecracker-go-sdk@9e24ecd
# build and deploy, all works fine, I can reach my service
go get github.com/firecracker-microvm/firecracker-go-sdk@2fd80c0
# build and deploy, everything starts fine, but I can't reach my service

When running the broken version, I can still see that the IP address is allocated, the veth pair is created, and the route is there, but I simply can't reach the service in my VMM.
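Since curl reports "Network is unreachable", one thing worth checking is whether the route is visible from the network namespace where curl actually runs. Each namespace has its own routing table, and /proc/net/route shows the table of the process reading it. Below is a minimal Go sketch that looks up the most specific matching route for the VM address. The helper names (decodeField, bestRoute) are made up for this example, and it assumes a little-endian host, because the kernel prints these fields in host byte order.

```go
package main

import (
	"bufio"
	"encoding/hex"
	"fmt"
	"net"
	"os"
	"strings"
)

// decodeField turns an 8-digit hex field from /proc/net/route into 4 bytes
// in network order. Assumption: little-endian host, so bytes are reversed.
func decodeField(field string) ([]byte, bool) {
	b, err := hex.DecodeString(field)
	if err != nil || len(b) != 4 {
		return nil, false
	}
	return []byte{b[3], b[2], b[1], b[0]}, true
}

// bestRoute returns the interface of the most specific route in table that
// covers target, or ok=false when no entry (not even a default) matches.
func bestRoute(table, target string) (iface string, ok bool) {
	ip := net.ParseIP(target).To4()
	if ip == nil {
		return "", false
	}
	best := -1
	sc := bufio.NewScanner(strings.NewReader(table))
	sc.Scan() // skip the header line
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 8 {
			continue
		}
		dst, ok1 := decodeField(f[1])  // Destination column
		mask, ok2 := decodeField(f[7]) // Mask column
		if !ok1 || !ok2 {
			continue
		}
		m := net.IPMask(mask)
		ones, _ := m.Size()
		if ip.Mask(m).Equal(net.IP(dst)) && ones > best {
			best, iface = ones, f[0]
		}
	}
	return iface, best >= 0
}

func main() {
	data, err := os.ReadFile("/proc/net/route")
	if err != nil {
		fmt.Println("cannot read routing table:", err)
		return
	}
	// Address taken from the curl output earlier in this post.
	if iface, ok := bestRoute(string(data), "192.168.127.16"); ok {
		fmt.Println("route found via", iface)
	} else {
		fmt.Println("no route to the VM in this network namespace")
	}
}
```

Running this on the host and then again inside the namespace the VMM was started in (for example via ip netns exec, if it is a named namespace) would show whether the route ended up somewhere other than where I'm calling curl from.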

I'm trying to understand what happened because the commit where things break does not give anything away. What can I do to further diagnose the issue?

Thank you.

kzys commented 3 years ago

@radekg Sorry for the late reply. Does it work now?

radekg commented 3 years ago

@kzys Yes, it was related to the upstream project not handling the netns setting. Still pretty weird that it was exactly that single commit.