Jojonintendo closed this issue 5 months ago.
In the meantime, confirming that the problem exists in the latest version (or even the main branch) would be helpful.
cc @sbrivio-rh @dgibson
Generally a good idea, thanks :) but there's no need here: 2024_05_23.765eb0b-1 is already the latest version, and the only difference between it and the current HEAD is four commits that have nothing to do with this issue.
--net=pasta:--ipv4-only,-a,10.0.2.100,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,-m,1500,--no-ndp,--no-dhcpv6,--no-dhcp \
It would be helpful to add debug options to this, but there's currently an issue with those versions of pasta and Podman: Podman passes --quiet by default, and pasta doesn't like --debug (or --trace) together with --quiet. :( I think we should eventually drop that check in pasta.
If you have the chance to rebuild pasta (dropping that check) or Podman (not passing --quiet), could you please add something like --trace,--log-file,/tmp/pasta_caddy.log,--log-size,1000000000 to that, and share the contents of the log file? Feel free to redact IP addresses if they're confidential.
Yeah, I think "last option wins" is the usual convention for conflicting logging options. We should definitely switch to that for exactly this sort of reason.
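A minimal sketch of the "last option wins" convention, assuming a simple left-to-right scan of the arguments rather than pasta's real option parser: each of --quiet, --debug and --trace overwrites a single level variable, so a default --quiet followed later on the command line by a user-supplied --trace would yield trace output instead of an error. The enum and parse_log_level() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical log levels for illustration; not pasta's actual code. */
enum log_level { LVL_QUIET, LVL_NORMAL, LVL_DEBUG, LVL_TRACE };

/* Scan the arguments left to right. Each logging flag overwrites the
 * current level, so the last one given wins instead of being rejected
 * as a conflict. Non-logging arguments are ignored in this sketch. */
static enum log_level parse_log_level(int argc, char *argv[])
{
	enum log_level level = LVL_NORMAL;

	assert(argc >= 1); /* argv[0] is the program name */

	for (int i = 1; i < argc; i++) {
		if (strcmp(argv[i], "--quiet") == 0)
			level = LVL_QUIET;
		else if (strcmp(argv[i], "--debug") == 0)
			level = LVL_DEBUG;
		else if (strcmp(argv[i], "--trace") == 0)
			level = LVL_TRACE;
	}
	return level;
}
```

With this approach, {"pasta", "--quiet", "--trace"} resolves to LVL_TRACE and {"pasta", "--debug", "--quiet"} to LVL_QUIET; which one the user gets depends only on ordering, so the caller (here, Podman) never has to avoid passing its default.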
The specific assertion indicates we have an internal bug where we're returning to the main loop having only partially constructed a new flow. Unfortunately, without the debug logging it's a bit hard to tell the sequence of events that led up to it.
I don't know for certain if it's the cause of this problem, but I did spot by inspection some error paths where we might not be cleaning up properly after ourselves, in a way that could cause this assertion to trip. Patch coming soon.
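To illustrate the kind of bug described above, here is a heavily simplified C model. Only flow_defer_handler and flow_new_entry come from the assertion message; everything else (the table size, new_flow_begin(), new_flow_cancel(), new_flow_commit(), handle_new_conn()) is hypothetical and is not passt's actual flow.c. The idea it shows: every error path between starting and finishing a new flow must cancel it, otherwise the next check in the deferred handler trips.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model for illustration only; not passt's flow.c. */
#define FLOW_MAX 4

struct flow {
	bool in_use;
	int port;
};

static struct flow flowtab[FLOW_MAX];

/* Entry that has been allocated but not yet fully set up. It must be
 * NULL whenever control returns to the main loop. */
static struct flow *flow_new_entry;

/* Start constructing a new flow (hypothetical helper). */
static struct flow *new_flow_begin(void)
{
	assert(!flow_new_entry); /* one flow under construction at a time */

	for (size_t i = 0; i < FLOW_MAX; i++) {
		if (!flowtab[i].in_use) {
			flowtab[i].in_use = true;
			flow_new_entry = &flowtab[i];
			return flow_new_entry;
		}
	}
	return NULL;
}

/* Error path: give back the half-built entry (hypothetical helper). */
static void new_flow_cancel(struct flow *f)
{
	assert(f == flow_new_entry);
	f->in_use = false;
	flow_new_entry = NULL;
}

/* Success path: construction finished (hypothetical helper). */
static void new_flow_commit(struct flow *f)
{
	assert(f == flow_new_entry);
	flow_new_entry = NULL;
}

/* Handle a new outbound connection; connect_ok simulates whether the
 * connection attempt succeeded (e.g. false for an unreachable port). */
static bool handle_new_conn(int port, bool connect_ok)
{
	struct flow *f = new_flow_begin();

	if (!f)
		return false;
	f->port = port;

	if (!connect_ok) {
		/* Returning here without cancelling would leave
		 * flow_new_entry set and trip the check below. */
		new_flow_cancel(f);
		return false;
	}

	new_flow_commit(f);
	return true;
}

/* Run from the main loop after each batch of events. */
static void flow_defer_handler(void)
{
	assert(!flow_new_entry);
}
```

In this model, deleting the new_flow_cancel() call from the error branch makes the next flow_defer_handler() call abort with a failed !flow_new_entry assertion, the same kind of failure as the one reported here.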
OK, I've now made a draft patch which should fix the problems I saw. Again, these could plausibly be the cause of this bug, but I can't be certain. I've posted it to the passt dev list, and I've also pushed it to a gitlab branch. @Jojonintendo if you're able to test that branch, that would be great.
I'm not very literate with git, but I think it's working. Here's how I've tried the new branch:
I now have the pasta binary in /usr/local/bin, which is the one in use by the system. After a reboot everything seems fine, and I couldn't trigger the error. I'll leave it as is for a few days, but to me it sounds like it's fixed.
Previously this error would appear on boot every time, while all the containers are also starting. Now it just boots fine and I can't trigger the failure, which was very easy before.
How long should I let this run for before considering it healthy?
Anyway, many thanks for the blazing fast support, amazing job everyone!
Looking at David's patch, I came up with an embarrassingly simple reproducer, which I think is similar to what happened in your case: the container tries to access an unreachable port using a local, but not loopback, address, and we end up in the first path that the patch fixes.
The reproducer is ./pasta --config-net -t none -u none -T none -U none -- sh -c 'nc $(ip -j -4 route show | jq -rM ".[] | select(.dst == \"default\").gateway") 12345', assuming that port 12345 is not bound on the host.
It's pretty bad, so I just made a new release with the fix. I guess it will reach Arch Linux soon.
It's already in the repos, I've installed it and rebooted the machine. Everything is fine, thanks for the information and quick fix!
Issue Description
I have a homelab with more than 70 podman containers, all behind a caddy reverse proxy that's also containerized. All of them are run with the following network config:
--net=pasta:--ipv4-only,-a,10.0.2.100,-n,24,-g,10.0.2.2,--dns-forward,10.0.2.3,-m,1500,--no-ndp,--no-dhcpv6,--no-dhcp \
With pasta version 2024_05_23.765eb0b-1 I get the following error in caddy's logs:
ASSERTION FAILED in flow_defer_handler (flow.c:315): !flow_new_entry
This doesn't always happen, and seems to be triggered by the steps detailed below:
All reverse proxying stops working; however, containers can still reach each other via the host's IP, 192.168.x.x (for example, Home Assistant can still talk to Zigbee2MQTT, ESPHome, etc.).
When this error appears, it's always the last line in the logs. Caddy stays up but no more logs are shown and no other service is reachable. Upon restarting caddy and the other container that was down (AdGuard in this example), everything works again.
This is troublesome as I have many containers self-updating daily with podman-auto-update.timer. If a container stays down too long the error is more likely to appear, and it never recovers by itself.
Reverting to pasta version 2024_05_10.7288448-1 fixes the issue completely.
Steps to reproduce the issue
Describe the results you received
The Caddy reverse proxy shows the pasta error in its logs and doesn't go any further, as if it's frozen. The container stays up and looks fine, though, and its health check doesn't trigger a restart. No service is proxied, so all services appear to be down from the outside.
Describe the results you expected
With pasta version 2024_05_10.7288448-1 this error doesn't show up, and caddy keeps working no matter how many services might be temporarily down.
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
Pasta version 2024_05_23.765eb0b-1 is the latest in the repos, so I couldn't try any newer version, and I have it pinned at 2024_05_10.7288448-1.
The rest of the system uses the latest versions, including podman 5.1.1.
Additional information
No response