Closed satadruroy closed 4 years ago
@satadruroy Do you have the full test output somewhere? Please attach it to the ticket.
The excerpt in the description is missing essentially all the details needed for proper debugging. (IIRC, when a test fails, the brain runner dumps the entire log for that test into the final output, to enable post-mortem analysis.)
Brain logs from the referenced build. Not the full logs from concourse, just the part dealing with the brains tests. They show the (expected) failures of tests 004, 005, and 010 (tcp-routing, metron, insecure-registry).
Wrt 004/tcp_routing the main error reported is
+ curl --fail -s -o /dev/null tcp-route-node-env-7516.ci-aks-9fec0ee3133b908c.susecap.net
+ curl tcp.ci-aks-9fec0ee3133b908c.susecap.net:20005
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to tcp.ci-aks-9fec0ee3133b908c.susecap.net port 20005: Connection refused
Command exited with 7
Does this point towards a bad or missing setup of TCP routing for Eirini?
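For triage it may help to tell curl's exit codes apart: 7 means the connection itself failed, which fits a routing problem, whereas 22 (an HTTP error under `--fail`) would point at the app instead. A minimal sketch; the function name `classify_curl_exit` and its labels are made up for illustration and are not part of the brain tests:

```shell
# Hypothetical helper: map a curl exit code to a short triage label.
# The codes are curl's documented ones; the labels are invented here.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "dns-failure" ;;     # could not resolve host
    7)  echo "connect-failed" ;;  # failed to connect (e.g. connection refused)
    22) echo "http-error" ;;      # HTTP status >= 400 when --fail is used
    28) echo "timeout" ;;         # operation timed out
    *)  echo "other" ;;
  esac
}
```

With the failure above, `classify_curl_exit 7` yields `connect-failed`, i.e. nothing is listening on the TCP route's port.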
005/metron - The main part of the test pushes an application, and then inspects the `cf logs --recent` output of that app for a keyword.
+ cf logs node-env-9c84 --recent | grep -i Downloading
The push looks to be working; the keyword, however, is not found. It might be that under Eirini the logs are different enough to lack the expected keyword and contain something different instead.
That should be relatively easy to check outside of brain tests.
nodeenv-recent-eirini.log
Indeed, no `downloading` to be found in this log. There is a `download`, unclear if that is similar to the sought line.
The app looks properly staged, marked as running, nothing bogus in the log.
Next, trial a diego cluster, for comparison of the logs. (Strongly suspect a diego/eirini difference here).
In Diego, the primary sources of `Downloading` in the logs are all the buildpacks, and later `Downloading app package`. None of that is really visible in Eirini.
An early entry in both logs looks to be `Creating build for app with guid ...`.
Would recommend using that.
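A rough sketch of what the changed check could look like, assuming the test keeps piping the recent logs through a grep-style filter. The helper name `logs_have_marker` is hypothetical and not taken from the brain tests:

```shell
# Hypothetical helper, not part of the brain tests: succeed when the log text
# on stdin contains the given marker. Matching is case-insensitive (like the
# grep -i in the current test) and treats the marker as a fixed string.
logs_have_marker() {
  grep -q -i -F -- "$1"
}
```

Usage would be along the lines of `cf logs "$app" --recent | logs_have_marker 'Creating build for app with guid'`, which should behave the same on Diego and Eirini if that entry really appears in both logs.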
Now looking at 010/insecure-registry, this looks to be the same issue as for 004/tcp-routing, i.e.
curl: (7) Failed to connect to tcp.ci-aks-9fec0ee3133b908c.susecap.net port 20005: Connection refused
Command exited with 7
@satadruroy @viovanov: Do we have proper support for TCP routing in Eirini?
Skipped tests:
018_autoscaler_test.rb
011_nfspersi_test.rb
017_syslog_forwarding_test.rb
Failed tests:
005_metron_test.rb with exit code 1
004_tcprouting_test.rb with exit code 1
010_insecure_registry_test.rb with exit code 1
007_buildpacks_test.rb with exit code 1
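To compare runs across clusters it can be handy to pull just the failing test names out of a summary like this one. A small sketch, assuming the summary always has the layout shown here (a `Failed tests:` header followed by one test file per line); the function name `failed_tests` is made up:

```shell
# Hypothetical helper: read a brain-test summary on stdin and print only the
# file names listed under "Failed tests:". Any other "Something:" header line
# ends the section.
failed_tests() {
  awk '
    /^Failed tests:/     { in_failed = 1; next }
    /^[A-Za-z ]+:$/      { in_failed = 0 }
    in_failed && /_test\.rb/ { print $1 }
  '
}
```

Piping the summary through `failed_tests | sort` gives a list that can be diffed between an AKS and an EKS run.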
Maybe https://github.com/cloudfoundry-incubator/kubecf/pull/1398 will help with credhub tests (I've deployed from https://github.com/cloudfoundry-incubator/kubecf/tree/edg/persi-brains )
PR for this ticket started, see SUSE/brain-tests-release/pull/20. It covers test 005 only, for now.
Information wrt tcp routing, from the :rocket: ...
@f0rmiga writes: @gaktive @viovanov There's no way to do TCP routing with Eirini right now. The component responsible for emitting the TCP route to the routing-api is the `route_emitter` that comes with Diego. It's implemented in https://github.com/cloudfoundry/route-emitter/blob/2d1c1653c62944048c3cec1243f97d9bf6232c56/emitter/routing_api_emitter.go. Eirini does have a route emitter, but it doesn't implement the `routing-api` emitter: https://github.com/cloudfoundry-incubator/eirini/tree/0e9faaaa31778c6cb84828193a5af9b6b5e511d0/route. For a bit more context, the HTTP routes are emitted to `gorouter` directly, while the TCP routes are emitted to `routing-api`. This is why we can actually disable `routing-api` when TCP routing is not needed. Thinking even more, with Eirini not supporting TCP routing, we can disable `routing-api` and `tcp-router` completely. A longer-term solution would be to actually implement the `routing-api` emitter in Eirini. @jimmykarily Any ideas?
@troytop writes: @viovanov @f0rmiga @gaktive missing tcp routing in eirini is not a blocker for CAP 2.1
PR SUSE/brain-tests-release/pull/20 Merged. Watching for v0.0.15 build now.
@andreas-kupries I had to do the same investigation for the CATs that were failing on tcp routing. The result was this story on PT: https://www.pivotaltracker.com/story/show/174033038 . As you see it's not in the backlog yet.
Making a correction on my comment, we should not remove routing-api just because tcp-router doesn't work with it. It keeps track of the routes - not just the TCP ones. If the gorouter misses an update from nats, it can sync with the routing-api to keep the correct state.
Right now my local changes disable only 004 and 010, i.e. the tcp-routing test and the insecure-registry test.
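As an illustration only, the selection could be expressed as a filter on the test file name. This is a sketch under the assumption that tests are identified by their numeric prefix; the function name `skip_under_eirini` is hypothetical and this is not how the actual change is implemented:

```shell
# Hypothetical filter: return success (0) when the given brain test should be
# skipped on an Eirini deployment. Only the 004 (tcp-routing) and
# 010 (insecure-registry) tests are matched, by their numeric prefix.
skip_under_eirini() {
  case "$(basename "$1")" in
    004_*|010_*) return 0 ;;
    *)           return 1 ;;
  esac
}
```

A runner could then call `skip_under_eirini "$test_file" && continue` inside its loop over the test files.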
New PR: #1468 (Just the BTR bump).
New PR: #1469 (Changed handling (defaults) of `routing_api.enabled`).
KubeCF 2.5/Eirini/Ingress Controller EKS/k8s 1.17
Autoscaler failures are intermittent, but `insecure_registry`, `tcp_routing` and `metron` test failures were also observed on AKS with Eirini.