Closed amalic closed 4 days ago
@amalic is the error consistently seen, or only sometimes after a duration?

yes
can you try v1.1.0 instead, does anything change with that helm chart?
not at the moment
the access logs show "protocol": "HTTP/2", but you are not using GRPCRoute nor are you setting any appProtocol field on the Service, so it's weird why Envoy is trying to connect to the upstream over HTTP/2
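For reference, the protocol Envoy Gateway uses toward a backend is normally driven by the Service port's appProtocol field; a minimal sketch (Service name and namespace are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: webapp            # hypothetical backend Service
  namespace: dev-stage
spec:
  selector:
    app: webapp
  ports:
    - name: http
      port: 80
      targetPort: 80
      # With no appProtocol set, Gateway API implementations default to
      # HTTP/1.1 toward the backend; opting into HTTP/2 cleartext would
      # require appProtocol: kubernetes.io/h2c on this port.
```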
It's very strange. The Dockerfile is based on the nginx:alpine image. I even tried increasing timeouts and forcing the HTTP/1 protocol through a ClientTrafficPolicy, plus 5 retries on any 5xx error through a BackendTrafficPolicy. Still the same result. And as I already said, when I port-forward the service or the pod I get the expected response.
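For context, the retry part of that experiment could look roughly like this in Envoy Gateway's v1alpha1 API (resource and route names are assumptions, and targetRefs vs. the older singular targetRef depends on the Envoy Gateway version):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: webapp-retries     # hypothetical name
  namespace: dev-stage
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: webapp
  retry:
    numRetries: 5
    retryOn:
      triggers:
        - 5xx              # retry on any 5xx response from the upstream
```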
nginx default.conf:

```
server {
    listen 80;
    server_name _;
    #...
}
```
@amalic the issue is that

```yaml
kind: HTTPRoute
metadata:
  name: webapp
```

is in the default ns and your backend is in dev-stage, and there isn't any ReferenceGrant to allow linking the route and the backend. Can you either add a ReferenceGrant or move the route into the backend ns? The status field on the resource should be surfacing this.
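A sketch of the suggested ReferenceGrant, assuming the backend Service lives in dev-stage (the resource name is made up):

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-webapp-route   # hypothetical name
  namespace: dev-stage       # must be created in the backend's namespace
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: default     # where the route currently lives
  to:
    - group: ""              # core API group, i.e. Service
      kind: Service
```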
@arkodg Thanks for pointing that out. I actually copied the manifest from the YAML file, which is applied to the specific namespace with kubectl. I double-checked that it's in the correct namespace on the cluster, and fixed it in the samples I provided.
@arkodg Thanks to your HTTP/2 comment I expanded my research and came across this on the Istio Traffic Management Problems page:

> Envoy requires HTTP/1.1 or HTTP/2 traffic for upstream services. For example, when using NGINX for serving traffic behind Envoy, you will need to set the proxy_http_version directive in your NGINX configuration to be "1.1", since the NGINX default is 1.0.

What do you think?
@arkodg When I run nginx -T in a shell within the container I get the following output, which means the server is responding via HTTP/1.1. I can confirm this when doing a curl against the port-forwarded service and pods. I will try updating to the latest Envoy version to see if it fixes the error.
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
# configuration file /etc/nginx/nginx.conf:
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log notice;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
sendfile on;
#tcp_nopush on;
keepalive_timeout 65;
#gzip on;
include /etc/nginx/conf.d/*.conf;
}
# configuration file /etc/nginx/mime.types:
types {
text/html html htm shtml;
text/css css;
text/xml xml;
image/gif gif;
image/jpeg jpeg jpg;
application/javascript js;
application/atom+xml atom;
application/rss+xml rss;
text/mathml mml;
text/plain txt;
text/vnd.sun.j2me.app-descriptor jad;
text/vnd.wap.wml wml;
text/x-component htc;
image/avif avif;
image/png png;
image/svg+xml svg svgz;
image/tiff tif tiff;
image/vnd.wap.wbmp wbmp;
image/webp webp;
image/x-icon ico;
image/x-jng jng;
image/x-ms-bmp bmp;
font/woff woff;
font/woff2 woff2;
application/java-archive jar war ear;
application/json json;
application/mac-binhex40 hqx;
application/msword doc;
application/pdf pdf;
application/postscript ps eps ai;
application/rtf rtf;
application/vnd.apple.mpegurl m3u8;
application/vnd.google-earth.kml+xml kml;
application/vnd.google-earth.kmz kmz;
application/vnd.ms-excel xls;
application/vnd.ms-fontobject eot;
application/vnd.ms-powerpoint ppt;
application/vnd.oasis.opendocument.graphics odg;
application/vnd.oasis.opendocument.presentation odp;
application/vnd.oasis.opendocument.spreadsheet ods;
application/vnd.oasis.opendocument.text odt;
application/vnd.openxmlformats-officedocument.presentationml.presentation
pptx;
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
xlsx;
application/vnd.openxmlformats-officedocument.wordprocessingml.document
docx;
application/vnd.wap.wmlc wmlc;
application/wasm wasm;
application/x-7z-compressed 7z;
application/x-cocoa cco;
application/x-java-archive-diff jardiff;
application/x-java-jnlp-file jnlp;
application/x-makeself run;
application/x-perl pl pm;
application/x-pilot prc pdb;
application/x-rar-compressed rar;
application/x-redhat-package-manager rpm;
application/x-sea sea;
application/x-shockwave-flash swf;
application/x-stuffit sit;
application/x-tcl tcl tk;
application/x-x509-ca-cert der pem crt;
application/x-xpinstall xpi;
application/xhtml+xml xhtml;
application/xspf+xml xspf;
application/zip zip;
application/octet-stream bin exe dll;
application/octet-stream deb;
application/octet-stream dmg;
application/octet-stream iso img;
application/octet-stream msi msp msm;
audio/midi mid midi kar;
audio/mpeg mp3;
audio/ogg ogg;
audio/x-m4a m4a;
audio/x-realaudio ra;
video/3gpp 3gpp 3gp;
video/mp2t ts;
video/mp4 mp4;
video/mpeg mpeg mpg;
video/quicktime mov;
video/webm webm;
video/x-flv flv;
video/x-m4v m4v;
video/x-mng mng;
video/x-ms-asf asx asf;
video/x-ms-wmv wmv;
video/x-msvideo avi;
}
# configuration file /etc/nginx/conf.d/default.conf:
server {
listen 80;
server_name _;
location / {
port_in_redirect off;
alias /etc/nginx/html/;
proxy_http_version 1.1;
try_files $uri $uri/ //index.html;
# don't cache anything by default
add_header Cache-Control "no-store, no-cache, must-revalidate";
}
location //static {
port_in_redirect off;
alias /etc/nginx/html/static;
proxy_http_version 1.1;
expires 1y;
# cache create react app generated files because they all have a hash in the name and are therefore automatically invalidated after a change
add_header Cache-Control "public";
}
}
Strangest thing. I did another nginx test deployment, and I accidentally got a response when trying another reload. I found out that reloading multiple times eventually leads to a successful response. Thanks to the nginxdemos/hello image I could see that the successful response was always coming from the same container. After trying to scale the deployment up and down, I found that the container delivering a successful response was always running on the same node.

After adding nodeAffinity to the deployment template spec, I was able to get a response from all replicas.

Update: The nginx container is not available any more. When I deploy it on all nodes, it now sometimes works on some other random node.

Here's the deployment I used:
```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: mylabel
                    operator: In
                    values:
                      - myvalue
      containers:
        - name: nginx
          image: nginxdemos/hello:latest
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: nginx
spec:
  selector:
    app: nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
  type: ClusterIP
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: nginx-test
  namespace: nginx
spec:
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: envoy-gw
      namespace: gwapi-system
  hostnames:
    - "ngx-test.mydomain.mytld"
  rules:
    - backendRefs:
        - name: nginx-service
          kind: Service
          namespace: nginx
          port: 80
          weight: 1
      timeouts:
        backendRequest: 0s
        request: 0s
      matches:
        - path:
            type: PathPrefix
            value: /
```
closing this one since it looks like it was related to the backend and was resolved
Update: My previous solution was not correct and did not fix the problem.

It turns out that, since I am running the Karpenter autoscaler, I had to make sure the Envoy proxy pods run on Karpenter nodes by adding a node affinity to the pod spec of the custom-proxy-config EnvoyProxy resource. This is what ended up working for me:
```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
  namespace: gwapi-system
spec:
  logging:
    level:
      default: warn
  provider:
    kubernetes:
      envoyDeployment:
        pod:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: autoscaler
                        operator: In
                        values:
                          - karpenter
        replicas: 3
      envoyService:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
          service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
          service.beta.kubernetes.io/aws-load-balancer-type: external
        externalTrafficPolicy: Cluster
        type: LoadBalancer
    type: Kubernetes
```
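Worth noting: an EnvoyProxy resource like the one above only takes effect when it is referenced via parametersRef from the GatewayClass (or, in newer Envoy Gateway versions, via the Gateway's infrastructure.parametersRef); a sketch, with the GatewayClass name assumed:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy-gateway        # hypothetical name
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: custom-proxy-config
    namespace: gwapi-system
```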
I think this is a workaround for my issue. Once I find the root cause, I will update this issue.
Using Envoy Gateway 1.0.1 on our development cluster. All HTTPRoutes work except one route for a React frontend. We are getting a 504 error on the client and a text saying

upstream connect error or disconnect/reset before headers. reset reason: connection timeout

which does not look like an error our app would output.

What we have already tested:

Here's our setup:

curl
Error message from Envoy Gateway logs
Webapp manifests
Gateway manifests