Closed theFroh closed 4 years ago
Hi @theFroh,
looking at the error I see the following
...[2018/09/11 02:24:10] [error] [oauth2] could not get an upstream connection
That means the plugin could not establish a network connection with Google services. Please validate on your end that your system can reach the following HTTPS end-points:
Hey @edsiper,
The machine definitely has outbound access, and in particular, those two end-points are definitely accessible from the machine:
$ nmap -p 443 logging.googleapis.com www.googleapis.com
Starting Nmap 7.01 ( https://nmap.org ) at 2018-09-12 05:37 UTC
Nmap scan report for logging.googleapis.com (172.217.25.170)
Host is up (0.0018s latency).
Other addresses for logging.googleapis.com (not scanned): 2404:6800:4006:803::200a 172.217.167.74 172.217.167.106 216.58.196.138 216.58.199.74 216.58.200.106 216.58.203.106 216.58.220.106 172.217.25.138
rDNS record for 172.217.25.170: sin01s16-in-f10.1e100.net
PORT STATE SERVICE
443/tcp open https
Nmap scan report for www.googleapis.com (216.58.203.106)
Host is up (0.0017s latency).
Other addresses for www.googleapis.com (not scanned): 2404:6800:4006:803::200a 216.58.220.138 172.217.25.138 172.217.167.74 172.217.167.106 216.58.196.138 216.58.199.42 216.58.199.74 216.58.200.106
rDNS record for 216.58.203.106: syd09s15-in-f10.1e100.net
PORT STATE SERVICE
443/tcp open https
Cheers for assisting!
Would you please enable trace-level debug messages with 'Log_Level trace' (in the [SERVICE] section) and share the output?
No worries, that only really adds a JWT signature printout, though.
Sep 17 01:16:52 hostname td-agent-bit[1810]: [2018/09/17 01:16:52] [ info] [engine] started (pid=1810)
Sep 17 01:16:52 hostname td-agent-bit[1810]: [2018/09/17 01:16:52] [debug] [out_stackdriver] JWT signature:
Sep 17 01:16:52 hostname td-agent-bit[1810]: xxx.xxx.xxx
Sep 17 01:16:52 hostname td-agent-bit[1810]: [2018/09/17 01:16:52] [error] [oauth2] could not get an upstream connection
Sep 17 01:16:52 hostname td-agent-bit[1810]: [2018/09/17 01:16:52] [error] [out_stackdriver] error retrieving oauth2 access token
Sep 17 01:16:52 hostname td-agent-bit[1810]: [2018/09/17 01:16:52] [ warn] [out_stackdriver] token retrieval failed
Sep 17 01:16:52 hostname td-agent-bit[1810]: [2018/09/17 01:16:52] [debug] [router] match rule cpu.0:stdout.0
Sep 17 01:16:52 hostname td-agent-bit[1810]: [2018/09/17 01:16:52] [debug] [router] match rule cpu.0:stackdriver.0
The JWT signature has a payload containing (with our correct account name removed):
{
"iss": "<STATS SERVICE ACCOUNT>@<PROJECT NAME>.iam.gserviceaccount.com",
"scope": "https://www.googleapis.com/auth/logging.write",
"aud": "https://www.googleapis.com/oauth2/v4/token",
"exp": 1537150012,
"iat": 1537147012
}
And header:
{
"alg": "RS256",
"typ": "JWT"
}
I can't check if the JWT itself is valid as I've not got the secret or public key to verify with.
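The header and payload quoted above can be read straight out of the logged JWT without any key, because both segments are only base64url-encoded JSON. A minimal Python sketch, assuming the token has the usual three dot-separated segments; the function name decode_jwt_unverified is hypothetical and the signature is deliberately not checked:

```python
import base64
import json


def decode_jwt_unverified(token: str) -> tuple:
    """Return (header, payload) of a JWT WITHOUT verifying its signature."""
    header_b64, payload_b64, _signature = token.split(".")

    def b64url_json(segment: str) -> dict:
        # JWT segments drop the base64 padding; restore it before decoding.
        padded = segment + "=" * (-len(segment) % 4)
        return json.loads(base64.urlsafe_b64decode(padded))

    return b64url_json(header_b64), b64url_json(payload_b64)
```

Comparing payload["iat"] and payload["exp"] with the machine's clock shows whether the assertion is still inside its validity window. It says nothing about whether the signature itself is valid.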
I will try to replicate the problem on a 16.04 box; I tested again on my 18.04 box and it works fine.
No issues here. If you generate a new token file, does it work?
What is providing your 16.04 testing box? Mine is just a standard, run of the mill VPS; not provided by AWS or the like.
To generate a new token, I followed these steps from Google, as they seem the most applicable:
Used the JSON private key export option to generate a key file.
Placed the key file in /etc/google/auth/ and then updated /etc/td-agent-bit/td-agent-bit.conf so that google_service_credentials is set correctly.
Ran systemctl restart td-agent-bit and systemctl status td-agent-bit.
This reports the same [error] [oauth2] could not get an upstream connection
Am I missing any steps here, or misinterpreting any of the documentation, whether on Fluent Bit's or Google's end?
EDIT: I have also just nabbed the JWT signature from the logs again; it is definitely referencing the correct account in there.
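Since a freshly exported key file is one of the variables in the steps above, it can help to rule out a truncated or wrong-type file before looking at networking. A small sketch in Python; the field list reflects what a Google service account JSON key normally contains, while which of those fields Fluent Bit actually reads is an assumption here, and the helper names are hypothetical:

```python
import json

# Fields normally present in a Google service account JSON key file.
# Which of them Fluent Bit actually reads is an assumption, not taken
# from its source code.
EXPECTED_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "token_uri",
)


def missing_key_fields(key: dict) -> list:
    """Return the expected fields that are absent or empty in the key data."""
    return [name for name in EXPECTED_FIELDS if not key.get(name)]


def check_key_file(path: str) -> list:
    """Load a key file and list problems; an empty list means it looks sane."""
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    problems = [f"missing field: {name}" for name in missing_key_fields(key)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

Calling check_key_file on the path that google_service_credentials points to should return an empty list for a complete service account key. A clean result only means the file is well-formed, not that the account has the right permissions.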
I deployed Fluent Bit 0.14 in a K8s cluster.
The important config is the env variable:
QA >> kubectl exec fluent-bit-77zr7 -n kube-system env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=fluent-bit-77zr7
GOOGLE_SERVICE_CREDENTIALS=/gcp/stackdriver-service-account.json
From the fluent-bit-ds.yaml file:
spec:
  containers:
  - name: fluent-bit
    image: fluent/fluent-bit:0.14.5
    imagePullPolicy: Always
    ports:
    - containerPort: 2020
    env:
    - name: GOOGLE_SERVICE_CREDENTIALS
      value: /gcp/stackdriver-service-account.json
    volumeMounts:
    - name: varlog
      mountPath: /var/log
    - name: varlibdockercontainers
      mountPath: /var/lib/docker/containers
      readOnly: true
    - name: ssa-volume
      mountPath: /gcp
    - name: fluent-bit-config
      mountPath: /fluent-bit/etc/
.
.
.
.
  volumes:
  - name: ssa-volume
    secret:
      secretName: stackdriver-service-account
The above config needs a secret created like so:
kubectl create secret generic --namespace=kube-system stackdriver-service-account --from-file=./stackdriver-service-account.json
I mostly followed the instructions from here: https://docs.fluentbit.io/manual/installation/kubernetes
and swapped the elasticsearch OUTPUT with stackdriver. I also tried the simple configmap suggested here: https://docs.fluentbit.io/manual/output/stackdriver
Got the StackDriver authentication working I believe:
QA >> kubectl logs -n kube-system fluent-bit-77zr7
Fluent-Bit v0.14.5
Copyright (C) Treasure Data
[2018/10/30 19:28:51] [ info] [engine] started (pid=1)
[2018/10/30 19:28:51] [ info] [oauth2] HTTP Status=200
[2018/10/30 19:28:51] [ info] [oauth2] access token from 'www.googleapis.com:443' retrieved
Problem is I don't see logs in my stackdriver project.
The final configmap I use is:
QA >> cat fluent-bit-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-system
  labels:
    k8s-app: fluent-bit
data:
  fluent-bit.conf: |
    [INPUT]
        Name cpu
        Tag cpu
    [OUTPUT]
        Name stackdriver
        Match *
I did not set the env variables such as SERVICE_ACCOUNT_EMAIL & SERVICE_ACCOUNT_SECRET because I already have GOOGLE_SERVICE_CREDENTIALS setup. I did not set resource to global thinking this is already the default.
Are there any other logs I can get to dig in more? I don't know what else to try at this point.
@stevenarvar look under Global, the first filter, not under the service account.
@theFroh this is definitely an issue with not being able to reach the Google API servers from the box. Please check your connectivity from the box to those services. I was getting the same error, and once I allowed the traffic through, it worked. At pod startup I do get a few errors, but afterwards it works. The initial connection failures in my environment happen because I am running Istio, and those pods have to initialize before the traffic is routed correctly. I have tested with v0.14.9 and v1.0.1.
I had to enable traffic to the following urls:
logging.googleapis.com
www.googleapis.com
logs:
Fluent-Bit v0.14.9
Copyright (C) Treasure Data
[2019/01/07 23:24:09] [ info] [engine] started (pid=1)
[2019/01/07 23:24:09] [error] [oauth2] could not get an upstream connection
[2019/01/07 23:24:09] [error] [out_stackdriver] error retrieving oauth2 access token
[2019/01/07 23:24:09] [ warn] [out_stackdriver] token retrieval failed
.
.
.
[2019/01/07 23:24:10] [error] [io] TCP connection failed: logging.googleapis.com:443 (Connection refused)
.
.
.
[2019/01/07 23:24:12] [ info] [oauth2] HTTP Status=200
[2019/01/07 23:24:12] [ info] [oauth2] access token from 'www.googleapis.com:443' retrieved
Fluent Bit v1.1.0
Copyright (C) Treasure Data
[2019/01/08 16:41:39] [ info] [storage] initializing...
[2019/01/08 16:41:39] [ info] [storage] in-memory
[2019/01/08 16:41:39] [ info] [storage] normal synchronization mode, checksum disabled
[2019/01/08 16:41:39] [ info] [engine] started (pid=1)
[2019/01/08 16:41:39] [error] [oauth2] could not get an upstream connection
[2019/01/08 16:41:39] [error] [out_stackdriver] error retrieving oauth2 access token
[2019/01/08 16:41:39] [ warn] [out_stackdriver] token retrieval failed
.
.
.
[2019/01/08 16:41:40] [error] [io] TCP connection failed: logging.googleapis.com:443 (Connection refused)
.
.
.
[2019/01/08 16:41:43] [ info] [oauth2] HTTP Status=200
[2019/01/08 16:41:43] [ info] [oauth2] access token from 'www.googleapis.com:443' retrieved
Is there any extra information that we could add to the documentation? Or is it good to close the ticket?
@edsiper the two domains should be added to the docs. Also, the logging should print the full URL to which access was denied or at which the request failed, for example: made a call to https://www.googleapis.com/oauth2/token to get the token and failed, connection refused (or, in case of an HTTP error, received HTTP: 404, etc.). This way it is clear what is happening from the logs.
@varun-da Just in response to your own reply before, definitely understand that it is a likely cause, but the first thing we checked off in this issue was connectivity from the box to those two addresses. I can confirm I still have connectivity.
I'm still hitting the issue, though:
Jan 09 09:00:15 hostname td-agent-bit[15149]: [2019/01/09 09:00:15] [ info] [engine] started (pid=15149)
Jan 09 09:00:15 hostname td-agent-bit[15149]: [2019/01/09 09:00:15] [debug] [out_stackdriver] JWT signature:
Jan 09 09:00:15 hostname td-agent-bit[15149]: removed
Jan 09 09:00:15 hostname td-agent-bit[15149]: [2019/01/09 09:00:15] [error] [oauth2] could not get an upstream connection
Jan 09 09:00:15 hostname td-agent-bit[15149]: [2019/01/09 09:00:15] [error] [out_stackdriver] error retrieving oauth2 access token
Jan 09 09:00:15 hostname td-agent-bit[15149]: [2019/01/09 09:00:15] [ warn] [out_stackdriver] token retrieval failed
Jan 09 09:00:15 hostname td-agent-bit[15149]: [2019/01/09 09:00:15] [debug] [router] match rule cpu.0:stdout.0
Jan 09 09:00:15 hostname td-agent-bit[15149]: [2019/01/09 09:00:15] [debug] [router] match rule cpu.0:stackdriver.0
Jan 09 09:00:15 hostname td-agent-bit[15149]: [2019/01/09 09:00:15] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
Jan 09 09:00:16 hostname td-agent-bit[15149]: [2019/01/09 09:00:15] [debug] [input cpu.0] [mem buf] size = 317
Cheers for the assistance!
@theFroh the next step I would take is to make a call with curl, with verbosity enabled and using the JWT token, to the googleapis.com server to get the oauth2 token from that box. Perhaps @edsiper can point to the documentation for doing this.
I think I found it: https://developers.google.com/identity/protocols/OAuth2ServiceAccount
Here is the example from the page. I added the -v flag, and you would have to replace the JWT token with the one generated by the fluent-bit instance on that machine:
curl -v -d 'grant_type=urn%3Aietf%3Aparams%3Aoauth%3Agrant-type%3Ajwt-bearer&assertion=<JWT token from fluent-bit instance>' https://www.googleapis.com/oauth2/v4/token
curl -v -d 'grant_type=urn%3Aietf%3Aparams%3Aoauth%3Agrant-type%3Ajwt-bearer&assertion=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiI3NjEzMjY3OTgwNjktcjVtbGpsbG4xcmQ0bHJiaGc3NWVmZ2lncDM2bTc4ajVAZGV2ZWxvcGVyLmdzZXJ2aWNlYWNjb3VudC5jb20iLCJzY29wZSI6Imh0dHBzOi8vd3d3Lmdvb2dsZWFwaXMuY29tL2F1dGgvcHJlZGljdGlvbiIsImF1ZCI6Imh0dHBzOi8vYWNjb3VudHMuZ29vZ2xlLmNvbS9vL29hdXRoMi90b2tlbiIsImV4cCI6MTMyODU3MzM4MSwiaWF0IjoxMzI4NTY5NzgxfQ.RZVpzWygMLuL-n3GwjW1_yhQhrqDacyvaXkuf8HcJl8EtXYjGjMaW5oiM5cgAaIorrqgYlp4DPF_GuncFqg9uDZrx7pMmCZ_yHfxhSCXru3gbXrZvAIicNQZMFxrEEn4REVuq7DjkTMyCMGCY1dpMa8aWfTQFt3Eh7smLchaZsU' https://www.googleapis.com/oauth2/v4/token
This would definitely help in debugging this further.
@varun-da Ah, that's definitely a great way to test here.
Running it myself with the token as reported in the logs yields a success in my books:
* Trying 2404:6800:4006:802::200a...
* Connected to www.googleapis.com (2404:6800:4006:802::200a) port 443 (#0)
* found 148 certificates in /etc/ssl/certs/ca-certificates.crt
* found 596 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_ECDSA_AES_128_GCM_SHA256
* server certificate verification OK
* server certificate status verification SKIPPED
* common name: *.googleapis.com (matched)
* server certificate expiration date OK
* server certificate activation date OK
* certificate public key: EC
* certificate version: #3
* subject: C=US,ST=California,L=Mountain View,O=Google LLC,CN=*.googleapis.com
* start date: Wed, 19 Dec 2018 08:17:00 GMT
* expire date: Wed, 13 Mar 2019 08:17:00 GMT
* issuer: C=US,O=Google Trust Services,CN=Google Internet Authority G3
* compression: NULL
* ALPN, server accepted to use http/1.1
> POST /oauth2/v4/token HTTP/1.1
> Host: www.googleapis.com
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Length: 747
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 747 out of 747 bytes
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=utf-8
< Vary: X-Origin
< Vary: Referer
< Date: Fri, 11 Jan 2019 01:56:36 GMT
< Server: ESF
< Cache-Control: private
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< X-Content-Type-Options: nosniff
< Alt-Svc: quic=":443"; ma=2592000; v="44,43,39,35"
< Accept-Ranges: none
< Vary: Origin,Accept-Encoding
< Transfer-Encoding: chunked
<
{
"access_token": "<access token omitted>",
"expires_in": 3600,
"token_type": "Bearer"
* Connection #0 to host www.googleapis.com left intact
}
Which doesn't really clear anything up unfortunately. I wonder how Fluentbit's networking differs.
+1, I am hit by this too. I get a 200 when I do the curl with the JWT token copied from the logs, and the same oauth error from fluentbit logs.
I'm getting the exact same thing:
....
Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
* We are completely uploaded and fine
< HTTP/2 200
< content-type: application/json; charset=utf-8
< vary: X-Origin
< vary: Referer
< vary: Origin,Accept-Encoding
< date: Mon, 25 Mar 2019 19:21:31 GMT
< server: ESF
< cache-control: private
< x-xss-protection: 1; mode=block
< x-frame-options: SAMEORIGIN
< x-content-type-options: nosniff
< alt-svc: quic=":443"; ma=2592000; v="46,44,43,39"
< accept-ranges: none
<
{
"access_token": "Removed",
"expires_in": 3600,
"token_type": "Bearer"
* Connection #0 to host www.googleapis.com left intact
}
How did other folks resolve this?
Fluent Bit v1.0.4
Copyright (C) Treasure Data
[2019/03/26 15:55:49] [debug] [storage] [cio stream] new stream registered: syslog.0
[2019/03/26 15:55:49] [ info] [storage] initializing...
[2019/03/26 15:55:49] [ info] [storage] in-memory
[2019/03/26 15:55:49] [ info] [storage] normal synchronization mode, checksum disabled
[2019/03/26 15:55:49] [ info] [engine] started (pid=40718)
[2019/03/26 15:55:49] [debug] [engine] coroutine stack size: 65536 bytes (64.0K)
[2019/03/26 15:55:49] [ info] [in_syslog] UDP buffer size set to 32768 bytes
[2019/03/26 15:55:49] [debug] [out_stackdriver] JWT signature: <SNIP>
[2019/03/26 15:55:49] [error] [oauth2] could not get an upstream connection
[2019/03/26 15:55:49] [error] [out_stackdriver] error retrieving oauth2 access token
[2019/03/26 15:55:49] [ warn] [out_stackdriver] token retrieval failed
[2019/03/26 15:55:49] [debug] [router] match rule syslog.0:stdout.0
[2019/03/26 15:55:49] [debug] [router] match rule syslog.0:stackdriver.0
I've tracked the error back to this line: https://github.com/fluent/fluent-bit/blob/ba0e6c5b0f44b484dfe06b9b05771ecdd78a61dd/src/flb_oauth2.c#L324
I don't know what can cause flb_upstream_conn_get to fail...
I was never able to.
that specific upstream connection error is a TCP connection error reaching the HTTPS end-point.
Thanks for the pointer @edsiper. In my case this is in a FreeBSD jail, but curl works fine over HTTPS reaching the Google APIs. Any pointers as to how to diagnose this SSL/TLS issue? I can try getting a tcpdump to see if that shows any issues...
@jakeswenson did you try tls.debug N?
https://docs.fluentbit.io/manual/configuration/tls_ssl
If you try to do the same thing on a Linux box, does it work? I am wondering if there is any issue on BSD that needs to be fixed.
@edsiper I just tried with that setting and I am seeing no new output. Does stackdriver respect this tls setting?
Fluent Bit v1.0.4
Copyright (C) Treasure Data
[2019/03/28 13:11:51] [debug] [storage] [cio stream] new stream registered: dummy.0
[2019/03/28 13:11:51] [debug] [storage] [cio stream] new stream registered: syslog.0
[2019/03/28 13:11:51] [ info] [storage] initializing...
[2019/03/28 13:11:51] [ info] [storage] in-memory
[2019/03/28 13:11:51] [ info] [storage] normal synchronization mode, checksum disabled
[2019/03/28 13:11:51] [ info] [engine] started (pid=87027)
[2019/03/28 13:11:51] [debug] [engine] coroutine stack size: 65536 bytes (64.0K)
[2019/03/28 13:11:51] [ info] [in_syslog] UDP buffer size set to 32768 bytes
[2019/03/28 13:11:51] [debug] [out_stackdriver] JWT signature: <SNIP>
[2019/03/28 13:11:51] [error] [oauth2] could not get an upstream connection
[2019/03/28 13:11:51] [error] [out_stackdriver] error retrieving oauth2 access token
[2019/03/28 13:11:51] [ warn] [out_stackdriver] token retrieval failed
[2019/03/28 13:11:51] [debug] [router] match rule dummy.0:stdout.0
[2019/03/28 13:11:51] [debug] [router] match rule dummy.0:stackdriver.0
[0] dummy.log: [1553803912.848473852, {"message"=>"dummy"}]
[2019/03/28 13:11:56] [debug] [task] created task=0x801c40300 id=0 OK
[1] dummy.log: [1553803913.852387878, {"message"=>"dummy"}]
[2] dummy.log: [1553803914.863814322, {"message"=>"dummy"}]
[3] dummy.log: [1553803915.908904521, {"message"=>"dummy"}]
[2019/03/28 13:11:56] [debug] [retry] new retry created for task_id=0 attemps=1
[2019/03/28 13:11:56] [debug] [sched] retry=0x801c26f80 0 in 11 seconds
I ran with tls.debug 3. Here is my config:
[SERVICE]
    Flush 5
    Daemon off
    Log_Level trace
    Coro_Stack_Size 65536
    Parsers_File /usr/local/etc/fluent-bit/parsers.conf
[INPUT]
    Name dummy
    Tag dummy.log
[INPUT]
    Name syslog
    Path /tmp/in_syslog
    Chunk_Size 32
    Buffer_Size 64
    Tag syslog.log
[OUTPUT]
    Name stdout
    Match dummy.*
[OUTPUT]
    Name stackdriver
    Match dummy.*
    google_service_credentials /etc/gcp.creds.json
    resource global
    tls On
    tls.verify Off
    tls.debug 3
Also, I ran a tcpdump, and the only traffic I am getting is DNS requests for www.googleapis.com and logging.googleapis.com (both resolve) and no actual TCP traffic...
I can try to find a Linux box to try this on, but it may take some time... Until then, it seems like the error is in the HTTP library, after DNS but before actually sending a packet... Any thoughts @edsiper?
we use a pretty common libc function to resolve DNS:
https://github.com/fluent/fluent-bit/blob/master/src/flb_network.c#L215
Hmm, I'm not sure what it could be, since you should at least see a warning or error message.
I've been able to patch and build my own version of fluent bit to print a bit more logging to try and find where the error is. https://github.com/fluent/fluent-bit/blob/master/src/flb_network.c#L311 this line is failing with errno 22 (EINVAL). I have no idea why or what this means... any thoughts @edsiper?
EINVAL = invalid argument. Which function returned that? connect()?
yes, connect()
This appears to be related to ipv6. If I turn off ipv6 support as follows, things work as expected.
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.lo.disable_ipv6=1
Wait, what? @sebbacon thanks for testing that disabling IPv6 fixes it. I think it's a poor experience if, instead of the plugin filtering out IPv6 addresses when it doesn't support them, I have to modify my machine to disable IPv6 just to run fluent-bit. Can anyone point me at the code at issue so I can try to look into fixing this?
Also i can verify that i have ipv6 enabled (on loopback...) and that google (obviously) has an AAAA record:
# host www.googleapis.com
www.googleapis.com is an alias for googleapis.l.google.com.
googleapis.l.google.com has address 172.217.3.202
googleapis.l.google.com has address 172.217.14.202
googleapis.l.google.com has address 172.217.14.234
googleapis.l.google.com has IPv6 address 2607:f8b0:400a:803::200a
# ifconfig
lo0: flags=8048<LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128 tentative
inet6 fe80::1%lo0 prefixlen 64 tentative scopeid 0x1
inet 127.0.0.1 netmask 0xff000000
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
groups: lo
epair1b: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8<VLAN_MTU>
inet 10.0.51.50 netmask 0xffff0000 broadcast 10.0.255.255
nd6 options=1<PERFORMNUD>
media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
status: active
groups: epair
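To see which of the resolved addresses actually accepts a TCP connection from the affected machine, each getaddrinfo result can be probed on its own rather than letting a client such as curl pick one. A rough Python sketch under that assumption; probe_addresses is a hypothetical helper, not part of Fluent Bit, and it only tests plain TCP connect(), not TLS:

```python
import socket


def probe_addresses(host: str, port: int, timeout: float = 3.0) -> list:
    """Try a TCP connect to every address getaddrinfo returns for host."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    results = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        label = names.get(family, str(family))
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            results.append((label, sockaddr[0], "ok"))
        except OSError as exc:
            # exc.errno is the raw errno from socket()/connect(), e.g. 22 (EINVAL).
            results.append((label, sockaddr[0], f"errno={exc.errno} {exc.strerror}"))
    return results
```

Running probe_addresses("www.googleapis.com", 443) on the box lists one line per A and AAAA record. If only the IPv6 entries fail, that points at the address family rather than at general reachability.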
In an environment where IPv6 has higher priority than IPv4, I found that Fluent Bit failed to establish upstream connections to oauth2 and Stackdriver Logging, so I reported #1348.
The fixes have been merged into v1.2. If you can use Fluent Bit v1.2, run it with the following setting:
[OUTPUT]
Name stackdriver
Match *
IPv6 On
You can specify IPv6 On in the configuration of out_stackdriver, as with other output plugins; the out_stackdriver module then uses IPv6 mode explicitly.
However, oauth2 is a little different. With the fixes, the oauth2 module attempts to connect in IPv6 mode if the upstream connection over IPv4 failed. I wonder if it might be better to make the oauth2 module configurable like the output plugins...
In addition, the out_bigquery plugin probably has the same problem. I was not able to test with BigQuery, and fixing out_stackdriver was enough for me, so I did not fix out_bigquery.
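The fallback described for oauth2 (IPv4 first, IPv6 only if that fails) can be illustrated outside Fluent Bit. The following Python sketch shows the ordering only; it is not the Fluent Bit implementation, and the function name connect_ipv4_then_ipv6 is made up for illustration:

```python
import socket


def connect_ipv4_then_ipv6(host: str, port: int, timeout: float = 3.0):
    """Return a connected socket, trying IPv4 addresses first, then IPv6."""
    last_error = None
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror as exc:
            last_error = exc  # host has no address of this family
            continue
        for fam, socktype, proto, _canon, sockaddr in infos:
            sock = None
            try:
                sock = socket.socket(fam, socktype, proto)
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return sock
            except OSError as exc:
                last_error = exc
                if sock is not None:
                    sock.close()
    raise last_error or OSError("no addresses found")
```

The caller owns the returned socket and should close it. A configurable variant, as wondered about above, would take the tuple of families as a parameter instead of hard-coding the order.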
thanks everyone for the report, I've added ipv6 mode to out_bigquery on 466191c3
I built and ran fluent-bit 1.2.1 on my FreeBSD machine and I'm still getting the same error:
# ./fluent-bit -c /etc/logs.conf
Fluent Bit v1.2.1
Copyright (C) Treasure Data
[2019/07/19 08:43:32] [debug] [storage] [cio stream] new stream registered: dummy.0
[2019/07/19 08:43:32] [debug] [storage] [cio stream] new stream registered: syslog.1
[2019/07/19 08:43:32] [ info] [storage] initializing...
[2019/07/19 08:43:32] [ info] [storage] in-memory
[2019/07/19 08:43:32] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2019/07/19 08:43:32] [ info] [engine] started (pid=43877)
[2019/07/19 08:43:32] [debug] [engine] coroutine stack size: 65536 bytes (64.0K)
[2019/07/19 08:43:32] [ info] [in_syslog] UDP buffer size set to 32768 bytes
[2019/07/19 08:43:32] [debug] [out_stackdriver] JWT signature:
eyJhbG<SNIP>
[2019/07/19 08:43:32] [error] [oauth2] could not get an upstream connection
[2019/07/19 08:43:32] [error] [out_stackdriver] error retrieving oauth2 access token
[2019/07/19 08:43:32] [ warn] [out_stackdriver] token retrieval failed
[2019/07/19 08:43:32] [debug] [router] match rule dummy.0:stdout.0
[2019/07/19 08:43:32] [debug] [router] match rule dummy.0:stackdriver.1
[2019/07/19 08:43:32] [ info] [sp] stream processor started
^C[engine] caught signal (SIGINT)
[2019/07/19 08:43:35] [ info] [input] pausing dummy.0
[2019/07/19 08:43:35] [ info] [input] pausing syslog.1
config:
# cat /etc/logs.conf
[SERVICE]
    Flush 5
    Daemon off
    Log_Level trace
    Coro_Stack_Size 65536
    Parsers_File /usr/local/etc/fluent-bit/parsers.conf
[INPUT]
    Name dummy
    Tag dummy.log
[INPUT]
    Name syslog
    Path /tmp/in_syslog
    Chunk_Size 32
    Buffer_Size 64
    Tag syslog.log
[OUTPUT]
    Name stdout
    Match dummy.*
[OUTPUT]
    Name stackdriver
    Match dummy.*
    google_service_credentials /etc/gcp.creds.json
    resource global
    tls On
    tls.verify Off
    tls.debug 4
    IPv6 On
It doesn't matter whether I configure IPv6 to On or Off: same error.
is there anything else i can do to help debug this?
It looks like the output above doesn't have trace messages. Would you please re-run it? (I see trace enabled in the config, but I don't see it in the output.)
@edsiper as I'm sure you know, trace requires fluent-bit to be built with tracing enabled... https://docs.fluentbit.io/manual/configuration/file#config_section
I'm certain it's not built that way by default, and I need to read up on how it's enabled using the options framework.
Are there any log lines in particular you're looking for from tracing?
FYI: the Stackdriver output plugin has been improved heavily lately (thanks to the Google team's involvement in the project), so I am closing this ticket. Please create a new one if you still face an issue.
I am still seeing this in 1.7. The stackdriver plugin logs nothing even at trace.
@theFroh @edsiper Can we reopen this issue? I am seeing the same issues with ipv6 reported in this thread. I installed the 1.7.4 amd64 version via the Debian package.
for new issues please open a new ticket.
FYI: v1.7.6 was tested extensively with Stackdriver on Google Cloud: a 10-hour run sending 150k messages per second, no issues found.
Bug Report
Describe the bug
I have followed the configuration guide for Stackdriver in the manual, but have had no success in establishing a connection to Stackdriver.
To Reproduce
fluent-bit on an Ubuntu 16.04 LTS box
/etc/google/auth/
/etc/td-agent-bit/td-agent-bit.conf to include:
systemctl restart td-agent-bit.service
systemctl status td-agent-bit.service:
Expected behavior
I expected authentication to succeed against Stackdriver.
Your Environment
[OUTPUT] section as described above. I had to comment out the Plugins_File plugins.conf line as this file does not exist by default and I couldn't find any documentation on the intended contents of such a file. (I also attempted putting the [OUTPUT] config for stackdriver into this file, as well as just leaving the file blank.)
stackdriver output plugin.
Additional context
I'm trying to use fluent-bit to consume and send through server stats from a VPS we have, that is not part of our Google Cloud cluster.