cloudfoundry-community / splunk-firehose-nozzle

Send CF component metrics, CF app logs, and CF app metrics to Splunk
Apache License 2.0

"[BUG]" Splunk Firehose Nozzle App crashes on Tanzu Application Services 5.0.13 #369

Closed Christherookie closed 1 month ago

Christherookie commented 2 months ago

What happened: The firehose nozzle app in our lab environment no longer starts. We are using VMware Tanzu Application Service version 5.0.13 and the firehose nozzle app version 1.3.1. The app log shows only a line with an unknown error:

2024-05-22T10:28:21.184+02:00 [APP/PROC/WEB/1] [OUT] {"timestamp":"1716366501.183817625","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Failed to open App Cache","log_level":2,"data":{}}
2024-05-22T10:28:21.184+02:00 [APP/PROC/WEB/1] [OUT] {"timestamp":"1716366501.183915854","source":"splunk-nozzle-logger","message":"splunk-nozzle-logger.Failed to run splunk-firehose-nozzle","log_level":2,"data":{"error":"Error requesting apps: cfclient error (UnknownError|10001): An unknown error occurred."}}
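
For anyone triaging a similar startup failure, the useful part of each line is the JSON payload after the `[OUT]` prefix. The following is a minimal sketch, not part of the nozzle, that pulls out the message, log level, and error text. The function name `parse_nozzle_line` and the constant `SAMPLE` are made up for illustration; the sample itself is the second log line above.

```python
import json
from typing import Optional

# The second log line from this report, copied verbatim.
SAMPLE = (
    '2024-05-22T10:28:21.184+02:00 [APP/PROC/WEB/1] [OUT] '
    '{"timestamp":"1716366501.183915854","source":"splunk-nozzle-logger",'
    '"message":"splunk-nozzle-logger.Failed to run splunk-firehose-nozzle",'
    '"log_level":2,"data":{"error":"Error requesting apps: cfclient error '
    '(UnknownError|10001): An unknown error occurred."}}'
)


def parse_nozzle_line(line: str) -> Optional[dict]:
    """Extract the JSON payload from one log line (hypothetical helper)."""
    start = line.find("{")  # the payload starts at the first brace
    if start == -1:
        return None
    try:
        payload = json.loads(line[start:])
    except json.JSONDecodeError:
        return None  # not a structured nozzle line
    return {
        "message": payload.get("message"),
        "log_level": payload.get("log_level"),
        # "data" is empty on some lines, so the error may be absent.
        "error": payload.get("data", {}).get("error"),
    }


if __name__ == "__main__":
    print(parse_nozzle_line(SAMPLE))
```

Filtering on lines where `error` is set makes it easier to spot that the failure comes from the request for apps rather than from the Splunk side.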

We also have some other production foundations running VMware Tanzu Application Service 4.0.23 with the firehose nozzle app version 1.3.1. In these environments the firehose nozzle app works fine.

We have already checked the credentials for the Cloud Foundry API and the Splunk token. Both look fine.

What you expected to happen: The firehose nozzle app should run and deliver logs to Splunk.

How to reproduce it (as minimally and precisely as possible): Use VMware Tanzu Application Service version 5.0.13 and the firehose nozzle app version 1.3.1, and try to start the app.

Anything else we need to know?: I have attached the complete startup logs from Cloud Foundry: SplunkFirehoseNozzleApp_logs.txt. We also checked the credentials we use to authenticate against the Cloud Foundry API and the Splunk token. They are fine.

We also saw this issue https://github.com/cloudfoundry-community/splunk-firehose-nozzle/issues/207 and tried to lower the duration for APP_CACHE_INVALIDATE_TTL and MISSING_APP_CACHE_INVALIDATE_TTL to 600s with no effect.

Environment:

Best regards

ajasnosz commented 1 month ago

Hello, could you share your configuration that is not working? If you prefer you can share it via email to ajasnosz@splunk.com

Kind regards, Agnieszka

Christherookie commented 1 month ago

Hi Agnieszka, we added the configuration at the official splunk support portal in the case 3490407. Tell me if you can't access it there.

Kind regards, Chris

ajasnosz commented 1 month ago

Hello Chris, I will try to get the configuration from support. In the meantime, could you check which cf version you have and whether you can log in from a terminal?

Christherookie commented 1 month ago

The cf version is 8.7.1. Logging in to the foundation works without any problems.

ajasnosz commented 1 month ago

I got the config from support. I'll try to reproduce it. Do you configure it with the UI or the CLI?

Christherookie commented 1 month ago

Hi Agnieszka, we found the problem in our environment. A corrupt app in Cloud Foundry was making the firehose nozzle crash. During startup the firehose app calls the cf-api to get metadata for all apps, and there was one app with corrupt metadata.

Thanks for your support and sorry for the effort you had.

Best regards Chris
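
For readers who hit the same symptom: the thread does not say how the corrupt app was located. One possible approach, sketched here under assumptions, is to request the app list in progressively smaller pages until the failing page contains a single app. `find_bad_positions`, `fetch_page`, and `fake_fetch` are hypothetical names. In a real check `fetch_page` would wrap an authenticated call to the Cloud Foundry apps endpoint (for example the v2 `page` and `results-per-page` query parameters) and raise when the API returns an error. The sketch also assumes the API returns apps in the same order for every page size.

```python
from typing import Callable, List


def find_bad_positions(fetch_page: Callable[[int, int], object],
                       total_items: int,
                       start_size: int = 64) -> List[int]:
    """Return 1-based positions of items whose page request fails.

    fetch_page(page, per_page) is a hypothetical callable that raises
    on an API error. start_size must be a power of two so that every
    page splits cleanly into two half-size pages.
    """
    if start_size < 1 or start_size & (start_size - 1):
        raise ValueError("start_size must be a power of two")
    bad: List[int] = []

    def probe(page: int, size: int) -> None:
        # Skip pages that start beyond the last item.
        if (page - 1) * size >= total_items:
            return
        try:
            fetch_page(page, size)
            return  # this page is healthy
        except Exception:
            if size == 1:
                bad.append(page)  # at size 1, page number == position
                return
        # Page `page` of `size` items equals pages 2*page-1 and 2*page
        # at half the size, so probe both halves.
        probe(2 * page - 1, size // 2)
        probe(2 * page, size // 2)

    pages = -(-total_items // start_size)  # ceiling division
    for page in range(1, pages + 1):
        probe(page, start_size)
    return bad


def fake_fetch(page: int, per_page: int,
               bad_item: int = 37, total: int = 100) -> List[int]:
    """Stand-in for the API: fails whenever the page includes bad_item."""
    first = (page - 1) * per_page + 1
    last = min(page * per_page, total)
    if first <= bad_item <= last:
        raise RuntimeError("UnknownError|10001")
    return list(range(first, last + 1))


if __name__ == "__main__":
    print(find_bad_positions(fake_fetch, total_items=100))
```

With one corrupt app this needs only a handful of requests per halving step instead of one request per app. The reported position still has to be mapped to an app, for example from the names of its neighbours on the adjacent healthy pages.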