Open filidav opened 6 years ago
It seems is there https://github.com/cloudfoundry-community/firehose-to-syslog/blob/develop/caching/caching_lazyfill.go#L145
and according to the stacktrace nv
is 0x0 (nil)
Not sure why this happen, @aeijdenberg any idea ?
You are referencing a development branch. Was that a mistake? We pulled from master branch.
David Filippelli | Platform Engineer, Platform Reliability Enterprise Technology & Services 250 Bloor Street East | Toronto, Ontario, M4W 1E5 | M. 416-527-3343 David_Filippelli@manulife.commailto:David_Filippelli@manulife.com | manulife.comhttp://www.manulife.com/
Have a question? Want to know how we deliver? Check out our Playbookhttps://mfc.sharepoint.com/sites/GEO/playbook and be a part of the community!
[cid:image001.png@01D37BEB.A96D7410]
From: Etourneau Gwenn notifications@github.com Reply-To: cloudfoundry-community/firehose-to-syslog reply@reply.github.com Date: Wednesday, September 26, 2018 at 9:06 PM To: cloudfoundry-community/firehose-to-syslog firehose-to-syslog@noreply.github.com Cc: David Filippelli David_Filippelli@manulife.com, Author author@noreply.github.com Subject: Re: [cloudfoundry-community/firehose-to-syslog] ERR panic: runtime error: invalid memory address or nil pointer dereference^ (#204)
It seems is there https://github.com/cloudfoundry-community/firehose-to-syslog/blob/develop/caching/caching_lazyfill.go#L145 and according to the stacktrace nv is 0x0 (nil)
Not sure why this happen, @aeijdenberghttps://github.com/aeijdenberg any idea ?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/cloudfoundry-community/firehose-to-syslog/issues/204#issuecomment-424920809, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AWJsOlDZ1suvzD16HUcANcQKdgCJZTk-ks5ufCR5gaJpZM4W7Lpu.
STATEMENT OF CONFIDENTIALITY The information contained in this email message and any attachments may be confidential and legally privileged and is intended for the use of the addressee(s) only. If you are not an intended recipient, please: (1) notify me immediately by replying to this message; (2) do not use, disseminate, distribute or reproduce any part of the message or any attachment; and (3) destroy all copies of this message and any attachments.
same no ? https://github.com/cloudfoundry-community/firehose-to-syslog/blob/master/caching/caching_lazyfill.go#L145
This is what we added in red below.
import (
"encoding/json"
"fmt"
"math/rand"
"strings"
"time"
"errors"
func (c CacheLazyFill) normaliseAndSaveEntityToDatabase(entityType, uuid string, nv entity) error {
// Strip name suffixes if applicable. This is intended for blue green deployments,
// so that things like -venerable can be stripped from renamed apps
if entityType == "apps" {
for _, suffix := range c.config.StripAppSuffixes {
if nv == nil {
return errors.New("invalid nv entity")
}
if strings.HasSuffix(nv.Name, suffix) {
nv.Name = nv.Name[:len(nv.Name)-len(suffix)]
break
}
}
}
David Filippelli | Platform Engineer, Platform Reliability Enterprise Technology & Services 250 Bloor Street East | Toronto, Ontario, M4W 1E5 | M. 416-527-3343 David_Filippelli@manulife.commailto:David_Filippelli@manulife.com | manulife.comhttp://www.manulife.com/
Have a question? Want to know how we deliver? Check out our Playbookhttps://mfc.sharepoint.com/sites/GEO/playbook and be a part of the community!
[cid:image001.png@01D37BEB.A96D7410]
From: Etourneau Gwenn notifications@github.com Reply-To: cloudfoundry-community/firehose-to-syslog reply@reply.github.com Date: Wednesday, September 26, 2018 at 9:30 PM To: cloudfoundry-community/firehose-to-syslog firehose-to-syslog@noreply.github.com Cc: David Filippelli David_Filippelli@manulife.com, Author author@noreply.github.com Subject: Re: [cloudfoundry-community/firehose-to-syslog] ERR panic: runtime error: invalid memory address or nil pointer dereference^ (#204)
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/cloudfoundry-community/firehose-to-syslog/issues/204#issuecomment-424924610, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AWJsOlKB8vgYEelAo7iCLGvSp0SXW6IQks5ufCoogaJpZM4W7Lpu.
STATEMENT OF CONFIDENTIALITY The information contained in this email message and any attachments may be confidential and legally privileged and is intended for the use of the addressee(s) only. If you are not an intended recipient, please: (1) notify me immediately by replying to this message; (2) do not use, disseminate, distribute or reproduce any part of the message or any attachment; and (3) destroy all copies of this message and any attachments.
I will propose a fix soon. But the problem is I don't know how you get this issue.
So I can reproduce in my env.
Thanks.
We are running PCF 2.1
The following was done
go get github.com/cloudfoundry-community/firehose-to-syslog cd $GOPATH/src/github.com/cloudfoundry-community/firehose-to-syslog
Modified
func ValueMetric(msg *events.Envelope) *Event {
valMetric := msg.GetValueMetric()
fields := logrus.Fields{
"unit": valMetric.GetUnit(),
valMetric.GetName(): valMetric.GetValue(),
}
return &Event{
Fields: fields,
Msg: "",
}
}
go build
Pushed the app with the new firehose-to-syslog binary. Wait a bit and you get that memory error.
As a side note are there any thoughts of adding filtering with the ORG: org1, org2 option? For example ABC-123, ABC-231 would be ABC* instead of writing both orgs in the filter list.
David Filippelli | Platform Engineer, Platform Reliability Enterprise Technology & Services 250 Bloor Street East | Toronto, Ontario, M4W 1E5 | M. 416-527-3343 David_Filippelli@manulife.commailto:David_Filippelli@manulife.com | manulife.comhttp://www.manulife.com/
Have a question? Want to know how we deliver? Check out our Playbookhttps://mfc.sharepoint.com/sites/GEO/playbook and be a part of the community!
[cid:image001.png@01D37BEB.A96D7410]
From: Etourneau Gwenn notifications@github.com Reply-To: cloudfoundry-community/firehose-to-syslog reply@reply.github.com Date: Wednesday, September 26, 2018 at 9:51 PM To: cloudfoundry-community/firehose-to-syslog firehose-to-syslog@noreply.github.com Cc: David Filippelli David_Filippelli@manulife.com, Author author@noreply.github.com Subject: Re: [cloudfoundry-community/firehose-to-syslog] ERR panic: runtime error: invalid memory address or nil pointer dereference^ (#204)
I will propose a fix soon. But the problem is I don't know how you get this issue.
So I can reproduce in my env.
Thanks.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/cloudfoundry-community/firehose-to-syslog/issues/204#issuecomment-424929810, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AWJsOjNC2Z-vzKjH0NdwJpww_P7Bl8m7ks5ufC8KgaJpZM4W7Lpu.
STATEMENT OF CONFIDENTIALITY The information contained in this email message and any attachments may be confidential and legally privileged and is intended for the use of the addressee(s) only. If you are not an intended recipient, please: (1) notify me immediately by replying to this message; (2) do not use, disseminate, distribute or reproduce any part of the message or any attachment; and (3) destroy all copies of this message and any attachments.
Can you checkout the branch fix/nil_error
and try in your env.
Thanks
@shinji62 - that might fix the symptom, but I don't think helps with the cause. e.g. I can't understand why https://github.com/cloudfoundry-community/firehose-to-syslog/blob/develop/caching/caching_lazyfill.go#L180 would ever see a nil
r.Entity
. A caller shouldn't need to check for a nil
, it simply shouldn't be returning one.
ie why would the API ever return a GUID for an app, but no entity record? I'd to see the full response that came back from the server....
@aeijdenberg I don't know neither ... I can't reproduce.. I just provided a quick / dirty fix.
It's not a fix though - it just masks what could be a lurking and legitimate problem on either the server, or our understanding of how the API call is meant to work.
Maybe could try something like this first: https://github.com/cloudfoundry-community/firehose-to-syslog/pull/205
yeah quick and dirty :) booohhh
Hi, I'm a colleague of @filidav who's been advising on this issue. I recommend a couple of things for this:
Do the nil check in both places. For FilleCache()
this fixes that specific nil issue. Including if nz == nil {...}
in the normaliseAndSaveEntityToDatabase(...)
will future proof this for any other callers trying to do this.
I'd also recommend updating the fix/nil_error
patch to look like:
// Populate bolt with all apps
for guid, app := range allApps {
// escape early to avoid unnecessary allocations
if app == nil {
continue
}
// Canonicalise the guid
u, err := uuid.FromString(guid)
if err != nil {
return err
}
uuid := u.String()
// ...
}
It is counter-productive to check for nil
and then continue
. Please don't do that.
Why is that a bad idea? Because it's masking a different problem - these should never be nil
to begin with - that they are indicates a bug or API misunderstanding elsewhere.
I think it's reasonable to check and raise an error that can help diagnose the issue, but to check and continue as though there's no issue is worse than crashing.
I would be interested to see what log messages result if you merge https://github.com/cloudfoundry-community/firehose-to-syslog/pull/205/. That should print the URL which you could curl
to look at the response and see if we can figure out why key elements of the JSON aren't being populated that we expect to be.
If I do test this will ever be merged into master?
I think if you test it, then you'd be helping us figure out what the actual bug is and where it might lie.
Sounds good I can assume there have no changes to this branch since yesterday?
It is counter-productive to check for
nil
and thencontinue
. Please don't do that.Why is that a bad idea? Because it's masking a different problem - these should never be
nil
to begin with - that they are indicates a bug or API misunderstanding elsewhere.I think it's reasonable to check and raise an error that can help diagnose the issue, but to check and continue as though there's no issue is worse than crashing.
I would be interested to see what log messages result if you merge #205. That should print the URL which you could
curl
to look at the response and see if we can figure out why key elements of the JSON aren't being populated that we expect to be.
I agree that "hiding" this issue with a continue
is bad practice depending on the situation. In this case, it is probably not good to continue with only part of the data.
Deployed the firehose nozzle app with this branch Error is back
2018-09-28T08:03:13.561-04:00 [APP/PROC/WEB/1] [OUT] [2018-09-28 12:03:13.561663524 +0000 UTC m=+0.050649701] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint 2018-09-28T08:03:13.561-04:00 [APP/PROC/WEB/1] [OUT] [2018-09-28 12:03:13.561752924 +0000 UTC m=+0.050739801] Pre-filling cache... 2018-09-28T08:03:18.011-04:00 [APP/PROC/WEB/0] [OUT] [2018-09-28 12:03:18.011354613 +0000 UTC m=+4.617422301] Cache filled. 2018-09-28T08:03:18.115-04:00 [APP/PROC/WEB/0] [OUT] [2018-09-28 12:03:18.115699613 +0000 UTC m=+4.721767701] Connected to Syslog Server! Connecting to Firehose... 2018-09-28T08:03:18.175-04:00 [APP/PROC/WEB/1] [OUT] [2018-09-28 12:03:18.175488127 +0000 UTC m=+4.664474004] Cache filled. 2018-09-28T08:03:18.256-04:00 [APP/PROC/WEB/1] [OUT] [2018-09-28 12:03:18.256154427 +0000 UTC m=+4.745140904] Connected to Syslog Server! Connecting to Firehose... 2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] panic: runtime error: invalid memory address or nil pointer dereference 2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4] 2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] goroutine 66 [running]: 2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc0001751a0, 0x9e481d, 0x4, 0xc00022e420, 0x24, 0x0, 0x0, 0x0) 2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:148 +0x84 2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc0001751a0, 0x9e481d, 0x4, 0xc00022e390, 0x24, 0xc0002efcf0, 0xc0002efcf0, 0x4c3540) 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:124 +0x2b8 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).GetApp(0xc0001751a0, 0xc00022e390, 0x24, 0x1, 0x1, 0xc00022e390) 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(Event).AnnotateWithAppData(0xc00022ba10, 0xa82e80, 0xc0001751a0) 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(EventRoutingDefault).RouteEvent(0xc000236ff0, 0xc0001d0b40) 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x34d 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(FirehoseNozzle).ReadLogsBuffer(0xc000400150, 0xa86be0, 0xc0004e0300) 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4 2018-09-28T09:21:33.930-04:00 [APP/PROC/WEB/0] [OUT] Exit status 2 2018-09-28T09:21:33.933-04:00 [CELL/SSHD/0] [OUT] Exit status 0 2018-09-28T09:21:33.951-04:00 [API/1] [OUT] Process has crashed with type: "web" 2018-09-28T09:21:33.965-04:00 [API/1] [OUT] App instance exited with guid 3b8e29d9-b161-4c14-b9fc-21db53e63e25 payload: {"instance"=>"66ce6fbf-705b-44ff-43d3-54c2", "index"=>0, "reason"=>"CRASHED", "exit_description"=>"APP/PROC/WEB: Exited with status 2", "crash_count"=>1, "crash_timestamp"=>1538140893928354200, "version"=>"dd24d491-efea-442c-9563-4b5f48567d03"} 2018-09-28T09:21:33.972-04:00 [CELL/0] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 stopping instance 66ce6fbf-705b-44ff-43d3-54c2 2018-09-28T09:21:33.972-04:00 [CELL/0] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 destroying container for instance 66ce6fbf-705b-44ff-43d3-54c2 2018-09-28T09:21:34.005-04:00 [CELL/0] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 creating container for instance 36edab2d-ac3b-4660-7dd4-3d26 2018-09-28T09:21:34.231-04:00 [CELL/0] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully destroyed container for instance 66ce6fbf-705b-44ff-43d3-54c2 2018-09-28T09:21:34.384-04:00 [CELL/0] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 successfully created container for instance 36edab2d-ac3b-4660-7dd4-3d26 2018-09-28T09:21:35.353-04:00 [APP/PROC/WEB/0] [OUT] [2018-09-28 13:21:35.351654476 +0000 UTC m=+0.005510501] Starting firehose-to-syslog 0.0.0 2018-09-28T09:21:35.381-04:00 [APP/PROC/WEB/0] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443
What else do you require?
changed manifest to debug : true
Not much more detail
2018-09-28T12:06:33.72-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"p-mysql-8f4fb8de90f6a3757e58","event_type":"ValueMetric","ip":"","job":"monitoring","job_index":"c92ca7a8-a416-445d-9af2-a51f72ea816b","level":"info","msg":"","origin":"bosh-system-metrics-forwarder","time":"2018-09-28T16:06:33Z","unit":"Percent","valMetric.GetName":0} 2018-09-28T12:06:33.72-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"p-mysql-8f4fb8de90f6a3757e58","event_type":"ValueMetric","ip":"","job":"monitoring","job_index":"c92ca7a8-a416-445d-9af2-a51f72ea816b","level":"info","msg":"","origin":"bosh-system-metrics-forwarder","time":"2018-09-28T16:06:33Z","unit":"Load","valMetric.GetName":0.3} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":-1} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":1263625} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":-1} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.30","job":"router","job_index":"f5c4821a-a54c-47d1-b548-ec33d47d3491","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ns","valMetric.GetName":3100} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"seconds","valMetric.GetName":129496} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":0} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":1263626} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":1261013} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.72","job":"router","job_index":"dcea8fb8-5b51-4510-9873-fd1551c8d912","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ns","valMetric.GetName":3400} ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR panic: runtime error: invalid memory address or nil pointer dereference^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4]^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR goroutine 67 [running]:^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc00010e600, 0x9e481d, 0x4, 0xc0004e6120, 0x24, 0x0, 0x0, 0x0)^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:148 +0x84^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc00010e600, 0x9e481d, 0x4, 0xc0004e60c0, 0x24, 0xc00030bcf0, 0xc00030bcf0, 0x4c3540)^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:124 +0x2b8^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).GetApp(0xc00010e600, 0xc0004e60c0, 0x24, 0x1, 0x1, 0xc0004e60c0)^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(Event).AnnotateWithAppData(0xc000242c90, 0xa82e80, 0xc00010e600)^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129^[[0m 2018-09-28T12:06:33.82-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.29","job":"cloud_controller","job_index":"99244cb7-e921-44e7-b6be-e67b9cc0782b","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":0} 2018-09-28T12:06:33.82-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":23} 2018-09-28T12:06:33.82-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":39} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":19} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ns","valMetric.GetName":4300} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.54","job":"diego_cell","job_index":"f9db5322-ba4d-407f-8c66-b99f143a3b19","level":"info","msg":"","origin":"netmon","time":"2018-09-28T16:06:33Z","unit":"count","valMetric.GetName":1368552} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.54","job":"diego_cell","job_index":"f9db5322-ba4d-407f-8c66-b99f143a3b19","level":"info","msg":"","origin":"netmon","time":"2018-09-28T16:06:33Z","unit":"count","valMetric.GetName":4} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.54","job":"diego_cell","job_index":"f9db5322-ba4d-407f-8c66-b99f143a3b19","level":"info","msg":"","origin":"netmon","time":"2018-09-28T16:06:33Z","unit":"count","valMetric.GetName":3601817} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":23} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":39} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.30","job":"router","job_index":"f5c4821a-a54c-47d1-b548-ec33d47d3491","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":147}
I thought with this branch you put in a fix that would avoid this issue. Seem like the issue is still present.
I need to merge #205 and then test again.
@filidav Please let me know the status of your test.
You will need to merge #205 to get some error message. From that we will be able to understand a bit better what is going on.
I merged pull #205
018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] panic: runtime error: invalid memory address or nil pointer dereference 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4] 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] goroutine 28 [running]: 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc000150300, 0x9e481d, 0x4, 0xc0000c6090, 0x24, 0x0, 0x0, 0x0) 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:145 +0x84 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc000150300, 0x9e481d, 0x4, 0xc0000c6030, 0x24, 0xc000605d28, 0xc000605d28, 0x4c3540) 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:121 +0x2b8 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).GetApp(0xc000150300, 0xc0000c6030, 0x24, 0x1, 0x1, 0xc0000c6030) 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(Event).AnnotateWithAppData(0xc0002722a0, 0xa82ec0, 0xc000150300) 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(EventRoutingDefault).RouteEvent(0xc000294eb0, 0xc0001127e0) 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x14e 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(FirehoseNozzle).ReadLogsBuffer(0xc00023c0e0, 0xa86c20, 0xc0002f61c0) 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4 2018-10-01T11:46:34.217-04:00 [APP/PROC/WEB/1] [OUT] Exit status 2
Does it help ^^ ??
Not really you don't have some logs before this one ?
Started the app back up to get you the logs before. Do you want me to put the app in debug mode?
Here is the manifest for you reference.
applications:
name: elk-cloudfoundry-nozzle
health-check-type: process
no-route: true
instances: 2
buildpack: binary_buildpack
path: ../target/fs
command: ./firehose-to-syslog
env:
GOPACKAGENAME: git.platform.manulife.io/platformreliability/firehose-to-syslog
API_ENDPOINT: https://api.sys.use.sandbox.pcf.manulife.com
DEBUG: false
DOPPLER_ENDPOINT: wss://doppler.sys.use.sandbox.pcf.manulife.com:443
EVENTS: LogMessage,ValueMetric,Error,ContainerMetric
FIREHOSE_CLIENT_ID:
FIREHOSE_CLIENT_SECRET:
FIREHOSE_SUBSCRIPTION_ID: firehose-b
LOG_EVENT_TOTALS: true
LOG_EVENT_TOTALS_TIME: 10s
SKIP_SSL_VALIDATION: true
SKIP_SSL_VALIDATION_SYSLOG: false
SYSLOG_ENDPOINT:
SYSLOG_PROTOCOL: tcp+tls # tcp/udp/tcp+tls
EXTRA_FIELDS: "ENV:pcf_sandbox"
LOGS_BUFFER_SIZE: 100000
CF_PULL_TIME: 2s
CERT_PEM: certs/
ORGS: SANDBOX
David Filippelli | Platform Engineer, Platform Reliability Enterprise Technology & Services 250 Bloor Street East | Toronto, Ontario, M4W 1E5 | M. 416-527-3343 David_Filippelli@manulife.commailto:David_Filippelli@manulife.com | manulife.comhttp://www.manulife.com/
Have a question? Want to know how we deliver? Check out our Playbookhttps://mfc.sharepoint.com/sites/GEO/playbook and be a part of the community!
[cid:image001.png@01D37BEB.A96D7410]
From: Etourneau Gwenn notifications@github.com Reply-To: cloudfoundry-community/firehose-to-syslog reply@reply.github.com Date: Monday, October 1, 2018 at 7:50 PM To: cloudfoundry-community/firehose-to-syslog firehose-to-syslog@noreply.github.com Cc: David Filippelli David_Filippelli@manulife.com, Mention mention@noreply.github.com Subject: Re: [cloudfoundry-community/firehose-to-syslog] ERR panic: runtime error: invalid memory address or nil pointer dereference^ (#204)
Not really you don't have some logs before this one ?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/cloudfoundry-community/firehose-to-syslog/issues/204#issuecomment-426100670, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AWJsOtp7VmNpz6i47WnKjmrStxxXmwKPks5ugqongaJpZM4W7Lpu.
STATEMENT OF CONFIDENTIALITY The information contained in this email message and any attachments may be confidential and legally privileged and is intended for the use of the addressee(s) only. If you are not an intended recipient, please: (1) notify me immediately by replying to this message; (2) do not use, disseminate, distribute or reproduce any part of the message or any attachment; and (3) destroy all copies of this message and any attachments.
Nothing additional you can use.
Just the connection before and the destroy/recreate of the container
2018-10-01T20:43:27.221-04:00 [API/1] [OUT] Starting app with guid 4cc5493a-8559-4d28-a489-cf6d972d348f 2018-10-01T20:43:27.393-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff creating container for instance 737c769b-7519-41dd-531e-1111 2018-10-01T20:43:27.410-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 creating container for instance 1b74d5f1-a4ed-4d50-452c-2d4c 2018-10-01T20:43:27.722-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff successfully created container for instance 737c769b-7519-41dd-531e-1111 2018-10-01T20:43:28.094-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully created container for instance 1b74d5f1-a4ed-4d50-452c-2d4c 2018-10-01T20:43:28.517-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:28.516795289 +0000 UTC m=+0.006308301] Starting firehose-to-syslog 0.0.0 2018-10-01T20:43:28.544-04:00 [APP/PROC/WEB/0] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443 2018-10-01T20:43:28.544-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:28.544562889 +0000 UTC m=+0.034074801] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint 2018-10-01T20:43:28.544-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:28.544708989 +0000 UTC m=+0.034221101] Pre-filling cache... 2018-10-01T20:43:29.190-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:29.187993954 +0000 UTC m=+0.008754101] Starting firehose-to-syslog 0.0.0 2018-10-01T20:43:29.244-04:00 [APP/PROC/WEB/1] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443 2018-10-01T20:43:29.244-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:29.243482054 +0000 UTC m=+0.064241201] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint 2018-10-01T20:43:29.244-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:29.243571554 +0000 UTC m=+0.064330801] Pre-filling cache... 2018-10-01T20:43:32.903-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:32.903195291 +0000 UTC m=+4.392707003] Cache filled. 2018-10-01T20:43:32.984-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:32.984479391 +0000 UTC m=+4.473991503] Connected to Syslog Server! Connecting to Firehose... 2018-10-01T20:43:33.576-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:33.575924054 +0000 UTC m=+4.396682601] Cache filled. 2018-10-01T20:43:33.665-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:33.665286754 +0000 UTC m=+4.486046001] Connected to Syslog Server! Connecting to Firehose... 2018-10-01T21:56:40.135-04:00 [APP/PROC/WEB/1] [ERR] panic: runtime error: invalid memory address or nil pointer dereference 2018-10-01T21:56:40.135-04:00 [APP/PROC/WEB/1] [ERR] [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4] 2018-10-01T21:56:40.136-04:00 [APP/PROC/WEB/1] [ERR] goroutine 41 [running]: 2018-10-01T21:56:40.137-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc000101b30, 0x9e481d, 0x4, 0xc000254390, 0x24, 0x0, 0x0, 0x0) 2018-10-01T21:56:40.137-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:145 +0x84 2018-10-01T21:56:40.138-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc000101b30, 0x9e481d, 0x4, 0xc0002541e0, 0x24, 0xc00034dd28, 0xc00034dd28, 0x4c3540) 2018-10-01T21:56:40.138-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:121 +0x2b8 2018-10-01T21:56:40.139-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).GetApp(0xc000101b30, 0xc0002541e0, 0x24, 0x1, 0x1, 0xc0002541e0) 2018-10-01T21:56:40.139-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a 2018-10-01T21:56:40.140-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(Event).AnnotateWithAppData(0xc0001439b0, 0xa82ec0, 0xc000101b30) 2018-10-01T21:56:40.140-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129 2018-10-01T21:56:40.140-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(EventRoutingDefault).RouteEvent(0xc00018c140, 0xc00029f320) 2018-10-01T21:56:40.141-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x14e 2018-10-01T21:56:40.141-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(FirehoseNozzle).ReadLogsBuffer(0xc0000982a0, 0xa86c20, 0xc00020a0c0) 2018-10-01T21:56:40.141-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189 2018-10-01T21:56:40.141-04:00 [APP/PROC/WEB/1] [ERR] created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start 2018-10-01T21:56:40.142-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4 2018-10-01T21:56:40.165-04:00 [APP/PROC/WEB/1] [OUT] Exit status 2 2018-10-01T21:56:40.167-04:00 [CELL/SSHD/1] [OUT] Exit status 0 2018-10-01T21:56:40.201-04:00 [API/0] [OUT] Process has crashed with type: "web" 2018-10-01T21:56:40.205-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 stopping instance 1b74d5f1-a4ed-4d50-452c-2d4c 2018-10-01T21:56:40.205-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 destroying container for instance 1b74d5f1-a4ed-4d50-452c-2d4c 2018-10-01T21:56:40.215-04:00 [API/0] [OUT] App instance exited with guid 4cc5493a-8559-4d28-a489-cf6d972d348f payload: {"instance"=>"1b74d5f1-a4ed-4d50-452c-2d4c", "index"=>1, "reason"=>"CRASHED", "exit_description"=>"APP/PROC/WEB: Exited with status 2", "crash_count"=>1, "crash_timestamp"=>1538445400183506084, "version"=>"32f2bc3c-2fc2-4b7f-a5d4-76e9de73d352"} 2018-10-01T21:56:40.248-04:00 [CELL/1] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 creating container for instance 885b4b0c-2fcd-40f0-533f-bba0 2018-10-01T21:56:40.554-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully destroyed container for instance 1b74d5f1-a4ed-4d50-452c-2d4c 2018-10-01T21:56:40.609-04:00 [CELL/1] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 successfully created container for instance 885b4b0c-2fcd-40f0-533f-bba0 2018-10-01T21:56:41.568-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.568107473 +0000 UTC m=+0.003780401] Starting firehose-to-syslog 0.0.0 2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443 2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.596951573 +0000 UTC m=+0.032622901] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint 2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.597034973 +0000 UTC m=+0.032707001] Pre-filling cache... 2018-10-01T21:56:45.952-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:45.952673773 +0000 UTC m=+4.388345001] Cache filled. 2018-10-01T21:56:46.033-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:46.033287973 +0000 UTC m=+4.468959701] Connected to Syslog Server! Connecting to Firehose...
I thought I would be able to see a URL that I could curl that is emitting the nil pointer. I don't see any URL
failed again
2018-10-01T21:56:40.248-04:00 [CELL/1] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 creating container for instance 885b4b0c-2fcd-40f0-533f-bba0 2018-10-01T21:56:40.554-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully destroyed container for instance 1b74d5f1-a4ed-4d50-452c-2d4c 2018-10-01T21:56:40.609-04:00 [CELL/1] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 successfully created container for instance 885b4b0c-2fcd-40f0-533f-bba0 2018-10-01T21:56:41.568-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.568107473 +0000 UTC m=+0.003780401] Starting firehose-to-syslog 0.0.0 2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443 2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.596951573 +0000 UTC m=+0.032622901] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint 2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.597034973 +0000 UTC m=+0.032707001] Pre-filling cache... 2018-10-01T21:56:45.952-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:45.952673773 +0000 UTC m=+4.388345001] Cache filled. 2018-10-01T21:56:46.033-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:46.033287973 +0000 UTC m=+4.468959701] Connected to Syslog Server! Connecting to Firehose... 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] panic: runtime error: invalid memory address or nil pointer dereference 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4] 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] goroutine 40 [running]: 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc000128480, 0x9e481d, 0x4, 0xc000267500, 0x24, 0x0, 0x0, 0x0) 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:145 +0x84 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc000128480, 0x9e481d, 0x4, 0xc0002674a0, 0x24, 0xc000233d28, 0xc000233d28, 0x4c3540) 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:121 +0x2b8 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).GetApp(0xc000128480, 0xc0002674a0, 0x24, 0x1, 0x1, 0xc0002674a0) 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(Event).AnnotateWithAppData(0xc00047be00, 0xa82ec0, 0xc000128480) 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(EventRoutingDefault).RouteEvent(0xc00008ac80, 0xc0000f0c60) 2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x14e 2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(FirehoseNozzle).ReadLogsBuffer(0xc0001f3b90, 0xa86c20, 0xc00014db00) 2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189 2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start 2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4 2018-10-01T22:16:33.831-04:00 [APP/PROC/WEB/0] [OUT] Exit status 2 2018-10-01T22:16:33.834-04:00 [CELL/SSHD/0] [OUT] Exit status 0 2018-10-01T22:16:33.855-04:00 [API/1] [OUT] Process has crashed with type: "web" 2018-10-01T22:16:33.871-04:00 [API/1] [OUT] App instance exited with guid 4cc5493a-8559-4d28-a489-cf6d972d348f payload: {"instance"=>"737c769b-7519-41dd-531e-1111", "index"=>0, "reason"=>"CRASHED", "exit_description"=>"APP/PROC/WEB: Exited with status 2", "crash_count"=>1, "crash_timestamp"=>1538446593828050115, "version"=>"32f2bc3c-2fc2-4b7f-a5d4-76e9de73d352"} 2018-10-01T22:16:33.872-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff stopping instance 737c769b-7519-41dd-531e-1111 2018-10-01T22:16:33.872-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff destroying container for instance 737c769b-7519-41dd-531e-1111 2018-10-01T22:16:33.900-04:00 [CELL/0] [OUT] Cell f8c6141c-d989-4040-b27c-e891461b69e4 creating container for instance da5b65f7-9dc5-4900-51fd-841c 2018-10-01T22:16:34.219-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff successfully destroyed container for instance 737c769b-7519-41dd-531e-1111 2018-10-01T22:16:34.369-04:00 [CELL/0] [OUT] Cell f8c6141c-d989-4040-b27c-e891461b69e4 successfully created container for instance da5b65f7-9dc5-4900-51fd-841c 2018-10-01T22:16:35.489-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 02:16:35.488065621 +0000 UTC m=+0.006920801] Starting firehose-to-syslog 0.0.0 2018-10-01T22:16:35.534-04:00 [APP/PROC/WEB/0] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443 2018-10-01T22:16:35.534-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 02:16:35.534180321 +0000 UTC m=+0.053035001] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint 2018-10-01T22:16:35.534-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 02:16:35.534245521 +0000 UTC m=+0.053100301] Pre-filling cache... 2018-10-01T22:16:39.800-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 02:16:39.800264524 +0000 UTC m=+4.319119004] Cache filled.
Can you try with debug mode enabled ?
With debug
"job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","requests.outstanding":3,"time":"2018-10-02T03:26:40Z","unit":"counter"} 2018-10-01T23:26:40.14-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf-47abe4dc6b20d063f500","event_type":"ValueMetric","health.check.cliCommand.start":1,"ip":"","job":"healthwatch-forwarder","job_index":"09bc9061-de37-46f1-bdb9-7409a2e9c6b1","level":"info","msg":"","origin":"healthwatch","time":"2018-10-02T03:26:40Z","unit":"Metric"} 2018-10-01T23:26:40.14-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf-47abe4dc6b20d063f500","event_type":"ValueMetric","health.check.cliCommand.logs":1,"ip":"","job":"healthwatch-forwarder","job_index":"09bc9061-de37-46f1-bdb9-7409a2e9c6b1","level":"info","msg":"","origin":"healthwatch","time":"2018-10-02T03:26:40Z","unit":"Metric"} 2018-10-01T23:26:40.14-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf-47abe4dc6b20d063f500","event_type":"ValueMetric","health.check.cliCommand.stop":1,"ip":"","job":"healthwatch-forwarder","job_index":"09bc9061-de37-46f1-bdb9-7409a2e9c6b1","level":"info","msg":"","origin":"healthwatch","time":"2018-10-02T03:26:40Z","unit":"Metric"} 2018-10-01T23:26:40.14-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","latency":13,"level":"info","msg":"","origin":"gorouter","time":"2018-10-02T03:26:40Z","unit":"ms"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-10-02T03:26:40Z","unit":"seconds","uptime":35804} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","msg":"","numGoRoutines":3,"origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","msg":"","numCPUS":4,"origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","memoryStats.numFrees":1481,"msg":"","origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","memoryStats.numBytesAllocatedStack":589824,"msg":"","origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"} 2018-10-01T23:26:39.77-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.72","job":"router","job_index":"dcea8fb8-5b51-4510-9873-fd1551c8d912","level":"info","msg":"","origin":"gorouter","route_lookup_time":3200,"time":"2018-10-02T03:26:39Z","unit":"ns"} 2018-10-01T23:26:39.77-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","latency":41,"level":"info","msg":"","origin":"gorouter","time":"2018-10-02T03:26:39Z","unit":"ms"} 2018-10-01T23:26:39.77-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","latency.CloudController":41,"level":"info","msg":"","origin":"gorouter","time":"2018-10-02T03:26:39Z","unit":"ms"} 2018-10-01T23:26:39.77-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.40","job":"windows_diego_cell","job_index":"7b377737-6019-4661-b95f-829aeed7381b","level":"info","memoryStats.lastGCPauseTimeNS":0,"msg":"","origin":"rep","time":"2018-10-02T03:26:39Z","unit":"ns"} ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR panic: runtime error: invalid memory address or nil pointer dereference^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4]^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR goroutine 45 [running]:^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc0001323f0, 0x9e481d, 0x4, 0xc00036ec90, 0x24, 0x0, 0x0, 0x0)^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:145 +0x84^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc0001323f0, 0x9e481d, 0x4, 0xc00036ec30, 0x24, 0xc0003abd28, 0xc0003abd28, 0x4c3540)^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:121 +0x2b8^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).GetApp(0xc0001323f0, 0xc00036ec30, 0x24, 0x1, 0x1, 0xc00036ec30)^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(Event).AnnotateWithAppData(0xc00050e390, 0xa82ec0, 0xc0001323f0)^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(EventRoutingDefault).RouteEvent(0xc00027adc0, 0xc000514bd0)^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x14e^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(FirehoseNozzle).ReadLogsBuffer(0xc0003d40e0, 0xa86c20, 0xc0002991c0)^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4^[[0m 2018-10-01T23:26:39.81-0400 [APP/PROC/WEB/0] OUT Exit status 2 2018-10-01T23:26:39.82-0400 [CELL/SSHD/0] OUT Exit status 0 2018-10-01T23:26:39.86-0400 [CELL/0] OUT Cell 04c9251e-4555-4f96-8553-d66d852316ff stopping instance 234b126c-dce9-4754-69b6-ae62 2018-10-01T23:26:39.86-0400 [CELL/0] OUT Cell 04c9251e-4555-4f96-8553-d66d852316ff destroying container for instance 234b126c-dce9-4754-69b6-ae62 2018-10-01T23:26:40.15-0400 [CELL/0] OUT Cell 04c9251e-4555-4f96-8553-d66d852316ff successfully destroyed container for instance 234b126c-dce9-4754-69b6-ae62 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","memoryStats.lastGCPauseTimeNS":0,"msg":"","origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","ImageDeletionTime":21269100,"cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","msg":"","origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"nanos"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.34","job":"diego_database","job_index":"6babf9d0-2623-40ae-943a-50fee1eca036","level":"info","msg":"","origin":"policy-server","time":"2018-10-02T03:26:40Z","totalPolicies":0,"unit":""} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","requests.outstanding":2,"time":"2018-10-02T03:26:40Z","unit":"counter"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","requests.completed":2176833,"time":"2018-10-02T03:26:40Z","unit":"counter"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","http_status.4XX":3708,"ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-10-02T03:26:40Z","unit":"counter"} @ @
Any other suggestions?
@filidav - sorry for the delayed response, have been at an offsite without laptop earlier this week. Looking at the new stack trace, it seems to be hitting a slightly different code path. I've added a second commit to PR #205 to catch this and log an appropriate message.
Would be possible to rebuild with this patch and then look for log messages containing nil entity returned
?
Added your changes
2018-10-03T20:26:06.820-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-04 00:26:06.819826368 +0000 UTC m=+4.128743401] Connected to Syslog Server! Connecting to Firehose... 2018-10-03T20:26:07.648-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-04 00:26:07.648295102 +0000 UTC m=+3.917315101] Cache filled. 2018-10-03T20:26:07.729-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-04 00:26:07.729628602 +0000 UTC m=+3.998649301] Connected to Syslog Server! Connecting to Firehose... 2018-10-03T22:43:39.978-04:00 [APP/PROC/WEB/1] [ERR] [2018-10-04 02:43:39.978537972 +0000 UTC m=+8257.287454505] Exception occurred! Message: Error while reading from the Firehose: Details: websocket: close 1006 (abnormal closure): unexpected EOF 2018-10-03T22:43:39.979-04:00 [APP/PROC/WEB/1] [ERR] [2018-10-04 02:43:39.979025572 +0000 UTC m=+8257.287942105] Exception occurred! Message: Error while reading from the Firehose: Details: websocket: close 1006 (abnormal closure): unexpected EOF
@filidav - I think it's unlikely the change would have introduced that particular error. Are there any instances of nil entity returned
present in the logs?
(if I'm reading the above correctly, is it mixing messages from 2 different processes, WEB/0
and WEB/1
?)
No other errors I noticed someone else reported the same error I just did as well.
https://github.com/cloudfoundry-community/firehose-to-syslog/issues/180
Any other suggestions on how we can trap for this condition Ever since we installed this fix I don't see the nil pointer. Are we now ignoring it?
Why would the API ever return a GUID for an app, but no entity record which would cause this panic error? Anyway we can get closer to this issue and identify how this is happening?
Why would the API ever return a GUID for an app, but no entity record which would cause this panic error? ==> We don't know, that the all problem here.
Ever since we installed this fix I don't see the nil pointer. Are we now ignoring it? ==> We don't ignore it, we trap the error and ignore the record not the error.
So how will we ever get to root cause?
David Filippelli | Platform Engineer, Platform Reliability Enterprise Technology & Services 250 Bloor Street East | Toronto, Ontario, M4W 1E5 | M. 416-527-3343 David_Filippelli@manulife.commailto:David_Filippelli@manulife.com | manulife.comhttp://www.manulife.com/
Have a question? Want to know how we deliver? Check out our Playbookhttps://mfc.sharepoint.com/sites/GEO/playbook and be a part of the community!
[cid:image001.png@01D37BEB.A96D7410]
From: Etourneau Gwenn notifications@github.com Reply-To: cloudfoundry-community/firehose-to-syslog reply@reply.github.com Date: Tuesday, October 9, 2018 at 9:50 PM To: cloudfoundry-community/firehose-to-syslog firehose-to-syslog@noreply.github.com Cc: David Filippelli David_Filippelli@manulife.com, Mention mention@noreply.github.com Subject: Re: [cloudfoundry-community/firehose-to-syslog] ERR panic: runtime error: invalid memory address or nil pointer dereference^ (#204)
Why would the API ever return a GUID for an app, but no entity record which would cause this panic error? ==> We don't know, that the all problem here.
Ever since we installed this fix I don't see the nil pointer. Are we now ignoring it? ==> We don't ignore it, we trap the error and ignore the record not the error.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/cloudfoundry-community/firehose-to-syslog/issues/204#issuecomment-428410423, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AWJsOomev1pPgGjG4rSs7fj-n8r3nSLHks5ujVJKgaJpZM4W7Lpu.
STATEMENT OF CONFIDENTIALITY The information contained in this email message and any attachments may be confidential and legally privileged and is intended for the use of the addressee(s) only. If you are not an intended recipient, please: (1) notify me immediately by replying to this message; (2) do not use, disseminate, distribute or reproduce any part of the message or any attachment; and (3) destroy all copies of this message and any attachments.
btw, @aeijdenberg add more change to #205 did you include all of them ?
Yes all those changes are in.
The changes in #205 should stop the panics, but I would have expected that something would be catching and logging the error that is now returned. I'm surprised there isn't, but then again, I was surprised that the underlying API would ever have returned that type of data to begin with, so perhaps it was a transient error and has resolved itself?
Either way, I think #205 should be safe to merge.
Just a side note, it is still technically possible to hit the nil pointer issue when calling normaliseAndSaveEntityToDatabase
. I would suggest checking for nv == nil
and throwing an error to ensure all bases are covered.
@elliottpolk - you're correct that if you pass nil
to that method, there will be a panic. However a quick glance over the code shows this is true of nearly every single function / method definition in this and other Go packages. I think there's a fairly well known convention that one shouldn't pass nil
as an argument to a function / method unless the documentation for that function explicitly calls out that it's OK to do so. I don't see a compelling reason to special case this particular method.
Has this been resolved? I've been seeing this issue (or something very similar) in my PCF environments. Firehose has been crashing on a regular basis. Occasionally, when PCF restarts firehose it fails to start processing the logs again. We've had to set a schedule to automatically restart firehose every couple hours.
2018-09-26T11:07:47.66-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"service-instance_908b922d-8768-4769-b9fd-8e8a383cef11","event_type":"ValueMetric","ip":"","job":"mongod_node","job_index":"fe539b85-6546-4098-ac9f-cd98e4eafcb1","level":"info","msg":"","origin":"bosh-system-metrics-forwarder","system.swap.percent":1,"time":"2018-09-26T15:07:47Z","unit":"Percent"} 2018-09-26T11:07:47.66-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.72","job":"router","job_index":"dcea8fb8-5b51-4510-9873-fd1551c8d912","level":"info","msg":"","origin":"gorouter","route_lookup_time":4900,"time":"2018-09-26T15:07:47Z","unit":"ns"} 2018-09-26T11:07:47.66-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.72","job":"router","job_index":"dcea8fb8-5b51-4510-9873-fd1551c8d912","latency":8,"level":"info","msg":"","origin":"gorouter","time":"2018-09-26T15:07:47Z","unit":"ms"} 2018-09-26T11:07:47.66-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","LeasesRenewRequestTime":47.699400000000004,"cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.34","job":"diego_database","job_index":"6babf9d0-2623-40ae-943a-50fee1eca036","level":"info","msg":"","origin":"silk-controller","time":"2018-09-26T15:07:47Z","unit":"ms"} 2018-09-26T11:07:47.75-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","LeasesIndexRequestTime":10.113199999999999,"cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.34","job":"diego_database","job_index":"6babf9d0-2623-40ae-943a-50fee1eca036","level":"info","msg":"","origin":"silk-controller","time":"2018-09-26T15:07:47Z","unit":"ms"} ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR panic: runtime error: invalid memory address or nil pointer dereference^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x764298]^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR goroutine 27 [running]:^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc420194480, 0x9b0aa1, 0x4, 0xc4201fc5d0, 0x24, 0x0, 0x0, 0x0)^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR ........../caching/caching_lazyfill.go:145 +0x88^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc420194480, 0x9b0aa1, 0x4, 0xc4201fc390, 0x24, 0xc420231cf0, 0xc420231cf0, 0x4bd94e)^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........caching/caching_lazyfill.go:121 +0x2bc^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/caching.(CacheLazyFill).GetApp(0xc420194480, 0xc4201fc390, 0x24, 0x1, 0x1, 0xc4201fc390)^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........caching/caching_lazyfill.go:242 +0x8a^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/events.(Event).AnnotateWithAppData(0xc420414510, 0xa1dca0, 0xc420194480)^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........events/events.go:129 +0x12f^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git./eventRouting.(EventRoutingDefault).RouteEvent(0xc42009b680, 0xc4203242d0)^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........eventRouting/eventrouting_default.go:68 +0x14d^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/firehoseclient.(FirehoseNozzle).ReadLogsBuffer(0xc4203e28c0, 0xa21960, 0xc4203e44c0)^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........firehoseclient/firehoseclient.go:119 +0x1b3^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR created by git/firehoseclient.(*FirehoseNozzle).Start^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........firehoseclient/firehoseclient.go:77 +0xb4^[[0m 2018-09-26T11:07:47.80-0400 [APP/PROC/WEB/1] OUT Exit status 2 2018-09-26T11:07:47.80-0400 [CELL/SSHD/1] OUT Exit status 0 2018-09-26T11:07:47.84-0400 [CELL/1] OUT Cell db1be119-8594-481d-92bc-59eb0dc30522 stopping instance 790d8b14-45b7-49aa-6d29-257f 2018-09-26T11:07:47.84-0400 [CELL/1] OUT Cell db1be119-8594-481d-92bc-59eb0dc30522 destroying container for instance 790d8b14-45b7-49aa-6d29-257f 2018-09-26T11:07:48.20-0400 [CELL/1] OUT Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully destroyed container for instance 790d8b14-45b7-49aa-6d29-257f
##################\
We think it is this.
import errors