cloudfoundry-community / firehose-to-syslog

Send firehose events from Cloud Foundry to syslog.
MIT License
44 stars 58 forks source link

ERR panic: runtime error: invalid memory address or nil pointer dereference^ #204

Open filidav opened 6 years ago

filidav commented 6 years ago

2018-09-26T11:07:47.66-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"service-instance_908b922d-8768-4769-b9fd-8e8a383cef11","event_type":"ValueMetric","ip":"","job":"mongod_node","job_index":"fe539b85-6546-4098-ac9f-cd98e4eafcb1","level":"info","msg":"","origin":"bosh-system-metrics-forwarder","system.swap.percent":1,"time":"2018-09-26T15:07:47Z","unit":"Percent"} 2018-09-26T11:07:47.66-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.72","job":"router","job_index":"dcea8fb8-5b51-4510-9873-fd1551c8d912","level":"info","msg":"","origin":"gorouter","route_lookup_time":4900,"time":"2018-09-26T15:07:47Z","unit":"ns"} 2018-09-26T11:07:47.66-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.72","job":"router","job_index":"dcea8fb8-5b51-4510-9873-fd1551c8d912","latency":8,"level":"info","msg":"","origin":"gorouter","time":"2018-09-26T15:07:47Z","unit":"ms"} 2018-09-26T11:07:47.66-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","LeasesRenewRequestTime":47.699400000000004,"cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.34","job":"diego_database","job_index":"6babf9d0-2623-40ae-943a-50fee1eca036","level":"info","msg":"","origin":"silk-controller","time":"2018-09-26T15:07:47Z","unit":"ms"} 2018-09-26T11:07:47.75-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","LeasesIndexRequestTime":10.113199999999999,"cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.34","job":"diego_database","job_index":"6babf9d0-2623-40ae-943a-50fee1eca036","level":"info","msg":"","origin":"silk-controller","time":"2018-09-26T15:07:47Z","unit":"ms"} ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR panic: runtime error: invalid memory address or nil pointer dereference^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x764298]^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR goroutine 27 [running]:^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc420194480, 0x9b0aa1, 0x4, 0xc4201fc5d0, 0x24, 0x0, 0x0, 0x0)^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR ........../caching/caching_lazyfill.go:145 +0x88^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc420194480, 0x9b0aa1, 0x4, 0xc4201fc390, 0x24, 0xc420231cf0, 0xc420231cf0, 0x4bd94e)^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........caching/caching_lazyfill.go:121 +0x2bc^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/caching.(CacheLazyFill).GetApp(0xc420194480, 0xc4201fc390, 0x24, 0x1, 0x1, 0xc4201fc390)^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........caching/caching_lazyfill.go:242 +0x8a^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/events.(Event).AnnotateWithAppData(0xc420414510, 0xa1dca0, 0xc420194480)^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........events/events.go:129 +0x12f^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git./eventRouting.(EventRoutingDefault).RouteEvent(0xc42009b680, 0xc4203242d0)^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........eventRouting/eventrouting_default.go:68 +0x14d^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/firehoseclient.(FirehoseNozzle).ReadLogsBuffer(0xc4203e28c0, 0xa21960, 0xc4203e44c0)^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........firehoseclient/firehoseclient.go:119 +0x1b3^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR created by git/firehoseclient.(*FirehoseNozzle).Start^[[0m ^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........firehoseclient/firehoseclient.go:77 +0xb4^[[0m 2018-09-26T11:07:47.80-0400 [APP/PROC/WEB/1] OUT Exit status 2 2018-09-26T11:07:47.80-0400 [CELL/SSHD/1] OUT Exit status 0 2018-09-26T11:07:47.84-0400 [CELL/1] OUT Cell db1be119-8594-481d-92bc-59eb0dc30522 stopping instance 790d8b14-45b7-49aa-6d29-257f 2018-09-26T11:07:47.84-0400 [CELL/1] OUT Cell db1be119-8594-481d-92bc-59eb0dc30522 destroying container for instance 790d8b14-45b7-49aa-6d29-257f 2018-09-26T11:07:48.20-0400 [CELL/1] OUT Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully destroyed container for instance 790d8b14-45b7-49aa-6d29-257f

##################\

We think it is this.

import errors

if nv == nil {
            return errors.New("invalid nv entity")
        }
shinji62 commented 6 years ago

It seems is there https://github.com/cloudfoundry-community/firehose-to-syslog/blob/develop/caching/caching_lazyfill.go#L145 and according to the stacktrace nv is 0x0 (nil)

Not sure why this happen, @aeijdenberg any idea ?

filidav commented 6 years ago

You are referencing a development branch. Was that a mistake? We pulled from master branch.

David Filippelli | Platform Engineer, Platform Reliability Enterprise Technology & Services 250 Bloor Street East | Toronto, Ontario, M4W 1E5 | M. 416-527-3343 David_Filippelli@manulife.commailto:David_Filippelli@manulife.com | manulife.comhttp://www.manulife.com/

Have a question? Want to know how we deliver? Check out our Playbookhttps://mfc.sharepoint.com/sites/GEO/playbook and be a part of the community!

[cid:image001.png@01D37BEB.A96D7410]

From: Etourneau Gwenn notifications@github.com Reply-To: cloudfoundry-community/firehose-to-syslog reply@reply.github.com Date: Wednesday, September 26, 2018 at 9:06 PM To: cloudfoundry-community/firehose-to-syslog firehose-to-syslog@noreply.github.com Cc: David Filippelli David_Filippelli@manulife.com, Author author@noreply.github.com Subject: Re: [cloudfoundry-community/firehose-to-syslog] ERR panic: runtime error: invalid memory address or nil pointer dereference^ (#204)

It seems is there https://github.com/cloudfoundry-community/firehose-to-syslog/blob/develop/caching/caching_lazyfill.go#L145 and according to the stacktrace nv is 0x0 (nil)

Not sure why this happen, @aeijdenberghttps://github.com/aeijdenberg any idea ?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/cloudfoundry-community/firehose-to-syslog/issues/204#issuecomment-424920809, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AWJsOlDZ1suvzD16HUcANcQKdgCJZTk-ks5ufCR5gaJpZM4W7Lpu.

STATEMENT OF CONFIDENTIALITY The information contained in this email message and any attachments may be confidential and legally privileged and is intended for the use of the addressee(s) only. If you are not an intended recipient, please: (1) notify me immediately by replying to this message; (2) do not use, disseminate, distribute or reproduce any part of the message or any attachment; and (3) destroy all copies of this message and any attachments.

shinji62 commented 6 years ago

same no ? https://github.com/cloudfoundry-community/firehose-to-syslog/blob/master/caching/caching_lazyfill.go#L145

filidav commented 6 years ago

This is what we added in red below.

import (

    "encoding/json"

    "fmt"

    "math/rand"

    "strings"

    "time"

    "errors"

func (c CacheLazyFill) normaliseAndSaveEntityToDatabase(entityType, uuid string, nv entity) error {

    // Strip name suffixes if applicable. This is intended for blue green deployments,

    // so that things like -venerable can be stripped from renamed apps

    if entityType == "apps" {

            for _, suffix := range c.config.StripAppSuffixes {

                    if nv == nil {

                            return errors.New("invalid nv entity")

                    }

                    if strings.HasSuffix(nv.Name, suffix) {

                            nv.Name = nv.Name[:len(nv.Name)-len(suffix)]

                            break

                    }

            }

    }

David Filippelli | Platform Engineer, Platform Reliability Enterprise Technology & Services 250 Bloor Street East | Toronto, Ontario, M4W 1E5 | M. 416-527-3343 David_Filippelli@manulife.commailto:David_Filippelli@manulife.com | manulife.comhttp://www.manulife.com/

Have a question? Want to know how we deliver? Check out our Playbookhttps://mfc.sharepoint.com/sites/GEO/playbook and be a part of the community!

[cid:image001.png@01D37BEB.A96D7410]

From: Etourneau Gwenn notifications@github.com Reply-To: cloudfoundry-community/firehose-to-syslog reply@reply.github.com Date: Wednesday, September 26, 2018 at 9:30 PM To: cloudfoundry-community/firehose-to-syslog firehose-to-syslog@noreply.github.com Cc: David Filippelli David_Filippelli@manulife.com, Author author@noreply.github.com Subject: Re: [cloudfoundry-community/firehose-to-syslog] ERR panic: runtime error: invalid memory address or nil pointer dereference^ (#204)

same no ? https://github.com/cloudfoundry-community/firehose-to-syslog/blob/master/caching/caching_lazyfill.go#L145

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/cloudfoundry-community/firehose-to-syslog/issues/204#issuecomment-424924610, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AWJsOlKB8vgYEelAo7iCLGvSp0SXW6IQks5ufCoogaJpZM4W7Lpu.

STATEMENT OF CONFIDENTIALITY The information contained in this email message and any attachments may be confidential and legally privileged and is intended for the use of the addressee(s) only. If you are not an intended recipient, please: (1) notify me immediately by replying to this message; (2) do not use, disseminate, distribute or reproduce any part of the message or any attachment; and (3) destroy all copies of this message and any attachments.

shinji62 commented 6 years ago

I will propose a fix soon. But the problem is I don't know how you get this issue.

So I can reproduce in my env.

Thanks.

filidav commented 6 years ago

We are running PCF 2.1

The following was done

Setup repo

go get github.com/cloudfoundry-community/firehose-to-syslog cd $GOPATH/src/github.com/cloudfoundry-community/firehose-to-syslog

Modified

func ValueMetric(msg *events.Envelope) *Event {

        valMetric := msg.GetValueMetric()

        fields := logrus.Fields{

                "unit":              valMetric.GetUnit(),

                valMetric.GetName(): valMetric.GetValue(),

        }

        return &Event{

                Fields: fields,

                Msg:    "",

        }

}

Build binary

go build

Pushed the app with the new firehose-to-syslog binary. Wait a bit and you get that memory error.

As a side note are there any thoughts of adding filtering with the ORG: org1, org2 option? For example ABC-123, ABC-231 would be ABC* instead of writing both orgs in the filter list.

David Filippelli | Platform Engineer, Platform Reliability Enterprise Technology & Services 250 Bloor Street East | Toronto, Ontario, M4W 1E5 | M. 416-527-3343 David_Filippelli@manulife.commailto:David_Filippelli@manulife.com | manulife.comhttp://www.manulife.com/

Have a question? Want to know how we deliver? Check out our Playbookhttps://mfc.sharepoint.com/sites/GEO/playbook and be a part of the community!

[cid:image001.png@01D37BEB.A96D7410]

From: Etourneau Gwenn notifications@github.com Reply-To: cloudfoundry-community/firehose-to-syslog reply@reply.github.com Date: Wednesday, September 26, 2018 at 9:51 PM To: cloudfoundry-community/firehose-to-syslog firehose-to-syslog@noreply.github.com Cc: David Filippelli David_Filippelli@manulife.com, Author author@noreply.github.com Subject: Re: [cloudfoundry-community/firehose-to-syslog] ERR panic: runtime error: invalid memory address or nil pointer dereference^ (#204)

I will propose a fix soon. But the problem is I don't know how you get this issue.

So I can reproduce in my env.

Thanks.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/cloudfoundry-community/firehose-to-syslog/issues/204#issuecomment-424929810, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AWJsOjNC2Z-vzKjH0NdwJpww_P7Bl8m7ks5ufC8KgaJpZM4W7Lpu.

STATEMENT OF CONFIDENTIALITY The information contained in this email message and any attachments may be confidential and legally privileged and is intended for the use of the addressee(s) only. If you are not an intended recipient, please: (1) notify me immediately by replying to this message; (2) do not use, disseminate, distribute or reproduce any part of the message or any attachment; and (3) destroy all copies of this message and any attachments.

shinji62 commented 6 years ago

Can you checkout the branch fix/nil_error and try in your env. Thanks

aeijdenberg commented 6 years ago

@shinji62 - that might fix the symptom, but I don't think helps with the cause. e.g. I can't understand why https://github.com/cloudfoundry-community/firehose-to-syslog/blob/develop/caching/caching_lazyfill.go#L180 would ever see a nil r.Entity. A caller shouldn't need to check for a nil, it simply shouldn't be returning one.

ie why would the API ever return a GUID for an app, but no entity record? I'd to see the full response that came back from the server....

shinji62 commented 6 years ago

@aeijdenberg I don't know neither ... I can't reproduce.. I just provided a quick / dirty fix.

aeijdenberg commented 6 years ago

It's not a fix though - it just masks what could be a lurking and legitimate problem on either the server, or our understanding of how the API call is meant to work.

Maybe could try something like this first: https://github.com/cloudfoundry-community/firehose-to-syslog/pull/205

shinji62 commented 6 years ago

yeah quick and dirty :) booohhh

elliottpolk commented 6 years ago

Hi, I'm a colleague of @filidav who's been advising on this issue. I recommend a couple of things for this:

Do the nil check in both places. For FilleCache() this fixes that specific nil issue. Including if nz == nil {...} in the normaliseAndSaveEntityToDatabase(...) will future proof this for any other callers trying to do this.

I'd also recommend updating the fix/nil_error patch to look like:

// Populate bolt with all apps
for guid, app := range allApps {
    // escape early to avoid unnecessary allocations
    if app == nil {
        continue
    }

    // Canonicalise the guid
    u, err := uuid.FromString(guid)
    if err != nil {
        return err
    }

    uuid := u.String()

    // ...
}
aeijdenberg commented 6 years ago

It is counter-productive to check for nil and then continue. Please don't do that.

Why is that a bad idea? Because it's masking a different problem - these should never be nil to begin with - that they are indicates a bug or API misunderstanding elsewhere.

I think it's reasonable to check and raise an error that can help diagnose the issue, but to check and continue as though there's no issue is worse than crashing.

I would be interested to see what log messages result if you merge https://github.com/cloudfoundry-community/firehose-to-syslog/pull/205/. That should print the URL which you could curl to look at the response and see if we can figure out why key elements of the JSON aren't being populated that we expect to be.

filidav commented 6 years ago

If I do test this will ever be merged into master?

aeijdenberg commented 6 years ago

I think if you test it, then you'd be helping us figure out what the actual bug is and where it might lie.

filidav commented 6 years ago

Sounds good I can assume there have no changes to this branch since yesterday?

elliottpolk commented 6 years ago

It is counter-productive to check for nil and then continue. Please don't do that.

Why is that a bad idea? Because it's masking a different problem - these should never be nil to begin with - that they are indicates a bug or API misunderstanding elsewhere.

I think it's reasonable to check and raise an error that can help diagnose the issue, but to check and continue as though there's no issue is worse than crashing.

I would be interested to see what log messages result if you merge #205. That should print the URL which you could curl to look at the response and see if we can figure out why key elements of the JSON aren't being populated that we expect to be.

I agree that "hiding" this issue with a continue is bad practice depending on the situation. In this case, it is probably not good to continue with only part of the data.

filidav commented 6 years ago

Deployed the firehose nozzle app with this branch Error is back

2018-09-28T08:03:13.561-04:00 [APP/PROC/WEB/1] [OUT] [2018-09-28 12:03:13.561663524 +0000 UTC m=+0.050649701] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint 2018-09-28T08:03:13.561-04:00 [APP/PROC/WEB/1] [OUT] [2018-09-28 12:03:13.561752924 +0000 UTC m=+0.050739801] Pre-filling cache... 2018-09-28T08:03:18.011-04:00 [APP/PROC/WEB/0] [OUT] [2018-09-28 12:03:18.011354613 +0000 UTC m=+4.617422301] Cache filled. 2018-09-28T08:03:18.115-04:00 [APP/PROC/WEB/0] [OUT] [2018-09-28 12:03:18.115699613 +0000 UTC m=+4.721767701] Connected to Syslog Server! Connecting to Firehose... 2018-09-28T08:03:18.175-04:00 [APP/PROC/WEB/1] [OUT] [2018-09-28 12:03:18.175488127 +0000 UTC m=+4.664474004] Cache filled. 2018-09-28T08:03:18.256-04:00 [APP/PROC/WEB/1] [OUT] [2018-09-28 12:03:18.256154427 +0000 UTC m=+4.745140904] Connected to Syslog Server! Connecting to Firehose... 2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] panic: runtime error: invalid memory address or nil pointer dereference 2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4] 2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] goroutine 66 [running]: 2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc0001751a0, 0x9e481d, 0x4, 0xc00022e420, 0x24, 0x0, 0x0, 0x0) 2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:148 +0x84 2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc0001751a0, 0x9e481d, 0x4, 0xc00022e390, 0x24, 0xc0002efcf0, 0xc0002efcf0, 0x4c3540) 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:124 +0x2b8 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).GetApp(0xc0001751a0, 0xc00022e390, 0x24, 0x1, 0x1, 0xc00022e390) 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(Event).AnnotateWithAppData(0xc00022ba10, 0xa82e80, 0xc0001751a0) 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(EventRoutingDefault).RouteEvent(0xc000236ff0, 0xc0001d0b40) 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x34d 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(FirehoseNozzle).ReadLogsBuffer(0xc000400150, 0xa86be0, 0xc0004e0300) 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start 2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4 2018-09-28T09:21:33.930-04:00 [APP/PROC/WEB/0] [OUT] Exit status 2 2018-09-28T09:21:33.933-04:00 [CELL/SSHD/0] [OUT] Exit status 0 2018-09-28T09:21:33.951-04:00 [API/1] [OUT] Process has crashed with type: "web" 2018-09-28T09:21:33.965-04:00 [API/1] [OUT] App instance exited with guid 3b8e29d9-b161-4c14-b9fc-21db53e63e25 payload: {"instance"=>"66ce6fbf-705b-44ff-43d3-54c2", "index"=>0, "reason"=>"CRASHED", "exit_description"=>"APP/PROC/WEB: Exited with status 2", "crash_count"=>1, "crash_timestamp"=>1538140893928354200, "version"=>"dd24d491-efea-442c-9563-4b5f48567d03"} 2018-09-28T09:21:33.972-04:00 [CELL/0] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 stopping instance 66ce6fbf-705b-44ff-43d3-54c2 2018-09-28T09:21:33.972-04:00 [CELL/0] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 destroying container for instance 66ce6fbf-705b-44ff-43d3-54c2 2018-09-28T09:21:34.005-04:00 [CELL/0] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 creating container for instance 36edab2d-ac3b-4660-7dd4-3d26 2018-09-28T09:21:34.231-04:00 [CELL/0] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully destroyed container for instance 66ce6fbf-705b-44ff-43d3-54c2 2018-09-28T09:21:34.384-04:00 [CELL/0] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 successfully created container for instance 36edab2d-ac3b-4660-7dd4-3d26 2018-09-28T09:21:35.353-04:00 [APP/PROC/WEB/0] [OUT] [2018-09-28 13:21:35.351654476 +0000 UTC m=+0.005510501] Starting firehose-to-syslog 0.0.0 2018-09-28T09:21:35.381-04:00 [APP/PROC/WEB/0] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443

filidav commented 6 years ago

What else do you require?

filidav commented 6 years ago

changed manifest to debug : true

Not much more detail

2018-09-28T12:06:33.72-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"p-mysql-8f4fb8de90f6a3757e58","event_type":"ValueMetric","ip":"","job":"monitoring","job_index":"c92ca7a8-a416-445d-9af2-a51f72ea816b","level":"info","msg":"","origin":"bosh-system-metrics-forwarder","time":"2018-09-28T16:06:33Z","unit":"Percent","valMetric.GetName":0} 2018-09-28T12:06:33.72-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"p-mysql-8f4fb8de90f6a3757e58","event_type":"ValueMetric","ip":"","job":"monitoring","job_index":"c92ca7a8-a416-445d-9af2-a51f72ea816b","level":"info","msg":"","origin":"bosh-system-metrics-forwarder","time":"2018-09-28T16:06:33Z","unit":"Load","valMetric.GetName":0.3} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":-1} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":1263625} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":-1} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.30","job":"router","job_index":"f5c4821a-a54c-47d1-b548-ec33d47d3491","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ns","valMetric.GetName":3100} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"seconds","valMetric.GetName":129496} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":0} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":1263626} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":1261013} 2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.72","job":"router","job_index":"dcea8fb8-5b51-4510-9873-fd1551c8d912","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ns","valMetric.GetName":3400} ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR panic: runtime error: invalid memory address or nil pointer dereference^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4]^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR goroutine 67 [running]:^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc00010e600, 0x9e481d, 0x4, 0xc0004e6120, 0x24, 0x0, 0x0, 0x0)^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:148 +0x84^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc00010e600, 0x9e481d, 0x4, 0xc0004e60c0, 0x24, 0xc00030bcf0, 0xc00030bcf0, 0x4c3540)^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:124 +0x2b8^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).GetApp(0xc00010e600, 0xc0004e60c0, 0x24, 0x1, 0x1, 0xc0004e60c0)^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(Event).AnnotateWithAppData(0xc000242c90, 0xa82e80, 0xc00010e600)^[[0m ^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129^[[0m 2018-09-28T12:06:33.82-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.29","job":"cloud_controller","job_index":"99244cb7-e921-44e7-b6be-e67b9cc0782b","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":0} 2018-09-28T12:06:33.82-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":23} 2018-09-28T12:06:33.82-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":39} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":19} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ns","valMetric.GetName":4300} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.54","job":"diego_cell","job_index":"f9db5322-ba4d-407f-8c66-b99f143a3b19","level":"info","msg":"","origin":"netmon","time":"2018-09-28T16:06:33Z","unit":"count","valMetric.GetName":1368552} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.54","job":"diego_cell","job_index":"f9db5322-ba4d-407f-8c66-b99f143a3b19","level":"info","msg":"","origin":"netmon","time":"2018-09-28T16:06:33Z","unit":"count","valMetric.GetName":4} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.54","job":"diego_cell","job_index":"f9db5322-ba4d-407f-8c66-b99f143a3b19","level":"info","msg":"","origin":"netmon","time":"2018-09-28T16:06:33Z","unit":"count","valMetric.GetName":3601817} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":23} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":39} 2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.30","job":"router","job_index":"f5c4821a-a54c-47d1-b548-ec33d47d3491","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":147}

filidav commented 6 years ago

I thought with this branch you put in a fix that would avoid this issue. Seem like the issue is still present.

filidav commented 6 years ago

I need to merge #205 and then test again.

shinji62 commented 6 years ago

@filidav Please let me know the status of your test.

You will need to merge #205 to get some error message. From that we will be able to understand a bit better what is going on.

filidav commented 6 years ago

I merged pull #205

018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] panic: runtime error: invalid memory address or nil pointer dereference 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4] 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] goroutine 28 [running]: 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc000150300, 0x9e481d, 0x4, 0xc0000c6090, 0x24, 0x0, 0x0, 0x0) 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:145 +0x84 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc000150300, 0x9e481d, 0x4, 0xc0000c6030, 0x24, 0xc000605d28, 0xc000605d28, 0x4c3540) 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:121 +0x2b8 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).GetApp(0xc000150300, 0xc0000c6030, 0x24, 0x1, 0x1, 0xc0000c6030) 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(Event).AnnotateWithAppData(0xc0002722a0, 0xa82ec0, 0xc000150300) 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(EventRoutingDefault).RouteEvent(0xc000294eb0, 0xc0001127e0) 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x14e 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(FirehoseNozzle).ReadLogsBuffer(0xc00023c0e0, 0xa86c20, 0xc0002f61c0) 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start 2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4 2018-10-01T11:46:34.217-04:00 [APP/PROC/WEB/1] [OUT] Exit status 2

filidav commented 6 years ago

Does it help ^^ ??

shinji62 commented 6 years ago

Not really you don't have some logs before this one ?

filidav commented 6 years ago

Started the app back up to get you the logs before. Do you want me to put the app in debug mode?

Here is the manifest for you reference.

applications:

David Filippelli | Platform Engineer, Platform Reliability Enterprise Technology & Services 250 Bloor Street East | Toronto, Ontario, M4W 1E5 | M. 416-527-3343 David_Filippelli@manulife.commailto:David_Filippelli@manulife.com | manulife.comhttp://www.manulife.com/

Have a question? Want to know how we deliver? Check out our Playbookhttps://mfc.sharepoint.com/sites/GEO/playbook and be a part of the community!

[cid:image001.png@01D37BEB.A96D7410]

From: Etourneau Gwenn notifications@github.com Reply-To: cloudfoundry-community/firehose-to-syslog reply@reply.github.com Date: Monday, October 1, 2018 at 7:50 PM To: cloudfoundry-community/firehose-to-syslog firehose-to-syslog@noreply.github.com Cc: David Filippelli David_Filippelli@manulife.com, Mention mention@noreply.github.com Subject: Re: [cloudfoundry-community/firehose-to-syslog] ERR panic: runtime error: invalid memory address or nil pointer dereference^ (#204)

Not really you don't have some logs before this one ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/cloudfoundry-community/firehose-to-syslog/issues/204#issuecomment-426100670, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AWJsOtp7VmNpz6i47WnKjmrStxxXmwKPks5ugqongaJpZM4W7Lpu.

STATEMENT OF CONFIDENTIALITY The information contained in this email message and any attachments may be confidential and legally privileged and is intended for the use of the addressee(s) only. If you are not an intended recipient, please: (1) notify me immediately by replying to this message; (2) do not use, disseminate, distribute or reproduce any part of the message or any attachment; and (3) destroy all copies of this message and any attachments.

filidav commented 6 years ago

Nothing additional you can use.

Just the connection before and the destroy/recreate of the container

2018-10-01T20:43:27.221-04:00 [API/1] [OUT] Starting app with guid 4cc5493a-8559-4d28-a489-cf6d972d348f 2018-10-01T20:43:27.393-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff creating container for instance 737c769b-7519-41dd-531e-1111 2018-10-01T20:43:27.410-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 creating container for instance 1b74d5f1-a4ed-4d50-452c-2d4c 2018-10-01T20:43:27.722-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff successfully created container for instance 737c769b-7519-41dd-531e-1111 2018-10-01T20:43:28.094-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully created container for instance 1b74d5f1-a4ed-4d50-452c-2d4c 2018-10-01T20:43:28.517-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:28.516795289 +0000 UTC m=+0.006308301] Starting firehose-to-syslog 0.0.0 2018-10-01T20:43:28.544-04:00 [APP/PROC/WEB/0] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443 2018-10-01T20:43:28.544-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:28.544562889 +0000 UTC m=+0.034074801] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint 2018-10-01T20:43:28.544-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:28.544708989 +0000 UTC m=+0.034221101] Pre-filling cache... 2018-10-01T20:43:29.190-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:29.187993954 +0000 UTC m=+0.008754101] Starting firehose-to-syslog 0.0.0 2018-10-01T20:43:29.244-04:00 [APP/PROC/WEB/1] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443 2018-10-01T20:43:29.244-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:29.243482054 +0000 UTC m=+0.064241201] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint 2018-10-01T20:43:29.244-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:29.243571554 +0000 UTC m=+0.064330801] Pre-filling cache... 2018-10-01T20:43:32.903-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:32.903195291 +0000 UTC m=+4.392707003] Cache filled. 2018-10-01T20:43:32.984-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:32.984479391 +0000 UTC m=+4.473991503] Connected to Syslog Server! Connecting to Firehose... 2018-10-01T20:43:33.576-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:33.575924054 +0000 UTC m=+4.396682601] Cache filled. 2018-10-01T20:43:33.665-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:33.665286754 +0000 UTC m=+4.486046001] Connected to Syslog Server! Connecting to Firehose... 2018-10-01T21:56:40.135-04:00 [APP/PROC/WEB/1] [ERR] panic: runtime error: invalid memory address or nil pointer dereference 2018-10-01T21:56:40.135-04:00 [APP/PROC/WEB/1] [ERR] [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4] 2018-10-01T21:56:40.136-04:00 [APP/PROC/WEB/1] [ERR] goroutine 41 [running]: 2018-10-01T21:56:40.137-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc000101b30, 0x9e481d, 0x4, 0xc000254390, 0x24, 0x0, 0x0, 0x0) 2018-10-01T21:56:40.137-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:145 +0x84 2018-10-01T21:56:40.138-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc000101b30, 0x9e481d, 0x4, 0xc0002541e0, 0x24, 0xc00034dd28, 0xc00034dd28, 0x4c3540) 2018-10-01T21:56:40.138-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:121 +0x2b8 2018-10-01T21:56:40.139-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).GetApp(0xc000101b30, 0xc0002541e0, 0x24, 0x1, 0x1, 0xc0002541e0) 2018-10-01T21:56:40.139-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a 2018-10-01T21:56:40.140-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(Event).AnnotateWithAppData(0xc0001439b0, 0xa82ec0, 0xc000101b30) 2018-10-01T21:56:40.140-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129 2018-10-01T21:56:40.140-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(EventRoutingDefault).RouteEvent(0xc00018c140, 0xc00029f320) 2018-10-01T21:56:40.141-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x14e 2018-10-01T21:56:40.141-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(FirehoseNozzle).ReadLogsBuffer(0xc0000982a0, 0xa86c20, 0xc00020a0c0) 2018-10-01T21:56:40.141-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189 2018-10-01T21:56:40.141-04:00 [APP/PROC/WEB/1] [ERR] created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start 2018-10-01T21:56:40.142-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4 2018-10-01T21:56:40.165-04:00 [APP/PROC/WEB/1] [OUT] Exit status 2 2018-10-01T21:56:40.167-04:00 [CELL/SSHD/1] [OUT] Exit status 0 2018-10-01T21:56:40.201-04:00 [API/0] [OUT] Process has crashed with type: "web" 2018-10-01T21:56:40.205-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 stopping instance 1b74d5f1-a4ed-4d50-452c-2d4c 2018-10-01T21:56:40.205-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 destroying container for instance 1b74d5f1-a4ed-4d50-452c-2d4c 2018-10-01T21:56:40.215-04:00 [API/0] [OUT] App instance exited with guid 4cc5493a-8559-4d28-a489-cf6d972d348f payload: {"instance"=>"1b74d5f1-a4ed-4d50-452c-2d4c", "index"=>1, "reason"=>"CRASHED", "exit_description"=>"APP/PROC/WEB: Exited with status 2", "crash_count"=>1, "crash_timestamp"=>1538445400183506084, "version"=>"32f2bc3c-2fc2-4b7f-a5d4-76e9de73d352"} 2018-10-01T21:56:40.248-04:00 [CELL/1] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 creating container for instance 885b4b0c-2fcd-40f0-533f-bba0 2018-10-01T21:56:40.554-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully destroyed container for instance 1b74d5f1-a4ed-4d50-452c-2d4c 2018-10-01T21:56:40.609-04:00 [CELL/1] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 successfully created container for instance 885b4b0c-2fcd-40f0-533f-bba0 2018-10-01T21:56:41.568-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.568107473 +0000 UTC m=+0.003780401] Starting firehose-to-syslog 0.0.0 2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443 2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.596951573 +0000 UTC m=+0.032622901] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint 2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.597034973 +0000 UTC m=+0.032707001] Pre-filling cache... 2018-10-01T21:56:45.952-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:45.952673773 +0000 UTC m=+4.388345001] Cache filled. 2018-10-01T21:56:46.033-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:46.033287973 +0000 UTC m=+4.468959701] Connected to Syslog Server! Connecting to Firehose...

filidav commented 6 years ago

I thought I would be able to see a URL that I could curl that is emitting the nil pointer. I don't see any URL

filidav commented 6 years ago

failed again

2018-10-01T21:56:40.248-04:00 [CELL/1] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 creating container for instance 885b4b0c-2fcd-40f0-533f-bba0 2018-10-01T21:56:40.554-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully destroyed container for instance 1b74d5f1-a4ed-4d50-452c-2d4c 2018-10-01T21:56:40.609-04:00 [CELL/1] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 successfully created container for instance 885b4b0c-2fcd-40f0-533f-bba0 2018-10-01T21:56:41.568-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.568107473 +0000 UTC m=+0.003780401] Starting firehose-to-syslog 0.0.0 2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443 2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.596951573 +0000 UTC m=+0.032622901] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint 2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.597034973 +0000 UTC m=+0.032707001] Pre-filling cache... 2018-10-01T21:56:45.952-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:45.952673773 +0000 UTC m=+4.388345001] Cache filled. 2018-10-01T21:56:46.033-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:46.033287973 +0000 UTC m=+4.468959701] Connected to Syslog Server! Connecting to Firehose... 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] panic: runtime error: invalid memory address or nil pointer dereference 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4] 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] goroutine 40 [running]: 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc000128480, 0x9e481d, 0x4, 0xc000267500, 0x24, 0x0, 0x0, 0x0) 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:145 +0x84 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc000128480, 0x9e481d, 0x4, 0xc0002674a0, 0x24, 0xc000233d28, 0xc000233d28, 0x4c3540) 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:121 +0x2b8 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).GetApp(0xc000128480, 0xc0002674a0, 0x24, 0x1, 0x1, 0xc0002674a0) 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(Event).AnnotateWithAppData(0xc00047be00, 0xa82ec0, 0xc000128480) 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129 2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(EventRoutingDefault).RouteEvent(0xc00008ac80, 0xc0000f0c60) 2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x14e 2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(FirehoseNozzle).ReadLogsBuffer(0xc0001f3b90, 0xa86c20, 0xc00014db00) 2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189 2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start 2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4 2018-10-01T22:16:33.831-04:00 [APP/PROC/WEB/0] [OUT] Exit status 2 2018-10-01T22:16:33.834-04:00 [CELL/SSHD/0] [OUT] Exit status 0 2018-10-01T22:16:33.855-04:00 [API/1] [OUT] Process has crashed with type: "web" 2018-10-01T22:16:33.871-04:00 [API/1] [OUT] App instance exited with guid 4cc5493a-8559-4d28-a489-cf6d972d348f payload: {"instance"=>"737c769b-7519-41dd-531e-1111", "index"=>0, "reason"=>"CRASHED", "exit_description"=>"APP/PROC/WEB: Exited with status 2", "crash_count"=>1, "crash_timestamp"=>1538446593828050115, "version"=>"32f2bc3c-2fc2-4b7f-a5d4-76e9de73d352"} 2018-10-01T22:16:33.872-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff stopping instance 737c769b-7519-41dd-531e-1111 2018-10-01T22:16:33.872-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff destroying container for instance 737c769b-7519-41dd-531e-1111 2018-10-01T22:16:33.900-04:00 [CELL/0] [OUT] Cell f8c6141c-d989-4040-b27c-e891461b69e4 creating container for instance da5b65f7-9dc5-4900-51fd-841c 2018-10-01T22:16:34.219-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff successfully destroyed container for instance 737c769b-7519-41dd-531e-1111 2018-10-01T22:16:34.369-04:00 [CELL/0] [OUT] Cell f8c6141c-d989-4040-b27c-e891461b69e4 successfully created container for instance da5b65f7-9dc5-4900-51fd-841c 2018-10-01T22:16:35.489-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 02:16:35.488065621 +0000 UTC m=+0.006920801] Starting firehose-to-syslog 0.0.0 2018-10-01T22:16:35.534-04:00 [APP/PROC/WEB/0] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443 2018-10-01T22:16:35.534-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 02:16:35.534180321 +0000 UTC m=+0.053035001] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint 2018-10-01T22:16:35.534-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 02:16:35.534245521 +0000 UTC m=+0.053100301] Pre-filling cache... 2018-10-01T22:16:39.800-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 02:16:39.800264524 +0000 UTC m=+4.319119004] Cache filled.

shinji62 commented 6 years ago

Can you try with debug mode enabled ?

filidav commented 6 years ago

With debug

"job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","requests.outstanding":3,"time":"2018-10-02T03:26:40Z","unit":"counter"} 2018-10-01T23:26:40.14-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf-47abe4dc6b20d063f500","event_type":"ValueMetric","health.check.cliCommand.start":1,"ip":"","job":"healthwatch-forwarder","job_index":"09bc9061-de37-46f1-bdb9-7409a2e9c6b1","level":"info","msg":"","origin":"healthwatch","time":"2018-10-02T03:26:40Z","unit":"Metric"} 2018-10-01T23:26:40.14-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf-47abe4dc6b20d063f500","event_type":"ValueMetric","health.check.cliCommand.logs":1,"ip":"","job":"healthwatch-forwarder","job_index":"09bc9061-de37-46f1-bdb9-7409a2e9c6b1","level":"info","msg":"","origin":"healthwatch","time":"2018-10-02T03:26:40Z","unit":"Metric"} 2018-10-01T23:26:40.14-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf-47abe4dc6b20d063f500","event_type":"ValueMetric","health.check.cliCommand.stop":1,"ip":"","job":"healthwatch-forwarder","job_index":"09bc9061-de37-46f1-bdb9-7409a2e9c6b1","level":"info","msg":"","origin":"healthwatch","time":"2018-10-02T03:26:40Z","unit":"Metric"} 2018-10-01T23:26:40.14-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","latency":13,"level":"info","msg":"","origin":"gorouter","time":"2018-10-02T03:26:40Z","unit":"ms"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-10-02T03:26:40Z","unit":"seconds","uptime":35804} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","msg":"","numGoRoutines":3,"origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","msg":"","numCPUS":4,"origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","memoryStats.numFrees":1481,"msg":"","origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","memoryStats.numBytesAllocatedStack":589824,"msg":"","origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"} 2018-10-01T23:26:39.77-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.72","job":"router","job_index":"dcea8fb8-5b51-4510-9873-fd1551c8d912","level":"info","msg":"","origin":"gorouter","route_lookup_time":3200,"time":"2018-10-02T03:26:39Z","unit":"ns"} 2018-10-01T23:26:39.77-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","latency":41,"level":"info","msg":"","origin":"gorouter","time":"2018-10-02T03:26:39Z","unit":"ms"} 2018-10-01T23:26:39.77-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","latency.CloudController":41,"level":"info","msg":"","origin":"gorouter","time":"2018-10-02T03:26:39Z","unit":"ms"} 2018-10-01T23:26:39.77-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.40","job":"windows_diego_cell","job_index":"7b377737-6019-4661-b95f-829aeed7381b","level":"info","memoryStats.lastGCPauseTimeNS":0,"msg":"","origin":"rep","time":"2018-10-02T03:26:39Z","unit":"ns"} ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR panic: runtime error: invalid memory address or nil pointer dereference^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4]^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR goroutine 45 [running]:^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc0001323f0, 0x9e481d, 0x4, 0xc00036ec90, 0x24, 0x0, 0x0, 0x0)^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:145 +0x84^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).getEntity(0xc0001323f0, 0x9e481d, 0x4, 0xc00036ec30, 0x24, 0xc0003abd28, 0xc0003abd28, 0x4c3540)^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:121 +0x2b8^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(CacheLazyFill).GetApp(0xc0001323f0, 0xc00036ec30, 0x24, 0x1, 0x1, 0xc00036ec30)^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(Event).AnnotateWithAppData(0xc00050e390, 0xa82ec0, 0xc0001323f0)^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(EventRoutingDefault).RouteEvent(0xc00027adc0, 0xc000514bd0)^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x14e^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(FirehoseNozzle).ReadLogsBuffer(0xc0003d40e0, 0xa86c20, 0xc0002991c0)^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start^[[0m ^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4^[[0m 2018-10-01T23:26:39.81-0400 [APP/PROC/WEB/0] OUT Exit status 2 2018-10-01T23:26:39.82-0400 [CELL/SSHD/0] OUT Exit status 0 2018-10-01T23:26:39.86-0400 [CELL/0] OUT Cell 04c9251e-4555-4f96-8553-d66d852316ff stopping instance 234b126c-dce9-4754-69b6-ae62 2018-10-01T23:26:39.86-0400 [CELL/0] OUT Cell 04c9251e-4555-4f96-8553-d66d852316ff destroying container for instance 234b126c-dce9-4754-69b6-ae62 2018-10-01T23:26:40.15-0400 [CELL/0] OUT Cell 04c9251e-4555-4f96-8553-d66d852316ff successfully destroyed container for instance 234b126c-dce9-4754-69b6-ae62 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","memoryStats.lastGCPauseTimeNS":0,"msg":"","origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","ImageDeletionTime":21269100,"cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","msg":"","origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"nanos"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.34","job":"diego_database","job_index":"6babf9d0-2623-40ae-943a-50fee1eca036","level":"info","msg":"","origin":"policy-server","time":"2018-10-02T03:26:40Z","totalPolicies":0,"unit":""} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","requests.outstanding":2,"time":"2018-10-02T03:26:40Z","unit":"counter"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","requests.completed":2176833,"time":"2018-10-02T03:26:40Z","unit":"counter"} 2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","http_status.4XX":3708,"ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-10-02T03:26:40Z","unit":"counter"} @ @

filidav commented 6 years ago

Any other suggestions?

aeijdenberg commented 6 years ago

@filidav - sorry for the delayed response, have been at an offsite without laptop earlier this week. Looking at the new stack trace, it seems to be hitting a slightly different code path. I've added a second commit to PR #205 to catch this and log an appropriate message.

Would be possible to rebuild with this patch and then look for log messages containing nil entity returned?

filidav commented 6 years ago

Added your changes

2018-10-03T20:26:06.820-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-04 00:26:06.819826368 +0000 UTC m=+4.128743401] Connected to Syslog Server! Connecting to Firehose... 2018-10-03T20:26:07.648-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-04 00:26:07.648295102 +0000 UTC m=+3.917315101] Cache filled. 2018-10-03T20:26:07.729-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-04 00:26:07.729628602 +0000 UTC m=+3.998649301] Connected to Syslog Server! Connecting to Firehose... 2018-10-03T22:43:39.978-04:00 [APP/PROC/WEB/1] [ERR] [2018-10-04 02:43:39.978537972 +0000 UTC m=+8257.287454505] Exception occurred! Message: Error while reading from the Firehose: Details: websocket: close 1006 (abnormal closure): unexpected EOF 2018-10-03T22:43:39.979-04:00 [APP/PROC/WEB/1] [ERR] [2018-10-04 02:43:39.979025572 +0000 UTC m=+8257.287942105] Exception occurred! Message: Error while reading from the Firehose: Details: websocket: close 1006 (abnormal closure): unexpected EOF

aeijdenberg commented 6 years ago

@filidav - I think it's unlikely the change would have introduced that particular error. Are there any instances of nil entity returned present in the logs?

(if I'm reading the above correctly, is it mixing messages from 2 different processes, WEB/0 and WEB/1 ?)

filidav commented 6 years ago

No other errors I noticed someone else reported the same error I just did as well.

https://github.com/cloudfoundry-community/firehose-to-syslog/issues/180

filidav commented 6 years ago

Any other suggestions on how we can trap for this condition Ever since we installed this fix I don't see the nil pointer. Are we now ignoring it?

filidav commented 6 years ago

Why would the API ever return a GUID for an app, but no entity record which would cause this panic error? Anyway we can get closer to this issue and identify how this is happening?

shinji62 commented 6 years ago

Why would the API ever return a GUID for an app, but no entity record which would cause this panic error? ==> We don't know, that the all problem here.

Ever since we installed this fix I don't see the nil pointer. Are we now ignoring it? ==> We don't ignore it, we trap the error and ignore the record not the error.

filidav commented 6 years ago

So how will we ever get to root cause?

David Filippelli | Platform Engineer, Platform Reliability Enterprise Technology & Services 250 Bloor Street East | Toronto, Ontario, M4W 1E5 | M. 416-527-3343 David_Filippelli@manulife.commailto:David_Filippelli@manulife.com | manulife.comhttp://www.manulife.com/

Have a question? Want to know how we deliver? Check out our Playbookhttps://mfc.sharepoint.com/sites/GEO/playbook and be a part of the community!

[cid:image001.png@01D37BEB.A96D7410]

From: Etourneau Gwenn notifications@github.com Reply-To: cloudfoundry-community/firehose-to-syslog reply@reply.github.com Date: Tuesday, October 9, 2018 at 9:50 PM To: cloudfoundry-community/firehose-to-syslog firehose-to-syslog@noreply.github.com Cc: David Filippelli David_Filippelli@manulife.com, Mention mention@noreply.github.com Subject: Re: [cloudfoundry-community/firehose-to-syslog] ERR panic: runtime error: invalid memory address or nil pointer dereference^ (#204)

Why would the API ever return a GUID for an app, but no entity record which would cause this panic error? ==> We don't know, that the all problem here.

Ever since we installed this fix I don't see the nil pointer. Are we now ignoring it? ==> We don't ignore it, we trap the error and ignore the record not the error.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/cloudfoundry-community/firehose-to-syslog/issues/204#issuecomment-428410423, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AWJsOomev1pPgGjG4rSs7fj-n8r3nSLHks5ujVJKgaJpZM4W7Lpu.

STATEMENT OF CONFIDENTIALITY The information contained in this email message and any attachments may be confidential and legally privileged and is intended for the use of the addressee(s) only. If you are not an intended recipient, please: (1) notify me immediately by replying to this message; (2) do not use, disseminate, distribute or reproduce any part of the message or any attachment; and (3) destroy all copies of this message and any attachments.

shinji62 commented 6 years ago

btw, @aeijdenberg add more change to #205 did you include all of them ?

filidav commented 6 years ago

Yes all those changes are in.

aeijdenberg commented 6 years ago

The changes in #205 should stop the panics, but I would have expected that something would be catching and logging the error that is now returned. I'm surprised there isn't, but then again, I was surprised that the underlying API would ever have returned that type of data to begin with, so perhaps it was a transient error and has resolved itself?

Either way, I think #205 should be safe to merge.

elliottpolk commented 6 years ago

Just a side note, it is still technically possible to hit the nil pointer issue when calling normaliseAndSaveEntityToDatabase. I would suggest checking for nv == nil and throwing an error to ensure all bases are covered.

aeijdenberg commented 6 years ago

@elliottpolk - you're correct that if you pass nil to that method, there will be a panic. However a quick glance over the code shows this is true of nearly every single function / method definition in this and other Go packages. I think there's a fairly well known convention that one shouldn't pass nil as an argument to a function / method unless the documentation for that function explicitly calls out that it's OK to do so. I don't see a compelling reason to special case this particular method.

jgeorge300 commented 5 years ago

Has this been resolved? I've been seeing this issue (or something very similar) in my PCF environments. Firehose has been crashing on a regular basis. Occasionally, when PCF restarts firehose it fails to start processing the logs again. We've had to set a schedule to automatically restart firehose every couple hours.