elastic / elastic-agent-shipper

Data shipper for the Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.
Other
9 stars 19 forks source link

Refactor monitoring to make the shipper more beats-like, and compatible with Agent #289

Closed fearful-symmetry closed 1 year ago

fearful-symmetry commented 1 year ago

What does this PR do?

Part of fix for https://github.com/elastic/elastic-agent-shipper/issues/267

This accomplishes a few things:

Note that this currently won't work with agent, as changes on the agent side are needed to make agent scrape metrics from the shipper.

Why is it important?

Needed to integrate the shipper with other agent monitoring

Checklist

Author's Checklist

How to test this PR locally

mergify[bot] commented 1 year ago

This pull request does not have a backport label. If this is a bug or security fix, could you label this PR @fearful-symmetry? 🙏. For such, you'll need to label your PR with:

To fixup this pull request, you need to add the backport labels for the needed branches, such as:

elasticmachine commented 1 year ago

:green_heart: Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

#### Build stats * Start Time: 2023-03-30T23:39:17.608+0000 * Duration: 19 min 45 sec

:grey_exclamation: Flaky test report

No test was executed to be analysed.

:robot: GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with: - `/test` : Re-trigger the build.

fearful-symmetry commented 1 year ago

Whoop, go linter doesn't like my go.mod replace line for https://github.com/elastic/elastic-agent-system-metrics/pull/80

fearful-symmetry commented 1 year ago

Okay, just noticed a major problem with this while testing. The shipper is built on the assumption that any of its sub-components can be arbitrarily stopped and start, but the monitoring code used is mostly global, and will usually panic if you try to re-register or update something. Will start poking at a fix.

fearful-symmetry commented 1 year ago

@leehinman you've run into this before on windows, right?

error creating API for registry reporter: parse "unix://C:\\Users\\jenkins\\AppData\\Local\\Temp\\mon1680036856.sock": invalid port ":\\Users\\jenkins\\AppData\\Local\\Temp\\mon1680036856.sock" after host
leehinman commented 1 year ago

@leehinman you've run into this before on windows, right?

error creating API for registry reporter: parse "unix://C:\\Users\\jenkins\\AppData\\Local\\Temp\\mon1680036856.sock": invalid port ":\\Users\\jenkins\\AppData\\Local\\Temp\\mon1680036856.sock" after host

yeah, that looks a lot like what was happening with the grpc socket. On windows you need to use a named pipe.

"github.com/elastic/elastic-agent-libs/api/npipe

fearful-symmetry commented 1 year ago

Alright, lets see what this does...

fearful-symmetry commented 1 year ago

/test

fearful-symmetry commented 1 year ago

Kinda baffled by the CI error:

[2023-03-29T00:14:27.444Z]    ⨯ build failed after 84.06s error=failed to build for darwin_arm64: exit status 2: # github.com/elastic/elastic-agent-shipper/monitoring

[2023-03-29T00:14:27.444Z] monitoring/queuemon.go:216:9: undefined: report.SetupInfoUserMetrics

tried building locally on an ARM macbook, works fine.

fearful-symmetry commented 1 year ago

Alright, can reproduce it with mage package on a Macbook:

   ⨯ build failed after 19.73s error=failed to build for darwin_arm64: exit status 2: # github.com/elastic/elastic-agent-shipper/monitoring
monitoring/queuemon.go:216:9: undefined: report.SetupInfoUserMetrics
fearful-symmetry commented 1 year ago

My first thought is that it was a cgo issue, but if that was the case, it would have failed on SetupMetrics first, and not SetupInfoUserMetrics.

ycombinator commented 1 year ago

@fearful-symmetry Shot in the dark, but could the build tags here have something to do with the error?

https://github.com/elastic/elastic-agent-system-metrics/blob/bea3acfa41eca45dfb07d3a7624dfc04baa2012a/report/report.go#L18-L19

ycombinator commented 1 year ago

My first thought is that it was a cgo issue, but if that was the case, it would have failed on SetupMetrics first, and not SetupInfoUserMetrics.

Not necessarily, since SetupMetrics is defined twice, once in setup.go and once in setup_other.go, each file having complementary build tags. However, SetupInfoUserMetrics is only defined in setup.go.

fearful-symmetry commented 1 year ago

Not necessarily, since SetupMetrics is defined twice, once in setup.go and once in setup_other.go, each file having complementary build tags.

@ycombinator argh, I didn't even notice there was a setup_other.go

fearful-symmetry commented 1 year ago

Put in https://github.com/elastic/elastic-agent-system-metrics/pull/82 since that code doesn't require cgo.

However, that raises another question, should mage package have cgo disabled?

fearful-symmetry commented 1 year ago

/test