locker core should be (optionally) instrumented

othiym23 commented 12 years ago

To better understand the behavior of the locker over time, it would be extremely useful to have some fine-grained instrumentation of many of the events inside the locker:

synclet starts
synclet stops
synclet sends event
collection receives event
collection processes event
datastore loads object
datastore saves object
locker starts a process
locker stops a process

Events for collections and synclets should include the collection or synclet name so we can do some correlation of load based on events.
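One way to carry the collection or synclet name in each event is to fold it into a dotted statsd bucket name. The sketch below is only an illustration: the `locker.` prefix, the `bucket_name` helper, and the component/event labels are assumptions, not names taken from the locker code.

```python
import re


def bucket_name(component: str, name: str, event: str) -> str:
    """Build a dotted bucket such as 'locker.synclet.twitter.start'.

    All three arguments are hypothetical labels chosen for this sketch.
    """
    def clean(part: str) -> str:
        # Dots separate namespace levels, and ':' and '|' delimit fields in a
        # statsd line, so anything outside a conservative set becomes '_'.
        return re.sub(r"[^A-Za-z0-9_-]", "_", part)

    return ".".join(["locker", clean(component), clean(name), clean(event)])


if __name__ == "__main__":
    print(bucket_name("synclet", "twitter", "start"))
    print(bucket_name("collection", "contacts", "processed"))
```

Keeping the name as its own path segment means load can later be grouped per synclet or per collection without re-parsing event payloads.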

To track these statistics, we need a small library that knows how (and whether) to send events to an events-gathering daemon (statsd for now), and we need to modify the various pieces of locker core to use that library to emit events. @temas began work on this library, which I'll be reusing and modifying.

The emitter functions in the instrumentation library will be synchronous for simplicity's sake – sending UDP packets is both nonblocking and low-latency. On the receiving end, we'll see how well statsd deals with the level of traffic, and may need to put it on its own host / partition the account servers so that we don't saturate its bandwidth. For now, graphic.test.singly.com will do until we have evidence that it's dropping packets.
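A minimal sketch of what such a synchronous emitter could look like, written in Python for illustration. The class name, the `enabled` flag, and the default host are assumptions; the `bucket:value|c` counter line and port 8125 follow statsd's plain-text UDP convention.

```python
import socket


class StatsEmitter:
    """Fire-and-forget statsd counter emitter (hypothetical helper)."""

    def __init__(self, host="127.0.0.1", port=8125, enabled=True):
        self.addr = (host, port)
        self.enabled = enabled  # the "whether" part: emit nothing when False
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def increment(self, bucket, count=1):
        """Send one counter datagram; return True if it was handed to the OS."""
        if not self.enabled:
            return False
        try:
            # statsd counter line: <bucket>:<value>|c
            self.sock.sendto(f"{bucket}:{count}|c".encode("utf-8"), self.addr)
        except OSError:
            # Instrumentation must never take the caller down with it.
            return False
        return True


if __name__ == "__main__":
    emitter = StatsEmitter()
    emitter.increment("locker.synclet.twitter.start")
```

Because UDP gives no delivery acknowledgement, a `True` return only means the datagram left the process; dropped packets have to be detected on the statsd side.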

This work will be done on the instrument branch.

mdz commented 12 years ago

collection starts/stops

quartzjer commented 12 years ago

+1, looks like a pretty comprehensive start!

Might also add a set for HTTP calls through core, including the /Me/* proxied core/synclet ones...

temas commented 12 years ago

My gut tells me we could quickly overinstrument and get bogged down in trying to correlate data, but I guess it's better to be able to pull out than add in and miss something.

Also, be cognizant of the analytics opt-in.

othiym23 commented 12 years ago

@temas, that's a good point that ties to something else I've been wondering about:

1. what's the best way to uniquely identify a locker from within itself?
2. what's the easiest way for a locker to identify whether it's opted in to the analytics gathering?

I'd argue that with the correct answer to #1 (i.e. come up with a unique, unvarying identifier that still isn't tied to any public piece of information), #2 becomes semi-moot. But if we do want to filter out this data for account holders who haven't opted in, determining whether an event should be fired needs to be as low-overhead as possible, so if it involves pulling information from Integral, that needs to happen once per startup.
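To make the "once per startup" idea concrete, here is a rough sketch in Python. `OptInGate` and the `lookup` callable are hypothetical; `lookup` stands in for whatever call would actually fetch the opt-in flag from Integral, which this thread does not specify.

```python
from typing import Callable, Optional


class OptInGate:
    """Cache the opt-in answer so the per-event check is a plain attribute read."""

    def __init__(self, lookup: Callable[[], bool]):
        self._lookup = lookup  # hypothetical stand-in for the Integral query
        self._opted_in: Optional[bool] = None

    def load(self) -> None:
        """Run the (possibly slow) lookup; intended to be called at startup."""
        self._opted_in = bool(self._lookup())

    def allows(self) -> bool:
        """Cheap check used before every event; loads lazily if needed."""
        if self._opted_in is None:
            self.load()
        return bool(self._opted_in)


if __name__ == "__main__":
    gate = OptInGate(lambda: True)
    gate.load()
    if gate.allows():
        print("emit event")
```

The trade-off is that a change to the opt-in setting only takes effect after the next restart (or an explicit `load()` call).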

mdz commented 12 years ago

In order to implement pager notifications in the locker wrapper script, I was planning to pass in the locker name as an environment variable, FWIW
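If the wrapper script does export the locker name, reading it could be as small as the sketch below. The variable name `LOCKER_NAME` and the `"unknown"` fallback are assumptions made for illustration; the thread does not say what the variable would be called.

```python
import os


def locker_name(environ=os.environ, default="unknown"):
    """Return the locker name passed in by the wrapper script, if any.

    LOCKER_NAME is a hypothetical variable name used only in this sketch.
    """
    return environ.get("LOCKER_NAME", default)


if __name__ == "__main__":
    print(locker_name())
```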

quartzjer commented 12 years ago

absolutely, Me/key.pub, and ideally the sha of that for shorthand/references everywhere (including the dht)
it'd be great to have that stored in the locker as the master (along with auth :)
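As a sketch of the "sha of key.pub" idea: hash the public key bytes and keep both the full digest and a short prefix for use in references. The comment above says only "sha", so SHA-256 is an assumption here, as are the `locker_id` helper and the 12-character shorthand length.

```python
import hashlib


def locker_id(public_key: bytes, length: int = 12):
    """Return (full_hex_digest, short_prefix) for the given public key bytes.

    SHA-256 and the prefix length are assumptions, not decisions from the thread.
    """
    digest = hashlib.sha256(public_key).hexdigest()
    return digest, digest[:length]


if __name__ == "__main__":
    # In a real locker the bytes would come from the Me/key.pub file, e.g.:
    #     with open("Me/key.pub", "rb") as f:
    #         key_bytes = f.read()
    key_bytes = b"example public key bytes"
    full, short = locker_id(key_bytes)
    print(short)
```

The identifier stays stable for as long as the key does, so rotating the key would also change the locker's identity in any collected stats.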

othiym23 commented 12 years ago

@quartzjer I'm fine with the idea of using some or all of Me/key.pub, but I do have a question as to whether that's too identifying, especially as we start to use the key for other things.

I'll just put this out there: this information isn't traditional analytics data. I think it should be fine to gather for all hosted accounts, and really, to be useful, it needs to be gathered for all of them. Is there anyone who believes strongly it should be disabled for people who have opted out of the analytics?

quartzjer commented 12 years ago

The public key is designed+designated to be public, so it should be good :)

We have to collect stats externally on processes regardless, and I agree this is much more like that. IMO the opt-in is about tracking someone's personal actions/activities/etc.

(Sent by email on Feb 27, 2012, at 4:40 PM, in reply to the preceding comment from Forrest L Norvell.)

kristjan commented 12 years ago

I agree that these kinds of system statistics relate more to infrastructure visibility than to personal data and can be collected regardless of opt-in. If we can't see what our servers are doing, we can't anticipate or respond to outages.

temas commented 12 years ago

Is this done enough to close and move to more granular issues as they come up?

othiym23 commented 12 years ago

It is, I think. There's still room for instrumentation on things like Express / Connect (for things like user responsiveness of the system), but I'd rather have us do targeted additions as we come up with specific issues we want to measure.
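For a targeted addition such as response time, one option is a statsd timer rather than a counter. The sketch below is in Python and is not Express/Connect middleware; `timed`, `timer_line`, and the bucket name are hypothetical, while the `bucket:value|ms` line follows statsd's timer format.

```python
import time
from contextlib import contextmanager


def timer_line(bucket, elapsed_ms):
    """Format a statsd timer line: <bucket>:<milliseconds>|ms"""
    return f"{bucket}:{int(round(elapsed_ms))}|ms"


@contextmanager
def timed(bucket, send):
    """Time the wrapped block and pass the resulting line to `send`.

    `send` is any callable taking one string, e.g. a UDP emitter method.
    """
    start = time.monotonic()
    try:
        yield
    finally:
        # Reported even if the block raises, so failed requests are timed too.
        send(timer_line(bucket, (time.monotonic() - start) * 1000.0))


if __name__ == "__main__":
    with timed("locker.http.response_time", print):
        time.sleep(0.01)
```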

LockerProject / Locker #877