Particular / ServiceControl

Backend for ServiceInsight and ServicePulse
https://docs.particular.net/servicecontrol/

Support expiration of audit messages #116

Closed andreasohlund closed 10 years ago

andreasohlund commented 10 years ago

Use the RavenDB expiration bundle to auto-expire audit messages.

Note: We're only expiring audit messages, not errors, heartbeats, etc.

Replaces https://github.com/Particular/ServicePulse/issues/64

indualagarsamy commented 10 years ago

You mean, https://github.com/Particular/NServiceBus/blob/develop/src/NServiceBus.Core/Config/AuditConfig.cs#L35

dannycohen commented 10 years ago

Based on our discussion: we'll implement this now and postpone implementing the command-line tool (https://github.com/Particular/ServiceControl/issues/114) until more user requirements justify it.

dannycohen commented 10 years ago

@andreasohlund - can we set this to RC / v1.0 release ?

andreasohlund commented 10 years ago

Yes


On 18 nov 2013, at 12:17, Danny Cohen notifications@github.com wrote:

@andreasohlund - can we set this to RC / v1.0 release ?


dannycohen commented 10 years ago

Milestone set to RC

indualagarsamy commented 10 years ago

Adding some discussion points from a conversation with @johnsimons:

  1. Do we even want to delete audit messages?
  2. Monitoring and storing audit data are two different concerns. Right now, we store these audit messages. As the database grows over time (for example, where some of the messages need to be stored for 7 years for regulatory purposes), the performance of the monitoring tool might suffer. We would want to store the messages for a short amount of time (internal to ServiceControl) and maybe have a mechanism where audit messages, after being processed, are forwarded to a separate concern that stores the messages and applies the data archival policy.
  3. The expiration is currently a tag on the message. So if we change the config setting to something else, it won't apply to messages that were already created with the previous expiration setting. That is something we need to consider.
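To illustrate point 3: the RavenDB expiration bundle reads an expiry timestamp from each document's metadata, so the retention period is fixed at the moment the document is stored. The following is a minimal Python sketch of that behaviour, not ServiceControl code. It assumes the metadata key is `Raven-Expiration-Date` (as in the RavenDB 2.x bundle); `store_audit_message`, `is_expired`, and the plain dictionaries are hypothetical stand-ins for the real storage layer.

```python
from datetime import datetime, timedelta, timezone

# Metadata key read by the expiration bundle (assumed: RavenDB 2.x naming).
EXPIRATION_KEY = "Raven-Expiration-Date"


def store_audit_message(body, retention, now):
    """Hypothetical helper: stamp the expiry date at write time."""
    return {
        "body": body,
        "metadata": {EXPIRATION_KEY: (now + retention).isoformat()},
    }


def is_expired(doc, now):
    """A document expires once its own stamped date has passed."""
    return datetime.fromisoformat(doc["metadata"][EXPIRATION_KEY]) <= now


t0 = datetime(2013, 11, 18, tzinfo=timezone.utc)

# Stored while the retention setting was 30 days.
old_doc = store_audit_message("order-1", timedelta(days=30), t0)

# The setting is then shortened to 7 days; only new documents pick it up.
new_doc = store_audit_message("order-2", timedelta(days=7), t0)

check = t0 + timedelta(days=10)
print(is_expired(old_doc, check))  # False: still governed by the old 30 days
print(is_expired(new_doc, check))  # True: governed by the new 7 days
```

Under this model, applying a changed setting to existing messages would require re-stamping their metadata, or expiring by a stored date rather than a per-document tag.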

@johnsimons - please add anything I have missed from our discussion.

johnsimons commented 10 years ago

SC is about providing real time data to SP + SI.

Storing customers' audit data is not really the primary focus of SC. If SC had to store 7 years of audit data, I am quite sure we wouldn't be able to run an efficient and super fast monitoring backend.

We need to consider offloading this historical data somewhere else.

@andreasohlund @dannycohen Thoughts?

dannycohen commented 10 years ago

There are several requirements on issues of audit / data retention / online monitoring etc.

  1. SC's main focus (currently, for v1) is online monitoring for SP, and display of recent past activity for SI.
  2. In the future we would like to add features that will require longer-term data (let's say on the order of weeks or months), e.g. messaging-oriented CEP and basic trend analysis.
  3. A multi-year audit data retention policy does not mean we need to keep all the data in the SC database; we can offload it into a snail-paced external data store. This is a future scenario we will handle (it can easily be a vNext concern).

Having said that, I would say that this issue, supporting audit data expiration, is a nice-to-have: it goes halfway towards allowing users to periodically clean up some of the redundant data in their SC database. (Compared to the command-line functionality described in #114, setting expiration only supports a future clean-up policy and is a much less powerful clean-up tool for Opie or Archie.)

We need more customer feedback on these issues in order to define requirements we can be certain about.

JeffreyAllenMiller commented 10 years ago

Gentlemen, for what it is worth, I tend to fall in the same camp as Danny. We desperately need the real-time insight (no pun intended) that these tools offer. Indu has some great points as well, and while it is only my opinion, I think separating the two and providing two separate and distinct ways of managing processed messages and errored messages is the way to go, even if it requires creating one config setting that pushes processed messages to the audit queue for SC purposes and another that pushes them to an Archive queue.

Regarding the automatic expiration of messages (i.e. audit), my organization would desperately like to see this implemented, as we have killed our Dev server twice. We are pumping millions of records through and it is chewing up disk. We tried deleting the collection the other day and it was deleting about 1000 messages every 3 seconds, which was unacceptable. I tried to find a way to delete the messages in the Silverlight IDE using a query that looks for messages before a specific sent date, but I was unsuccessful. I am sure this can be done in the Silverlight app, but regrettably I did not land on the right link to show me the way. Hence, last Friday I started adding a page to our web portal to do exactly that, as this script seems to be all over the place on the web. But this is also a problem, because we currently cannot access SC from another server (this bug is being addressed in the next release). Anyway, long story short: not all of your users are RavenDB experts, far from it, and an automated solution for managing the number of messages in the database based on a given date will go a long way toward easing the concerns of nervous managers and developers.

mauroservienti commented 10 years ago

Guys, a simple solution could be the RavenDB "expiration bundle", which can be configured to automatically delete documents older than a certain date. Just an idea.

johnsimons commented 10 years ago

Hi @JeffreyAllenMiller, we are already doing the "archiving queue", see #183.

Regarding auto expiration and the RavenDB "expiration bundle", it sounds like we need both auto-expire and pruning based on quota. The latter is harder to do because we need to calculate used space and prune older data. @synhershko do you know if Raven supports this?

synhershko commented 10 years ago

@johnsimons yes through the quotas bundle https://github.com/ayende/ravendb/tree/master/Raven.Database/Bundles/Quotas

Depending on the actual requirements, I'd recommend combining both to work based on the required logic. I wouldn't be surprised if having both running in parallel will cause bugs.

johnsimons commented 10 years ago

I wouldn't be surprised if having both running in parallel will cause bugs

Hmmm?

synhershko commented 10 years ago

Strike that, the quotas bundle isn't expiring, it just stops you from writing past the quota. You (or me) will have to roll your own based on expiration & quotas.
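A rough Python sketch of what "rolling your own" could look like, combining expiration with quota-based pruning: expired documents go first, then the oldest remaining ones until the total size fits the quota. This is an illustration only and not the logic that was implemented; `prune` and the document fields `stored`, `expires`, and `size` are hypothetical names.

```python
from datetime import datetime


def prune(docs, now, quota_bytes):
    """Return the documents to keep.

    Step 1: drop everything whose expiry date has passed.
    Step 2: drop the oldest survivors until the total size fits the quota.
    """
    live = sorted(
        (d for d in docs if d["expires"] > now),
        key=lambda d: d["stored"],  # oldest first
    )
    total = sum(d["size"] for d in live)
    drop = 0
    while drop < len(live) and total > quota_bytes:
        total -= live[drop]["size"]
        drop += 1
    return live[drop:]


now = datetime(2013, 12, 1)
docs = [
    {"id": "a", "stored": datetime(2013, 11, 1), "expires": datetime(2013, 11, 30), "size": 400},
    {"id": "b", "stored": datetime(2013, 11, 10), "expires": datetime(2013, 12, 10), "size": 500},
    {"id": "c", "stored": datetime(2013, 11, 20), "expires": datetime(2013, 12, 20), "size": 300},
    {"id": "d", "stored": datetime(2013, 11, 25), "expires": datetime(2013, 12, 25), "size": 300},
]

kept = prune(docs, now, quota_bytes=700)
print([d["id"] for d in kept])  # ['c', 'd']: "a" expired, "b" pruned for quota
```

Running both steps in a single pass, rather than as two independent background tasks, avoids the two mechanisms competing over the same documents.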

andreasohlund commented 10 years ago

@johnsimons there was a consensus to include this feature in 1.0.0

synhershko commented 10 years ago

The expiration module is about to be ready to be merged in. I need you guys to decide on which messages exactly to expire and delete.

This is what I have currently:

There may be subtleties like message statuses, message types, IsSystemMessage etc that need to be accounted for. /cc @johnsimons @andreasohlund @dannycohen

dannycohen commented 10 years ago

@synhershko - thanks!

  1. If you can easily add ResolvedSuccessfully that would be nice, but optional.
  2. I don't think running every minute is required; it can even be every hour. Any reason why you recommend every minute? To reduce peaks of deletion activity?
  3. Please specify how the configuration is defined, so it can be added to the docs (see https://github.com/Particular/docs.particular.net/issues/123).

synhershko commented 10 years ago

  1. I could; I just wanted to make sure it makes sense. I would also want to make sure we don't over-delete or leave messages behind, so please make sure I didn't miss anything with respect to types, system messages, etc.
  2. Yes, to reduce peaks. On a small installation it will go unnoticed anyway, and on large ones it would prevent hiccups and long staleness periods.
  3. Will do once it has been reviewed, approved and merged in.

dannycohen commented 10 years ago

@synhershko - thanks!
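The trade-off discussed above between running the cleanup every minute and every hour comes down to simple arithmetic: under a steady load, each run has to delete roughly the messages that expired since the previous run. A small Python sketch follows, assuming a constant arrival rate; the rate of 100 messages per second is a made-up figure and `deletions_per_run` is a hypothetical helper.

```python
def deletions_per_run(messages_per_second, interval_seconds):
    """Approximate documents each cleanup run must delete under steady load."""
    return messages_per_second * interval_seconds


rate = 100  # hypothetical steady load, messages per second

per_minute_run = deletions_per_run(rate, 60)
per_hour_run = deletions_per_run(rate, 3600)

print(per_minute_run)  # 6000 documents per run
print(per_hour_run)    # 360000 documents per run
```

The total work is the same either way; the shorter interval spreads it into smaller batches, which is the "reduce peaks" argument.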

synhershko commented 10 years ago

We now support expiration. Writing down the configuration details now.