ats1999 / WatchMan

A query and alerting engine...!
MIT License

Sampling? #6

Open ats1999 opened 8 months ago

ats1999 commented 8 months ago

When dealing with huge amounts of data, users may prefer to store a sample of the data rather than every event. This reduces the storage space required and, with it, the hosting cost.

  1. Server-Side Sampling - Users send every event to the server, and the server decides whether each event is stored.
  2. Client-Side Sampling - Clients send data that is already sampled, so the server does not need to sample it.
  3. Query-Side Sampling - Store 100% of the data, but execute each query on only a percentage of it.
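
As a rough illustration of option 1, server-side sampling can be sketched as a per-event random decision. The names `should_store` and `sample_events` are hypothetical and not part of WatchMan; this assumes simple uniform random sampling, which is only one of several possible strategies.

```python
import random


def should_store(sample_percent: float, rng: random.Random) -> bool:
    """Return True for roughly `sample_percent` percent of calls.

    Hypothetical helper: assumes uniform random sampling per event.
    """
    # rng.random() is in [0.0, 1.0), so 100% always keeps and 0% never keeps.
    return rng.random() * 100 < sample_percent


def sample_events(events, sample_percent: float, seed=None):
    """Keep only the events that pass the sampling decision."""
    rng = random.Random(seed)
    return [event for event in events if should_store(sample_percent, rng)]


if __name__ == "__main__":
    page_views = [{"type": "page_view", "id": i} for i in range(10_000)]
    stored = sample_events(page_views, sample_percent=5, seed=42)
    print(f"stored {len(stored)} of {len(page_views)} events")
```

The same decision function could run on the client for option 2; the difference is only where it runs and how much data crosses the network.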

Plotting Sampled Data to 100%

Suppose we are dealing with a page view event that is sampled at 5%.

So the user will send only 5 of every 100 events received. When querying, however, users always need to see figures that represent 100% of their data. For example, suppose users want to count how many page view events occurred. Although we have only 5 stored events, the 5% sampling rate means we scale the count up and show users that approximately 100 events occurred.
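
The scaling step is simple arithmetic: divide the stored count by the sampling fraction. A minimal sketch, where `estimate_total` is a hypothetical helper name and the result is an estimate rather than an exact count:

```python
def estimate_total(stored_count: int, sample_percent: float) -> float:
    """Scale a sampled count back up to an estimate of the full count.

    Hypothetical helper: assumes every event was sampled at the same rate.
    """
    if not 0 < sample_percent <= 100:
        raise ValueError("sample_percent must be in (0, 100]")
    return stored_count * 100 / sample_percent


if __name__ == "__main__":
    # 5 stored page views at 5% sampling represent about 100 real page views.
    print(estimate_total(5, 5))
```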

Change of sampling percentage

Users may change the sampling percentage over time. For example, a user might sample at 5% in the 1st month, 10% in the 2nd month, 8% in the 3rd month, and so on.
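
To see why a rate change matters at query time, here is a small sketch that scales each period by the rate that was in effect during it. It assumes the rate for each period is known when querying; how that rate is recorded is left open, and `estimate_total_across_periods` is a hypothetical name.

```python
def estimate_total_across_periods(periods) -> float:
    """Estimate the full event count over periods with different rates.

    `periods` is an iterable of (stored_count, sample_percent) pairs,
    one pair per period. Hypothetical helper for illustration only.
    """
    total = 0.0
    for stored_count, sample_percent in periods:
        if not 0 < sample_percent <= 100:
            raise ValueError("sample_percent must be in (0, 100]")
        # Each period is scaled by its own rate, not by the current rate.
        total += stored_count * 100 / sample_percent
    return total


if __name__ == "__main__":
    # Illustrative numbers only: 5%, 10%, and 8% sampling in months 1 to 3.
    months = [(50, 5), (100, 10), (80, 8)]
    print(estimate_total_across_periods(months))
```

Scaling all three months by a single rate would misstate the total, which is why the rate has to be tied to the period (or to each event) in some way.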

Use an event-driven architecture to update existing data.