gajus / xhprof.io

GUI to analyze the profiling data collected using XHProf – A Hierarchical Profiler for PHP.
http://xhprof.io/

missing cleanup strategy #11

Closed chrono closed 8 years ago

chrono commented 11 years ago

I'm currently running xhprof/preinheimer in our staging environment and on one of the production servers, logging a fraction of all requests. To keep the dataset at a reasonable size, I do a simple timestamp-based cleanup after one week.

The cleanup script looks like this:

DELETE FROM details WHERE timestamp <= DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 7 DAY)

It runs every minute and keeps the table nice and clean. If a single deletion would be too chunky, I could run mk-archiver to keep the DELETE calls bite-sized; see the sketch below. This assumes there is an index on timestamp, of course.
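
A minimal sketch of that bite-sized variant, assuming MySQL and the same details table and timestamp column as the query above; the batch size of 10000 and the index name are arbitrary examples. Rerun it from cron until it affects zero rows:

-- Supporting index, without which every DELETE scans the table.
CREATE INDEX idx_details_timestamp ON details (timestamp);

-- Remove at most 10000 expired rows per invocation, keeping each
-- transaction small; repeat until ROW_COUNT() reports 0.
DELETE FROM details
WHERE timestamp <= DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 7 DAY)
LIMIT 10000;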

Just some data points: ~400k rows, average row length 38 KB, total dataset size ~15 GB. This is for one week of data.

While the normalized schema may yield some space savings over the denormalized one, it makes cleanup considerably harder. The project should therefore ship such a cleanup script.

gajus commented 11 years ago

This is a fairly reasonable request. However, may I suggest that the implementation depend on the number of records in the database rather than restricting the available data to a fixed interval? I could make the script check the record count every 24 hours and remove anything over a user-specified limit; a sketch follows below. I would like to know your opinion on this approach.
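
A sketch of what that count-based trim could look like, assuming MySQL and a requests table with a request_timestamp column (as in the query staabm posts further down); the cap of 250000 records is an arbitrary example:

-- Find the timestamp of the 250000th-newest request and delete
-- everything older. If fewer rows exist, the derived table is
-- empty and nothing is removed. Run from a daily cron job.
DELETE r
FROM requests r
JOIN (
    SELECT request_timestamp
    FROM requests
    ORDER BY request_timestamp DESC
    LIMIT 1 OFFSET 250000
) cutoff ON r.request_timestamp < cutoff.request_timestamp;

Rows in calls that reference the deleted requests would still need to be removed, e.g. with the multi-table DELETE shown later in the thread.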

You are right to guess that data normalisation improved space savings: 250k requests use less than 1 GB of data, which is 15 times less than the @preinheimer implementation.

marzizzle commented 11 years ago

+1

> However, may I suggest that the implementation would depend on the number of records in the database

I would like to see an option to make this configurable, e.g. 1) time-based: a retention period of four weeks; 2) size-based: a maximum number of records. Whichever limit is reached first is applied (analogous to logrotate).

Furthermore, it would come in handy to be able to delete specific hosts/URIs/requests, either via the web interface or via a CLI script (in particular, I want to be able to delete a host and all related data; see the sketch below).
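
A sketch of that host-level purge, with the caveat that the hosts table and the requests.host_id column are hypothetical here; the thread only confirms the calls.request_id = requests.id relationship:

-- Hypothetical schema: requests.host_id references hosts.id.
-- Remove all requests and calls recorded for one host...
DELETE requests, calls
FROM requests
LEFT JOIN calls ON calls.request_id = requests.id
WHERE requests.host_id = (SELECT id FROM hosts WHERE name = 'example.anuary.com');

-- ...then the host row itself.
DELETE FROM hosts
WHERE name = 'example.anuary.com';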

staabm commented 11 years ago

With the current schema, the SQL would look like this:

DELETE FROM calls, requests
USING calls, requests
WHERE calls.request_id = requests.id
  AND requests.request_timestamp <= DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 7 DAY)

gajus commented 11 years ago

The query is not the problem; agreeing on the cleanup strategy is. I personally do not want to lose any of the data, even if the records are a year old. This setting could be included in the main configuration, e.g.

<?php
[
    'cleanup' => [
        // Purge any data about the *.anuary.com host older than 1 hour.
        ['host' => '*.anuary.com', 'age' => 3600],
    ],
];

staabm commented 11 years ago

I just posted this SQL because the query in the original post no longer matches the current schema.

If you want me to include this cleanup strategy in the main config, I could provide a PR for that.

gajus commented 11 years ago

I don't know what a PR in this context is.

staabm commented 11 years ago

a Pull Request

gajus commented 11 years ago

Ha. Sorry, tired. :) Sure, if you have the time and the will, go ahead. It will be much appreciated.

chrono commented 8 years ago

Just an update from my end: I'm no longer working in PHP land, so this doesn't concern me anymore. Since the PR didn't see any activity in the last few years, I'm going to close this off.