Closed: chrono closed this issue 8 years ago
This is a fairly reasonable request. However, may I suggest that the implementation depend on the number of records in the database, rather than restricting the available data to some fixed interval? I suppose I could make the script check the number of records in the database every 24 hours and remove anything over the user-specified limit. I would like to know your opinion about my suggested approach.
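A minimal sketch of that count-based approach, assuming PDO/MySQL and the requests/calls tables referenced later in this thread (the connection details and the limit are placeholders):

<?php
// Count-based cleanup sketch: keep only the newest $limit requests.
// Table and column names follow the SQL posted later in this thread.
$pdo = new PDO('mysql:host=localhost;dbname=xhprof', 'user', 'password');
$limit = 250000; // user-specified maximum number of records

// Find the id of the oldest request we want to keep.
$cutoff = $pdo
    ->query('SELECT id FROM requests ORDER BY id DESC LIMIT 1 OFFSET ' . ($limit - 1))
    ->fetchColumn();

if ($cutoff !== false) {
    // Remove everything older than the cut-off, together with its calls.
    $statement = $pdo->prepare(
        'DELETE FROM calls, requests
        USING calls, requests
        WHERE calls.request_id = requests.id AND requests.id < :cutoff'
    );
    $statement->execute(['cutoff' => $cutoff]);
}

Run from cron every 24 hours, this bounds the table by record count rather than by age.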
You are right to guess that data normalisation improved space savings: 250k requests use less than 1 GB of data, roughly 15 times less than the @preinheimer implementation.
+1
However, may I suggest that the implementation depend on the number of records in the database
I would like to see an option to make this configurable, e.g. 1) time-based: a retention period such as 4 weeks, 2) size-based: a maximum number of records. Whichever limit is reached first is applied (analogous to logrotate).
Furthermore, it would come in handy to be able to delete specific hosts/URIs/requests, either via the web interface or via a CLI script; in particular, I want to be able to delete a host and all related data (a sketch follows the SQL below).
With the current schema, the SQL would look like:
DELETE FROM calls, requests
USING calls, requests
WHERE calls.request_id = requests.id
  AND requests.request_timestamp <= DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 7 DAY)
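For the host-specific deletion mentioned above, a similar multi-table DELETE would work. This is only a sketch: it assumes hosts are normalised into their own hosts table that requests references via a host_id column, which may not match the actual schema:

<?php
// Sketch: delete one host and all of its related data.
// The hosts table and requests.host_id column are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=xhprof', 'user', 'password');
$statement = $pdo->prepare(
    'DELETE FROM calls, requests, hosts
    USING calls, requests, hosts
    WHERE calls.request_id = requests.id
    AND requests.host_id = hosts.id
    AND hosts.name = :host'
);
$statement->execute(['host' => 'www.example.com']);

As with the SQL above, the inner join means a request with no calls rows would survive the pass; a second single-table DELETE would be needed to catch those.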
The query is not the problem; agreeing on the cleanup strategy is. I personally do not want to lose any of the data, even if those are one-year-old records. This setting could be included in the main configuration, e.g.
<?php
// Purge any data about the *.anuary.com host older than 1 hour.
[
    'cleanup' => [
        ['host' => '*.anuary.com', 'age' => 3600],
    ],
];
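A sketch of a purge pass driven by that config. The wildcard-to-LIKE translation and the schema names (hosts table, requests.host_id, requests.request_timestamp) are assumptions for illustration:

<?php
$config = ['cleanup' => [['host' => '*.anuary.com', 'age' => 3600]]];
$pdo = new PDO('mysql:host=localhost;dbname=xhprof', 'user', 'password');

foreach ($config['cleanup'] as $rule) {
    // Escape LIKE metacharacters, then map the shell-style wildcards.
    $pattern = strtr($rule['host'], ['%' => '\%', '_' => '\_', '*' => '%', '?' => '_']);
    $statement = $pdo->prepare(
        'DELETE FROM calls, requests
        USING calls, requests, hosts
        WHERE calls.request_id = requests.id
        AND requests.host_id = hosts.id
        AND hosts.name LIKE :pattern
        AND requests.request_timestamp <= DATE_SUB(CURRENT_TIMESTAMP, INTERVAL :age SECOND)'
    );
    $statement->execute(['pattern' => $pattern, 'age' => $rule['age']]);
}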
I just posted this SQL because the original description of this issue isn't accurate anymore.
If you want me to include this cleanup strategy in the main config, I could provide a PR for that.
I don't know what a PR is in this context.
a Pull Request
Ha, sorry, tired. :) Sure, if you have the time and the will, go ahead. It will be much appreciated.
Just an update from my end: I'm no longer working in PHP land so this does not concern me anymore. Since the PR didn't see any activity in the last few years, I'm going to close this off.
I'm currently running xhprof/preinheimer in our staging environment and on one of the production servers, logging a fraction of all requests. In order to keep the dataset at a reasonable size, I do a simple timestamp-based cleanup after 1 week.
The cleanup script boils down to a single timestamp-based DELETE.
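A minimal sketch of such a one-week purge, assuming preinheimer's schema stores each profiling run in a details table with an indexed timestamp column (the actual script and names may differ):

<?php
// Purge profiling runs older than 7 days; intended to be run from cron.
$pdo = new PDO('mysql:host=localhost;dbname=xhprof', 'user', 'password');
$pdo->exec('DELETE FROM details WHERE timestamp < DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 7 DAY)');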
It runs every minute and keeps the table nice and clean. If that makes for too chunky a deletion, I could run mk-archiver to ensure the DELETE calls stay bite-sized. This is assuming that there is an index on timestamp, of course.
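If mk-archiver is not to hand, a batched DELETE loop achieves a similar bite-sized effect; a sketch under the same schema assumptions as above:

<?php
// Delete in small batches so each statement holds its locks only briefly.
$pdo = new PDO('mysql:host=localhost;dbname=xhprof', 'user', 'password');
do {
    $deleted = $pdo->exec(
        'DELETE FROM details
        WHERE timestamp < DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 7 DAY)
        LIMIT 1000'
    );
    sleep(1); // breathe between batches so other queries get through
} while ($deleted > 0);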
Just some data points: ~400k rows, an average row length of ~38 KB, and a total dataset size of ~15 GB. This is for 1 week of data.
While the normalized schema may give some space savings over the denormalized schema, it makes cleanup way harder. Therefore, the project should supply such a cleanup script.