DATA-DOG / DataDogAuditBundle

Stores all database changes for doctrine ORM
MIT License

Large database question #32

Open DamienHarper opened 6 years ago

DamienHarper commented 6 years ago

Hi, we're currently building a new large project with a relatively large database (around 200 tables). We need to keep track of changes on nearly all tables and be able to show diffs through the UI. This bundle seems to fit our needs but, if I'm not mistaken, it stores changes in only two tables. So, with a lot of users, I fear that we'll quickly end up with two gigantic tables with billions of records, which will make them slow to look up or query (to show diffs in the UI) and, as a consequence, harder to back up (and restore). So, is it possible to split the audit logs into more tables (one audit table per audited table, for example, as simplethings/entity-audit-bundle does)? If not, do you have any feedback about similar use cases?

l3pp4rd commented 6 years ago

Hi, I think that splitting it into separate tables would add unnecessary complexity. Basically, this bundle consists of only one core class, the audit event subscriber, and it is quite simple to fork and adapt to custom needs.

In general, regarding table sizes, I agree that this will be an issue, and in my opinion old audit data should be archived or garbage-collected. I would be happy to accept a contribution (or create one later when I have time) that implements a Symfony command in this bundle which would produce only the latest snapshot of each logged entity and remove all old entries. To produce the latest snapshot, we could take all logged entities, load them from the DB, and create their snapshots, then remove all older entries or archive them somewhere as a command option.

Building separate tables for each entity would run into the same issue as without the split, since in most cases there will be a few heavily used tables anyway.
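
The garbage-collection idea above could be sketched at the SQL level roughly like this (a hypothetical sketch; the table name `audit_logs` and the `tbl`/`fk` columns identifying the audited record are assumptions for illustration, not necessarily the bundle's actual schema):

```sql
-- Keep only the newest audit row per audited record, delete everything older.
-- An archival variant would first copy the doomed rows elsewhere.
DELETE FROM audit_logs
WHERE id NOT IN (
    SELECT latest_id FROM (
        SELECT MAX(id) AS latest_id
        FROM audit_logs
        GROUP BY tbl, fk        -- one snapshot per (table, record) pair
    ) AS keep
);
```

The inner subquery wrapper is needed on MySQL, which otherwise refuses to delete from a table referenced in its own subquery.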

l3pp4rd commented 6 years ago

Though the option I described might not work well if there are more inserts than updates. For example, records that have a lifecycle and are removed later could also be archived. Anyway, that is a complex task and there is hardly one best option; most likely it is not feasible to implement something like that in the library at all.

DamienHarper commented 6 years ago

Thanks for your feedback.

l3pp4rd commented 6 years ago

But indeed, lookups in a large table will be terrible: since the association foreign key is stored as a string, comparisons cause a cast to int, and it will be too slow even with more than 100K rows. For such cases, I can only suggest copying the source and adapting it; changing the FK to int alone would already speed it up, but that will not be enough.
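
To illustrate the cast problem described above (a hypothetical sketch; the table name `audit_associations` and column `fk` are assumptions, not necessarily the bundle's actual schema):

```sql
-- With the key stored as a string, comparing it to an integer forces a
-- per-row cast and defeats any index on fk (MySQL behavior; PostgreSQL
-- is stricter and may reject the comparison outright):
SELECT * FROM audit_associations WHERE fk = 42;    -- slow: column gets cast

-- Binding the value as a string keeps the comparison index-friendly:
SELECT * FROM audit_associations WHERE fk = '42';

-- The more permanent fix suggested above is changing the column type
-- itself (PostgreSQL syntax shown):
ALTER TABLE audit_associations
    ALTER COLUMN fk TYPE BIGINT USING fk::BIGINT;
```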

DamienHarper commented 6 years ago

Well, I finally chose to develop my own bundle to fit my needs/context: https://github.com/DamienHarper/DoctrineAuditBundle It relies on one audit table per audited table and provides a command to clean old entries. Thanks for your bundle, it was an inspiration for me.

l3pp4rd commented 6 years ago

glad to hear that, cheers

ghost commented 5 years ago

I realize this is an older ticket, but I wanted to mention that most RDBMSs have a feature called table partitioning. You can partition on the type to achieve one (internal) "table" per type, and partition on things such as dates to avoid issues with very large tables.

This lets you have one actual table in the database, which actually consists of many tables transparently.

Doctrine doesn't support this natively; however, a partitioned table that you create yourself would likely work out of the box, since the partitioning happens transparently in the database and it's "just another table" to Doctrine.

You also need to ensure that, when you fetch records, you add additional WHERE conditions based on the partition condition, so the database can exclude certain partitions entirely (thus improving performance).
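
As a concrete sketch of the approach above (hypothetical; PostgreSQL 10+ declarative partitioning with illustrative table and column names):

```sql
-- One logical audit table, internally split by month; old partitions can
-- later be detached or dropped cheaply instead of running huge DELETEs.
CREATE TABLE audit_logs (
    id        BIGINT       NOT NULL,
    tbl       VARCHAR(128) NOT NULL,
    diff      JSONB,
    logged_at TIMESTAMP    NOT NULL
) PARTITION BY RANGE (logged_at);

CREATE TABLE audit_logs_2019_04 PARTITION OF audit_logs
    FOR VALUES FROM ('2019-04-01') TO ('2019-05-01');

-- Including the partition key in WHERE lets the planner prune partitions:
SELECT * FROM audit_logs
WHERE tbl = 'users'
  AND logged_at >= '2019-04-01' AND logged_at < '2019-05-01';
```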

DamienHarper commented 5 years ago

@tominventisbe Sure, partitioning can help here! But as you said, it has a few drawbacks.

Having separate audit tables does not completely solve these problems, but it pushes the limits further.

ghost commented 5 years ago

Indeed. I just thought I'd mention it for anyone looking for an intermediate solution, but didn't intend to disprove the need for the original request.

In case anyone's interested, and as mentioned by Damien, several RDBMSs support partitioning, such as MySQL and PostgreSQL.

PostgreSQL 11 added some additional advanced features, such as default partitions for records that don't fit into any of the existing partitions.

You will still need to do some plumbing in your code to create new partitions, either manually or automatically, and you will likely still suffer from locks in this case.

Hope this helps someone!
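
As a self-contained illustration of the PostgreSQL 11 default-partition feature mentioned above (table names are purely illustrative):

```sql
CREATE TABLE audit_example (
    id        BIGINT    NOT NULL,
    logged_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (logged_at);

CREATE TABLE audit_example_2019 PARTITION OF audit_example
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');

-- Without a default partition, an INSERT outside every declared range
-- fails; with one, such rows land here instead:
CREATE TABLE audit_example_default PARTITION OF audit_example DEFAULT;
```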

DamienHarper commented 5 years ago

@tominventisbe I understand that you didn't intend to disprove the need for the original request, I hope I didn't offend you (wasn't my goal for sure). Have a nice Easter weekend.

Gabr-bb commented 2 years ago

Regarding this, is there a way to force the logger to log into a different database just before the data is inserted?

DamienHarper commented 2 years ago

@Gabr-bb I don't know if this is possible with this bundle as of now (wasn't possible when I used it 2.5 years ago).

As I mentioned in a previous reply, this bundle has been an inspiration for me and I built my own audit library and bundle that supports multiple storage, it might fit your needs in case this bundle doesn't.

Gabr-bb commented 2 years ago

That looks promising. Can I use your library instead of this one, and will it work the same without changing the database structure or losing the old data?

DamienHarper commented 2 years ago

No, it won't use the same tables to store audit data. Have a look at the documentation.

chancegarcia commented 2 years ago

> @Gabr-bb I don't know if this is possible with this bundle as of now (wasn't possible when I used it 2.5 years ago).
>
> As I mentioned in a previous reply, this bundle has been an inspiration for me and I built my own audit library and bundle that supports multiple storage, it might fit your needs in case this bundle doesn't.

Thanks for making that bundle. We switched from using this one to yours last year and it has been awesome.

DamienHarper commented 2 years ago

> @Gabr-bb I don't know if this is possible with this bundle as of now (wasn't possible when I used it 2.5 years ago).

Wow, I realize this happened almost 5 years ago, time passes too quickly 😅

indjeto commented 9 months ago

I added a command in v1.0 that deletes old records: bin/console audit-logs:delete-old-logs --retention-period=P6M

It works in batches of 10k with a sleep interval between batches, so it doesn't block the DB server.
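
The batching behaviour can be pictured as repeated statements of roughly this shape (a hypothetical MySQL-style sketch; the SQL the actual command emits may differ):

```sql
-- One cleanup batch: delete up to 10k rows older than the retention
-- period (P6M = 6 months); the command sleeps before the next batch
-- so locks stay short and the DB server is not saturated.
DELETE FROM audit_logs
WHERE logged_at < NOW() - INTERVAL 6 MONTH
LIMIT 10000;
```

Note that `DELETE ... LIMIT` is MySQL syntax; on PostgreSQL the same effect needs a subquery selecting the batch of ids to delete.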