audreyt / ethercalc

Node.js port of Multi-user SocialCalc
https://ethercalc.net
Other
2.97k stars 540 forks source link

Safe to delete or empty the audit trail? #494

Open atifaziz opened 7 years ago

atifaziz commented 7 years ago

Is it (operationally) safe to delete or empty out the audit trail for a given sheet when using either Reids or dump.json as the database.


PS I know I could look through the sources so sorry for being lazy and asking and hoping that the people familiar with the code base would be able to answer it in a jiffy.

atifaziz commented 7 years ago

I conducted a test where I reset the audit entries back to an empty array and EtherCalc appears to be working just fine. Still, would feel a bit more comfortable with an official, “Yeah, it's harmless.” 🙏 And if so, then a natural follow-up question would be, is the audit there just as a mere log? If yes, then is there a way to turn it off?


PS Searching the repo for “audit” doesn't yield anything interesting so I'm inclined to think it's a mere log.

eddyparkinson commented 7 years ago

I think the safe way would be to delete after the creation of the snapshot. Also if the audit is not used to create the snapshot, then should be find to delete.

Maybe the audit is just a debug tool, I don't know.

atifaziz commented 7 years ago

@eddyparkinson Sheets used daily over a long period can generate a large audit trail. If you're using dump.json, not only does the file become large and takes time to persist but the log remains committed to memory (well over a gigabyte of committed memory for a couple hundred megabytes of dump.json file size). It's a little less dramatic when using Redis because the audit entires are pushed out from application memory to Redis but then you run the risk of eventually hitting the configured max memory limit on Redis. It's a lot easier, however, to empty the audit lists with Redis than with dump.json. The latter can cause locking contention between the EtherCalc process and another one purging audits on occasion. I'm surprised no one else seems to have run into this issue. Have you?

eddyparkinson commented 7 years ago

Is the audit not per session? it says

Audit Trail This Session:

I auto restart the server every 24 hours. Not sure if this removes old audit data. But I did notice that if the server is left to run for many days it will crash, maybe because of the audit?

atifaziz commented 7 years ago

Is the audit not per session?

My hunch says that's only for the client and for informational purposes for power users. It has nothing to do with the audit trail accumulating indefinitely on the server.

Not sure if this removes old audit data.

Don't think it does.

if the server is left to run for many days it will crash, maybe because of the audit?

I didn't have a crash but increasing delays, timeouts and huge amount of memory being consumed. Are you using Redis as your database? My experiments with that showed once the max memory limit is reached, Redis errors are dropped silently (on callbacks into EtherCalc) and snapshots are no longer saved without any warning. The user can go on happily only to find out the changes completely gone when they return in another session.

eddyparkinson commented 7 years ago

@audreyt have you comments on this.

Think we need more information to workout what the problem is.

I do know the client is replicated on the server. This might cause an issue if the session never ends on the server.

bkuehnle commented 6 years ago

I'd like to add a bump to this. I have over 10K audit keys in Redis, I delete them on my dev server with no negative impact. As @eddyparkinson mentioned, it might not be safe to do this on production server with clients active.

eddyparkinson commented 6 years ago

@bkuehnle pull requests are welcome, happy to support a fix.

bkuehnle commented 6 years ago

@eddyparkinson I understand where you're coming from, but I'm don't know that it requires a fix. I'm trying to determine if the Redis audit records are essential for data consistency or not. Perhaps they exist simply to provide an audit trail if the user wishes to utilize it. Any idea?

eddyparkinson commented 6 years ago

Everything I have seen in the code suggests "saved Redis audit records" are never used by the spreadsheet.

One way to test would be to remove all ".rpush("audit-" + room_data, ...)" from main.js etc ie. https://github.com/audreyt/ethercalc/blob/master/main.js#L775 https://github.com/audreyt/ethercalc/blob/master/main.js#L738 https://github.com/audreyt/ethercalc/blob/master/main.js#L605

So it no longer saves the audit

There does not look to be code that reads the audit.

There is also rpush("log-" and "snapshot-" -- the snapshot is needed. I am not sure about log

harryjoshi commented 5 years ago

I've routinely remove audit trails after a few weeks with a heavy used spreadsheet. Totally safe to do so!