ep1cman / unifi-protect-backup

Python tool to backup unifi event clips in realtime
MIT License

Change Purge's execution interval from 60s to 1d #68

Closed: bh1cqx closed 1 year ago

bh1cqx commented 1 year ago

It was apparently intended to run purge 'every midnight', i.e. once per day. However, the sleep interval after each purge was set to 60 seconds. This commit changes it to 1 day (86,400 seconds) as a quick fix.

See https://github.com/ep1cman/unifi-protect-backup/blob/ca455ebcd0350f58a6f331931fc810648d3ac697/unifi_protect_backup/unifi_protect_backup.py#L203

bh1cqx commented 1 year ago

Not sure if @ep1cman will eventually allow customization of the interval through a parameter, but once per day seems good enough in most cases. Once per minute causes too much API usage with B2 as the cloud storage, because it calls rclone rmdirs whenever one or more videos have been deleted in the past minute, as opposed to at most one rmdirs call per day.

ep1cman commented 1 year ago

Hi,

That comment is left over from before v8.0.0, when the purge happened once every night and that was it. Now the backed-up events are stored in a database and are only removed if they fall outside of the retention period, and this check runs once every 60s. If any videos were deleted then, as you say, rmdirs is called to clean up any empty directories.

Could you explain more why this is an issue for you? I think the correct solution is to add it as a command line option.

bh1cqx commented 1 year ago

If I understand correctly the code path will call rclone rmdirs -vv --ignore-errors --leave-root "{base_dir_path}" every 60s for every single iteration of the loop:

https://github.com/ep1cman/unifi-protect-backup/blob/ca455ebcd0350f58a6f331931fc810648d3ac697/unifi_protect_backup/purge.py#L69

https://github.com/ep1cman/unifi-protect-backup/blob/ca455ebcd0350f58a6f331931fc810648d3ac697/unifi_protect_backup/purge.py#L27

I think rclone rmdirs calls b2_list_file_names on Backblaze (I guess it calls equivalent APIs on other platforms, but I have not tried/checked) for every single sub-directory under the root. (Rmdirs calls walk and then eventually the function below, which appears to be a traversal that calls list on each node):

https://github.com/rclone/rclone/blob/617c5d5e1b4c94f9c7f3c7ab72cc73898999ae58/fs/walk/walk.go#L366

This API is priced at $0.004 per 1,000 calls. Therefore, assuming the directory hierarchy has 100 subdirectories (I have 11 cameras), each day it costs approx. 60*24*100/1000*0.004=$0.576, or ~$17/month. This, IMHO, is unnecessary, since the extra storage costs less if we don't try to clean up aggressively.
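The estimate above can be reproduced with a few lines of Python. This is only a sketch of the worst case described in this comment: it assumes one list call per subdirectory on every purge run, and the function and parameter names are made up for illustration.

```python
def estimated_daily_list_cost(interval_seconds, subdirectories, price_per_1000_calls=0.004):
    """Worst-case daily cost in USD, assuming every purge run lists every subdirectory."""
    runs_per_day = 24 * 60 * 60 / interval_seconds
    calls_per_day = runs_per_day * subdirectories
    return calls_per_day / 1000 * price_per_1000_calls


# Figures from the comment: 100 subdirectories, $0.004 per 1,000 list calls.
per_minute = estimated_daily_list_cost(60, 100)
per_day = estimated_daily_list_cost(86_400, 100)

print(f"every 60s: ${per_minute:.3f}/day, about ${per_minute * 30:.2f}/month")
print(f"every 24h: ${per_day:.4f}/day")
```

With a 24-hour interval the same assumptions give 100 list calls per day instead of 144,000.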

Please correct me if I'm wrong, especially as I didn't spend enough time diving deep into rclone's code. However, I don't think upload/delete operations will invoke list either. My Backblaze report shows ~153k calls in the past 5 days, which seems consistent with the analysis above.

Adding it as a command line option makes sense, though I wonder if there's enough value (for the added complexity) to purge that often.

Edit: rclone rmdirs is only called if an uploaded file has been deleted in the past 60s, so my calculation is inaccurate (the real figure should be lower than above). However, I think the idea is still the same.
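The behaviour described in this thread (delete expired events, then clean up directories only if something was deleted) might look roughly like the sketch below. It is not the project's code: `purge_once`, its parameters, and the in-memory stand-ins for the remote storage are all hypothetical.

```python
from datetime import datetime, timedelta


def purge_once(events, retention, now, delete_file, remove_empty_dirs):
    """Delete expired events; tidy directories only if something was deleted.

    `events` maps a remote path to the event's end time. All names here are
    hypothetical and do not come from the project's code.
    """
    cutoff = now - retention
    expired = [path for path, ended in events.items() if ended < cutoff]
    for path in expired:
        delete_file(path)
        del events[path]
    if expired:
        # Stand-in for the single `rclone rmdirs` call per purge run.
        remove_empty_dirs()
    return len(expired)


# Small demonstration with in-memory stand-ins for the remote storage.
now = datetime(2023, 1, 8)
events = {
    "cam1/old.mp4": now - timedelta(days=10),
    "cam1/new.mp4": now - timedelta(days=1),
}
deleted, rmdirs_calls = [], []

first = purge_once(events, timedelta(days=7), now, deleted.append, lambda: rmdirs_calls.append(1))
second = purge_once(events, timedelta(days=7), now, deleted.append, lambda: rmdirs_calls.append(1))
print(first, second, deleted, len(rmdirs_calls))
```

The second run finds nothing to delete, so the directory cleanup is skipped. With clips expiring continuously, a 60s loop would still trigger the cleanup on most runs, whereas a 24h loop triggers it at most once per day.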

ep1cman commented 1 year ago

Thanks for that detailed explanation. That cost is certainly unacceptable, and not something I had considered because I only use the OneDrive backend of rclone.

I will work on a fix for this once Christmas is over.

My thinking is to add a command line option (not a lot of effort) but return the default to once every 24h. This should be a much more acceptable frequency, while still allowing anyone who needs stricter purging to adjust it.
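A configurable interval with a 24h default could be sketched as follows. The flag name `--purge-interval`, the `60s`/`1d` value format, and the use of `argparse` are assumptions made for illustration; the actual release may name or parse the option differently.

```python
import argparse
import re

# Seconds per unit for values such as "60s", "12h" or "1d" (assumed format).
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86_400}


def parse_interval(text):
    """Convert a string like '1d' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise argparse.ArgumentTypeError(f"invalid interval: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical purge settings")
    parser.add_argument(
        "--purge-interval",  # hypothetical flag name
        type=parse_interval,
        default=parse_interval("1d"),
        help="how often to run the purge, e.g. 60s, 12h, 1d (default: 1d)",
    )
    return parser


default_args = build_parser().parse_args([])
strict_args = build_parser().parse_args(["--purge-interval", "60s"])
print(default_args.purge_interval, strict_args.purge_interval)
```

Keeping the default at one day addresses the API cost for most users, while anyone who needs stricter purging can pass a shorter value.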

bh1cqx commented 1 year ago

That's awesome. Thanks!

ep1cman commented 1 year ago

I just pushed out a new release; the CI should publish it soon. Please let me know if this addresses the underlying issue to your satisfaction.