CroatianMeteorNetwork / RMS

RPi Meteor Station
https://globalmeteornetwork.org/
GNU General Public License v3.0

Add automated purge of ArchivedFiles #250

Closed markmac99 closed 5 months ago

markmac99 commented 6 months ago

Closes #243. Purges older folders and bz2 files in ArchivedFiles as part of the existing routine that frees space before each run.

The logic is as follows

If these defaults are undesirable:

Example: the settings below would keep ten folders and fifteen bz2 files.

# keep this many ArchDirs. Zero means keep them all
self.arch_dirs_to_keep = 10
# keep this many compressed ArchDirs. Zero means keep them all
self.bz2_files_to_keep = 15
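
The retention rule described by these two settings can be sketched roughly as follows. This is a minimal illustration, not the code in the pull request: the function name `purge_archived` is hypothetical, and it assumes that sorting the entries by name also sorts them by age (that is, that the names embed a timestamp).

```python
import os
import shutil


def purge_archived(archived_dir, dirs_to_keep, bz2_files_to_keep):
    """Delete the oldest entries in archived_dir, keeping the newest N of each kind.

    A count of zero means "keep them all". Returns the names that were removed.
    """
    # Assumption: name order matches age order (timestamped names).
    entries = sorted(os.listdir(archived_dir))
    dirs = [e for e in entries if os.path.isdir(os.path.join(archived_dir, e))]
    bz2s = [e for e in entries
            if e.endswith(".bz2") and os.path.isfile(os.path.join(archived_dir, e))]

    removed = []
    if dirs_to_keep > 0:
        # Everything except the last dirs_to_keep entries is "old".
        for name in dirs[:-dirs_to_keep]:
            shutil.rmtree(os.path.join(archived_dir, name))
            removed.append(name)
    if bz2_files_to_keep > 0:
        for name in bz2s[:-bz2_files_to_keep]:
            os.remove(os.path.join(archived_dir, name))
            removed.append(name)
    return removed
```

Folders and bz2 files are counted separately, so keeping ten folders does not reduce the number of bz2 files retained.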

g7gpr commented 6 months ago

for cam in /home/au*; do echo $(basename $cam); sudo su $(basename $cam) -c "ls ~/RMS_data/ArchivedFiles/ | wc -l; ls ~/RMS_data/CapturedFiles/ | wc -l"; done

       A  C
au0028 60 31
au0029 60 31
au002a 61 31
au002b 60 31
au002c 60 31
au002d 60 31
au002e 60 31
au002f 60 31

This is the initial state; I'll report back tomorrow.

g7gpr commented 5 months ago
       bz2 /. C
au0028 5 25 34
au0029 5 25 34
au002a 5 25 34
au002b 5 25 34
au002c 5 25 34
au002d 5 25 34
au002e 5 25 34
au002f 5 25 34

Perfect: 5 .bz2 files and 25 items in total in ArchivedFiles, so 20 directories, plus 34 CapturedFiles directories.
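
The directory count above is derived rather than measured: 25 items in total minus 5 .bz2 files leaves 20 directories. A small sketch of the same check, assuming ArchivedFiles holds only folders and .bz2 files (the helper name `count_archived` is hypothetical):

```python
import os


def count_archived(archived_dir):
    """Return (bz2_count, dir_count, total_items) for one ArchivedFiles directory."""
    names = os.listdir(archived_dir)
    bz2_count = sum(1 for n in names if n.endswith(".bz2"))
    # Assumption: anything that is not a .bz2 file is an archive folder.
    dir_count = len(names) - bz2_count
    return bz2_count, dir_count, len(names)
```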

dvida commented 5 months ago

One final thing - could you add these options to the default config file?

markmac99 commented 5 months ago

I can do, though I was keeping them out of it after that long email exchange with Pete and Milan :)

I'll post something in the group later today.

markmac99 commented 5 months ago

@g7gpr David is this a multi-camera build?

dvida commented 5 months ago

Thanks - it should definitely be added for transparency going forward. Adding new params to old config files was never really an issue. I would personally love to have some sort of a config file editor that shows the diff between the current and the latest default config file, so you can always see what changed and what's missing. Has anyone tried installing meld on the Pi?
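
A first step towards such a tool could be a script that lists the options present in the latest default config but absent from the station's current one. The sketch below uses Python's standard `configparser` and assumes the config is INI-style; the function name `missing_options` is hypothetical, as are the section name and values in the usage note.

```python
import configparser


def missing_options(default_text, current_text):
    """Return sorted (section, option) pairs found only in the default config."""
    default = configparser.ConfigParser(interpolation=None, strict=False)
    current = configparser.ConfigParser(interpolation=None, strict=False)
    default.read_string(default_text)
    current.read_string(current_text)

    missing = []
    for section in default.sections():
        for option in default.options(section):
            # has_option() is False when the section itself is absent.
            if not current.has_option(section, option):
                missing.append((section, option))
    return sorted(missing)
```

For example, comparing a default containing `arch_dirs_to_keep`, `bz2_files_to_keep` and `logdays_to_keep` under a `[Capture]` section against an older file containing only `logdays_to_keep` would report the two retention options as missing. Note that `configparser` lower-cases option names by default.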

markmac99 commented 5 months ago

Done. I also added logdays_to_keep, since that was added to RMS a long time ago but I didn't add it to the config file in that pull request. Hope it's OK to add it now.

dvida commented 5 months ago

Thanks - looking at the numbers, wouldn't it make more sense to keep more archived bz2s than ArchivedFiles folders? E.g. we can keep 5 uncompressed dirs and a month's worth of bz2s, or even fewer uncompressed folders. If the idea is that the archived data survives as long as possible on the Pi without being deleted, the bz2 is the way to go. E.g. imagine if for some reason the server is down for a long time (a week or two): we don't want that data to go anywhere.

markmac99 commented 5 months ago

I was keen to keep the folders to allow camera operators to browse data, run Confirmation etc. Even if you do confirmation on a desktop PC, the bz2 file isn't Windows-friendly and I have found that uncompressing them is a challenge for many of our camera operators. There are some tools in the Microsoft Store but they're kinda sucky and surprisingly huge downloads (like, 200MB!).

I could set the default to 20 for bz2 too. This would cost a bit of extra space, probably 1GB, but I don't think it would be a problem. I am currently keeping 10 bz2 files on UK0006 and free space has never dropped below 4.5GB. The main thing is to prevent ArchivedDirs growing indefinitely.

g7gpr commented 5 months ago

Yes, it is multi camera, but not the standard build. It is non GUI, one camera per user.

g7gpr commented 5 months ago

This is so good, I wish it covered the Captured Files directory.

I think the comment in line https://github.com/g7gpr/RMS/blob/b0a914ba8af8773706021f3ef3215647e9f597ce/RMS/StartCapture.py#L630C72-L630C72 is wrong. However, is there any mechanism where RMS could automatically start regenerating ArchivedDirs if the matching CapturedFiles directory still exists?

markmac99 commented 5 months ago

RMS will automatically process any CapturedFiles folders that weren't previously processed or were only partially processed. However, other than manual deletion, I can't think of a situation where there would be a CapturedFiles folder but no ArchivedFiles folder.

g7gpr commented 5 months ago

I created that situation on au0028 to au002f. There is a reasonably sized drive on that machine, and there are now more directories in CapturedFiles than in ArchivedFiles. It is not trying to re-create the deleted directories in ArchivedFiles.

markmac99 commented 5 months ago

Yes, that's correct behaviour. The ArchivedFiles folders are only created if RMS considers the source folder unprocessed or incompletely processed. Once it's been processed and uploaded to GMN, RMS considers it "job done".

If we want to reprocess a folder, we can run RMS.Reprocess, but I'm not sure it would be safe to automate this, as it could delay RMS's morning upload process. Also, it would only be needed in edge cases, such as a Linux box with very large storage where over-purging had taken place. The better fix there would be to set the config parameters to retain more data and to manually recreate whatever was required.

g7gpr commented 5 months ago

I agree that it is not the desired behaviour, but I think the comment on line 630 of StartCapture.py needs fixing to match the actual behaviour.

markmac99 commented 5 months ago

I see - changed to say "Check if there are any unprocessed or incompletely processed captured dirs"

g7gpr commented 5 months ago

OK. No more comments from me. Thanks, this is a great new feature.

dvida commented 5 months ago

Hi Mark, thanks - I suggest we up the bz2 period to at least 2 weeks. There are some cameras which upload intermittently (some uploaded data from late December only yesterday), so we should prevent any data loss if possible. 20 days is a good number - what do you think about 30 days, would that be too long?

markmac99 commented 5 months ago

30 days won't work. I have been running a Unix script to do the same thing for a couple of years now and have tried various limits, but I have found that if I set it to retain more than 20 days of data in ArchivedFiles, the system consistently ran out of space in December. Even with 20 days set, my systems hit 96% every two weeks during December and January, and one system ran out of space completely during the overnight run. I think that on that evening there was a lot of fast-moving cloud that generated hundreds of false detections.

I can set it to 20 days for both? Users can customise it through the config file.

dvida commented 5 months ago

Thanks for the thorough and supported answer - 20 days it is then! We'll try to make sure that the operational pipeline is never down for more than 20 days no matter what. Would you mind sending a follow-up to the group explaining the 20 days extension and the implications of this limit?

markmac99 commented 5 months ago

Yep, sure, will do!

markmac99 commented 5 months ago

PS - I've updated the code in line with this discussion :)

dvida commented 5 months ago

And one final thing to check - an operator recently had an issue where the upload kept failing because a bz2 file was deleted before it had a chance to be uploaded. Can you check what happens in that case? I think there should be a way to tell the UploadManager that the file won't get uploaded.

markmac99 commented 5 months ago

Good catch. I've amended uploadData to handle this using the same logic as in loadQueue(), which skips nonexistent files.
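
The idea of skipping nonexistent files can be illustrated as below. This is only a sketch of the general approach, not the actual uploadData or loadQueue code; the helper name `split_existing` is hypothetical.

```python
import os


def split_existing(queue):
    """Split queued paths into (still on disk, missing), preserving order."""
    existing, missing = [], []
    for path in queue:
        # A purged archive is reported as missing instead of failing the upload.
        if os.path.isfile(path):
            existing.append(path)
        else:
            missing.append(path)
    return existing, missing
```

An upload loop could then attempt only the first list and drop the second from the queue, so that a purged file does not block the files queued behind it.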