crkn-rcdr / Digital-Preservation

Documentation and related schemas for the CRKN digital preservation system

Tool to remove unused virtual folders in repoanalysis Swift container. #11

RussellMcOrmond closed this 4 years ago

RussellMcOrmond commented 5 years ago

As discussed in #9, we discovered that Swift is inefficient at storing large numbers of small documents. A tool was written to group JHOVE reports into a single zip file. As these zip files are created, the virtual folders holding the individual JHOVE reports become unused and can be deleted.
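Swift has no real directories. A "virtual folder" is just a shared object-name prefix ending in `/`. The sketch below shows one way to decide which folders are safe to delete. It assumes a hypothetical layout in which an object's JHOVE reports live under `<identifier>/...` and the grouping tool writes `<identifier>.zip`, which matches the zip names in the log later in this issue but is not confirmed by it. Under that assumption, a folder is deletable once its zip exists.

```python
def deletable_folders(object_names):
    """Return virtual-folder prefixes whose reports already have a zip.

    Hypothetical layout (an assumption, not confirmed by the issue):
      <identifier>/<report>.xml   individual JHOVE reports
      <identifier>.zip            grouped reports written by the zip tool
    """
    names = list(object_names)
    # Identifiers that already have a grouped zip file at the top level.
    zipped = {n[: -len(".zip")] for n in names if n.endswith(".zip") and "/" not in n}
    folders = set()
    for name in names:
        if "/" in name:
            # The virtual folder is everything before the first "/".
            prefix = name.split("/", 1)[0]
            if prefix in zipped:
                folders.add(prefix + "/")
    return sorted(folders)


if __name__ == "__main__":
    listing = [
        "oocihm.08099.zip",
        "oocihm.08099/page1.xml",
        "oocihm.08099/page2.xml",
        "oocihm.19004/page1.xml",  # not zipped yet, so it is kept
    ]
    print(deletable_folders(listing))  # ['oocihm.08099/']
```

Folders whose zip has not been written yet are left alone, so the check can safely run while zipping is still in progress.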

A tool to perform this deletion needs to be created.
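The following is a minimal sketch of such a delete tool. It is not the project's actual `cleanup` script. Listing and deleting are passed in as hypothetical callables (`list_names`, `delete_name`) so the logic can be exercised without a Swift cluster. The docstring shows one possible wiring to python-swiftclient's `Connection.get_container` and `Connection.delete_object`, which is an assumption about deployment.

```python
from typing import Callable, Iterable, List


def cleanup_folder(
    prefix: str,
    list_names: Callable[[str], Iterable[str]],
    delete_name: Callable[[str], None],
    dry_run: bool = False,
) -> List[str]:
    """Delete every object whose name starts with `prefix`; return the names.

    `list_names` and `delete_name` are hypothetical injected callables. With
    python-swiftclient they could be wired up as (an assumption, not the
    project's actual code):
        list_names  = lambda p: [o["name"] for o in
                                 conn.get_container(container, prefix=p,
                                                    full_listing=True)[1]]
        delete_name = lambda n: conn.delete_object(container, n)
    """
    if not prefix.endswith("/"):
        # Guard: without the trailing "/", prefix "oocihm.1" would also match
        # "oocihm.1.zip" (the zip itself) and "oocihm.10/page1.xml".
        raise ValueError("prefix must end with '/'")
    # Materialize the listing before deleting anything.
    targets = sorted(list_names(prefix))
    if not dry_run:
        for name in targets:
            delete_name(name)
    return targets
```

Running with `dry_run=True` first prints what would be removed, which is a cheap safeguard when deleting from a preservation store.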

RussellMcOrmond commented 4 years ago

Cleanup with this tool is ongoing. Deleting unused objects seems to take longer than uploading them did in the first place. If I run many instances at the same time, the Swift "Pending Container Updates" count increases, and files aren't actually removed from the listings until later.
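Running many instances at once raises the pending-update count. This is presumably Swift queuing container-listing updates it could not apply immediately ("async pendings") for the object-updater to process later. That observation suggests capping parallelism rather than maximizing it. The following is a minimal sketch using a fixed-size thread pool. `delete_name` is a hypothetical callable that deletes one object, and `max_workers=4` is an arbitrary illustrative value, not a tuned setting.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def delete_bounded(
    names: Iterable[str],
    delete_name: Callable[[str], None],
    max_workers: int = 4,  # illustrative cap, not a tuned value
) -> Dict[str, str]:
    """Delete objects with at most `max_workers` calls running at once.

    Returns a mapping of object name -> error message for failed deletes,
    so a later run can retry them instead of aborting the whole batch.
    """
    failures: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Every task is queued, but the pool only runs max_workers at a time.
        futures = {pool.submit(delete_name, n): n for n in names}
        for fut in as_completed(futures):
            exc = fut.exception()
            if exc is not None:
                failures[futures[fut]] = str(exc)
    return failures
```

Lowering `max_workers` trades raw throughput for a smaller backlog of container updates. Whether that is the right trade depends on the cluster.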

RussellMcOrmond commented 4 years ago

Although not all container updates are complete yet, the tool now runs without finding any directories to delete. Closing unless a problem is noticed.

```text
russell@jarlsberg:~/git/Digital-Preservation/RepositoryAnalysis/tools$ date ; docker-compose run ratools bash -c "cleanup" ; date
Tue Mar 24 12:23:17 EDT 2020
Container get: Start
Container get: numeris.TV_1989_WI00_016.zip
Container get: oocihm.08099.zip
Container get: oocihm.19004.zip
Container get: oocihm.33690.zip
Container get: oocihm.45804.zip
Container get: oocihm.57464.zip
Container get: oocihm.71815.zip
Container get: oocihm.82032.zip
Container get: oocihm.8_04023_104.zip
Container get: oocihm.8_04199_10.zip
Container get: oocihm.8_04473_162.zip
Container get: oocihm.8_04729_696.zip
Container get: oocihm.8_04926_63.zip
Container get: oocihm.8_05011_12.zip
Container get: oocihm.8_06019_180.zip
Container get: oocihm.8_06240_143.zip
Container get: oocihm.8_06481_30.zip
Container get: oocihm.8_06550_308.zip
Container get: oocihm.8_06663_62.zip
Container get: oocihm.8_06914_118.zip
Container get: oocihm.91990.zip
Container get: oocihm.9_01864.zip
Container get: oocihm.N_00006_19160819.zip
Container get: oocihm.N_00103_18761101.zip
Container get: oocihm.N_00208_19140731.zip
Container get: oocihm.lac_reel_c12982.zip
Container get: oocihm.lac_reel_t1021.zip
Container get: oocihm.lac_reel_t7031.zip
Container get: ooe.b237030x.zip
Container get: ooe.b4330304_059.zip
Container get: ooe.sas_19851030FP.zip
Container get: oop.com_HOC_2902_13_1.zip
Container get: qmma.McGillAC_98.zip
Tue Mar 24 12:26:40 EDT 2020
russell@jarlsberg:~/git/Digital-Preservation/RepositoryAnalysis/tools$
```
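One plausible reading of the `Container get:` lines is that each one is the marker for one page of the container listing. This is an interpretation of the log, not something the issue states. Swift returns at most a fixed number of names per request (10,000 by default), and the next page starts strictly after a given marker. If the listing uses `delimiter=/`, remaining virtual folders show up as `subdir` entries. A run that finds only `.zip` names would then have nothing left to delete. A sketch of that loop follows. `fetch_page(marker, limit)` is a hypothetical stand-in for the request; with python-swiftclient it might be `conn.get_container(container, marker=marker, limit=limit, delimiter="/")[1]`.

```python
from typing import Callable, Dict, Iterator, List, Optional

Entry = Dict[str, str]  # {"name": ...} for objects, {"subdir": ...} for folders


def iter_listing(
    fetch_page: Callable[[Optional[str], int], List[Entry]],
    limit: int = 10000,
    log: Callable[[str], None] = print,
) -> Iterator[Entry]:
    """Yield every entry of a container listing, one page at a time.

    `fetch_page(marker, limit)` is hypothetical: it must return up to `limit`
    entries sorted by name, starting strictly after `marker`.
    """
    marker: Optional[str] = None
    while True:
        log(f"Container get: {marker or 'Start'}")
        page = fetch_page(marker, limit)
        if not page:
            return
        yield from page
        if len(page) < limit:
            return  # a short page means the listing is exhausted
        last = page[-1]
        # A page can end on an object ("name") or a virtual folder ("subdir").
        marker = last.get("name", last.get("subdir"))


def remaining_folders(entries: Iterator[Entry]) -> List[str]:
    """Virtual folders still present; an empty result means nothing to delete."""
    return [e["subdir"] for e in entries if "subdir" in e]
```

On the final run in this issue, `remaining_folders(iter_listing(...))` would be expected to come back empty, matching the tool finding no directories to delete.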