WebCuratorTool / webcurator

The root of the webcurator tool project, containing all modules needed to run a fully functional webcurator tool.
Apache License 2.0

implemented the individual collection for each target instance in pywb #128

Open leefrank9527 opened 3 months ago

leefrank9527 commented 3 months ago

Implemented the individual collection feature for PYWB:

  1. Create an individual collection for each harvest. The option is configurable in the Store. The individual collection feature is available when the pywb indexer is enabled and the individual collection mode is set to true:
    pywbIndexer.enable=true
    pywbIndexer.individualCollectionMode=true
  2. Refine the access tool link on the quality review page so that it points to the individual collection. The urlMap configuration item needs to follow this pattern:
    harvestResourceUrlMapper.urlMap=http://localhost:8090/{$HarvestResult.Collection}/
  3. Allow users to recreate screenshots or reindex manually. A tool for this was added to the quality review page.
  4. Remove the individual collection when the target instance is archived or the harvest result is rejected.
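
As a rough illustration of the urlMap pattern in item 2, the sketch below substitutes a collection name for the {$HarvestResult.Collection} token. The class name, the method name, and the example collection name are hypothetical. How the Store actually names collections and how WCT resolves the token internally are not shown in this thread.

```java
/** Hypothetical helper, not taken from the PR: expands the collection token in a urlMap value. */
final class UrlMapSketch {
    // Token as it appears in the harvestResourceUrlMapper.urlMap setting.
    static final String COLLECTION_TOKEN = "{$HarvestResult.Collection}";

    /** Replaces every occurrence of the token with the given collection name (literal, not regex). */
    static String resolve(String urlMap, String collectionName) {
        return urlMap.replace(COLLECTION_TOKEN, collectionName);
    }

    public static void main(String[] args) {
        String urlMap = "http://localhost:8090/{$HarvestResult.Collection}/";
        // "example-collection" is a made-up name used only for illustration.
        System.out.println(resolve(urlMap, "example-collection"));
    }
}
```

With the pattern above, each harvest result would get an access link under its own collection path instead of one shared collection.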
leefrank9527 commented 3 months ago

Please note that the cases below were not tested:

  1. When the screenshot is disabled, the individual collection feature should still work.
  2. Removing the index when the target instance is archived or the harvest result is rejected.

Another point to be aware of: when a reindex or index removal operation is applied, the folder of the individual collection is deleted. If files inside the folder are held open by other processes (for example, PYWB is playing back the harvest or generating the indexes), the JVM cannot delete the folder (at least, I was not able to find a way to delete it). If the JVM fails to delete the folder, the system command "rm -rf " is called to delete it. The system command call is only implemented for Linux. On Windows and other operating systems it is skipped, so if the JVM fails to delete the folder, some errors may occur.
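
The deletion fallback described above could be sketched roughly as follows. This is a minimal illustration and not the code in the PR: the class and method names are hypothetical, and the only behaviour taken from the description is "try the JVM first, then call rm -rf on Linux, otherwise skip".

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical sketch of deleting a collection folder with an OS-level fallback. */
final class CollectionFolderCleaner {

    /** Recursively deletes dir using the JVM only; throws if any entry cannot be removed. */
    static void deleteWithJvm(Path dir) throws IOException {
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(d); // the directory is empty at this point
                return FileVisitResult.CONTINUE;
            }
        });
    }

    static boolean isLinux() {
        return System.getProperty("os.name", "").toLowerCase().contains("linux");
    }

    /** Tries the JVM first, then "rm -rf" on Linux. Returns true if the folder no longer exists. */
    static boolean deleteCollectionFolder(Path dir) {
        if (!Files.exists(dir)) {
            return true;
        }
        try {
            deleteWithJvm(dir);
        } catch (IOException jvmFailure) {
            if (isLinux()) {
                try {
                    Process rm = new ProcessBuilder("rm", "-rf", dir.toAbsolutePath().toString())
                            .inheritIO()
                            .start();
                    rm.waitFor();
                } catch (IOException e) {
                    // rm could not be started; the final existence check reports the failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            // On other operating systems the fallback is skipped, as described above.
        }
        return !Files.exists(dir);
    }
}
```

Returning a boolean lets the caller log or retry when the folder survives, which is the case flagged above for Windows and other non-Linux systems.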

hannakoppelaar commented 2 months ago

Apart from these minor issues, everything looks okay. I also tested the case where the screenshot tool has been disabled, and I verified that the collection directory is being removed (once the purgeDigitalAssetsTrigger.repeatInterval has elapsed) after a harvest has been rejected or archived.

obrienben commented 1 month ago

@hannakoppelaar I've made some changes to this PR