WebCuratorTool / webcurator

The root of the webcurator tool project, containing all modules needed to run a fully functional webcurator tool.
Apache License 2.0

Move to individual collections for Target Instances in Pywb #97

Open obrienben opened 11 months ago

obrienben commented 11 months ago

Add an alternative integration with Pywb that creates an individual collection for each Target Instance Harvest Result. This would either replace the current default of a single shared Pywb collection, or be an optional feature that can be enabled or disabled.

Upon indexing each Harvest Result, WCT Store would create a new collection in Pywb, named after the Target Instance ID and Harvest Result number in the form \<TI-ID>-\<HR-ID>, e.g. "10078623-1". The "Review in Access Tool" URL would then need to include the generated collection ID, e.g. "wayback.wct-host/10078623-1/2023110278364/https://a-website.com/"
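
A rough sketch of the naming and URL scheme described above, in Python for brevity. The function names are hypothetical and not part of WCT; the only external pieces assumed are Pywb's `wb-manager init` and `wb-manager add` commands, which are built here as argument lists rather than executed.

```python
from typing import List


def collection_id(ti_id: int, hr_number: int) -> str:
    """Build the proposed collection name <TI-ID>-<HR-ID> (hypothetical helper)."""
    return f"{ti_id}-{hr_number}"


def review_url(base_url: str, ti_id: int, hr_number: int,
               timestamp: str, seed_url: str) -> str:
    """Build the "Review in Access Tool" URL, including the collection ID."""
    coll = collection_id(ti_id, hr_number)
    return f"{base_url.rstrip('/')}/{coll}/{timestamp}/{seed_url}"


def wb_manager_commands(ti_id: int, hr_number: int,
                        warc_paths: List[str]) -> List[List[str]]:
    """Return the wb-manager invocations that would create and fill the collection.

    The commands are only constructed, not run; a caller could pass each
    list to subprocess.run on the host where Pywb is installed.
    """
    coll = collection_id(ti_id, hr_number)
    commands = [["wb-manager", "init", coll]]
    # One "add" per WARC file keeps failures easy to attribute to a file.
    commands += [["wb-manager", "add", coll, path] for path in warc_paths]
    return commands
```

For the example in this issue, `review_url("wayback.wct-host", 10078623, 1, "2023110278364", "https://a-website.com/")` gives the URL quoted above.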

This would provide two main benefits:

  1. Allow isolated QA of each Harvest Result, with no risk of the viewer pulling in content from other Target Instances or Harvest Results in WCT.
  2. Make it easier to clean up archived and rejected Target Instances (and simply old ones too), by deleting the Pywb collection directories for that Target Instance.
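
To illustrate the second benefit, clean-up could be as small as the sketch below. It assumes the collections live as sibling directories named \<TI-ID>-\<HR-ID> under one Pywb collections root; the function name and the `collections_root` parameter are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def remove_target_instance_collections(collections_root: Path, ti_id: int) -> List[str]:
    """Delete every <TI-ID>-<HR-ID> collection directory for one Target Instance.

    Returns the names of the directories that were removed, sorted.
    """
    removed = []
    # The trailing hyphen in the pattern stops TI 1007 from matching 10078623-1.
    for coll_dir in sorted(collections_root.glob(f"{ti_id}-*")):
        if coll_dir.is_dir():
            shutil.rmtree(coll_dir)
            removed.append(coll_dir.name)
    return removed
```

Collections belonging to other Target Instances are left untouched, since only names starting with the exact ID followed by a hyphen are matched.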

Also, if a Harvest Result is not indexed, the UI should offer an option to index the harvest in place of the "Review in Access Tool" link. That way, if WCT has cleaned up an older Harvest Result and a user needs it again later, they can re-index and view the harvest.
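
One possible way to decide which option to show, sketched under the assumption that an indexed collection has at least one `.cdxj` file in its `indexes` subdirectory. That layout, the function name and the returned labels are all assumptions for illustration; WCT might equally track the indexed state in its own database.

```python
from pathlib import Path


def harvest_action(collections_root: Path, ti_id: int, hr_number: int) -> str:
    """Return "review" if the harvest looks indexed, otherwise "index"."""
    index_dir = collections_root / f"{ti_id}-{hr_number}" / "indexes"
    # Treat the harvest as indexed only if an index file is actually present.
    has_index = index_dir.is_dir() and any(index_dir.glob("*.cdxj"))
    return "review" if has_index else "index"
```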

obrienben commented 7 months ago

Requirements