Design and Implement the Origins Archive

richard-jones commented 3 years ago

[x] Create an origins archive module that provides a simple API for sending items to the archive and responds with a unique ID
[x] Implement a mechanism for taking those items and storing them somewhere safe (e.g. S3)
[ ] Implement a retention policy system (probably running on a scheduler)
[x] Provide a basic but convenient method of retrieving (or at least identifying the location of) archived content. The origins archive may want to keep a local copy of details about archived content, but it should also be able to function without that copy, or rebuild it from the remote store.
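
As a rough illustration of how the four tasks above could fit together, here is a minimal Python sketch. None of these names come from the actual codebase: `OriginsArchive`, `archive`, `retrieve`, `rebuild_index` and `apply_retention` are all hypothetical, and a plain `dict` stands in for the remote store (e.g. S3). The retention check is written as a method that a scheduler could call periodically; the scheduler itself is not shown.

```python
import json
import time
import uuid


class OriginsArchive:
    """Hypothetical archive facade: store an item, hand back a unique ID."""

    def __init__(self, store):
        # `store` is any dict-like mapping of key -> bytes. A plain dict is
        # used here as a stand-in for a remote store such as S3.
        self.store = store
        # Local copy of details about archived content: id -> archive time.
        self.index = {}

    def archive(self, item, now=None):
        """Persist a JSON-serialisable item and return its unique ID."""
        item_id = uuid.uuid4().hex
        archived_at = time.time() if now is None else now
        record = {"id": item_id, "archived_at": archived_at, "item": item}
        self.store[item_id] = json.dumps(record).encode("utf-8")
        self.index[item_id] = archived_at
        return item_id

    def retrieve(self, item_id):
        """Read the item straight from the store; the index is not needed."""
        return json.loads(self.store[item_id].decode("utf-8"))["item"]

    def rebuild_index(self):
        """Recover the local index purely from what the store holds."""
        self.index = {}
        for key in list(self.store):
            record = json.loads(self.store[key].decode("utf-8"))
            self.index[key] = record["archived_at"]

    def apply_retention(self, max_age_seconds, now=None):
        """Delete items older than max_age_seconds; return the removed IDs."""
        now = time.time() if now is None else now
        expired = [i for i, ts in self.index.items() if now - ts > max_age_seconds]
        for item_id in expired:
            del self.store[item_id]
            del self.index[item_id]
        return expired
```

Because each stored record carries its own timestamp, a fresh `OriginsArchive` pointed at the same store can call `rebuild_index()` and then apply the retention policy without any prior local state.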

richard-jones commented 2 years ago

We should add this as a Kafka topic.

richard-jones commented 2 years ago

I have implemented this as the first step of the pipeline processor, written over an abstract storage interface that can connect to the local disk or to Amazon S3.
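
For readers unfamiliar with the pattern, an abstract storage interface with a local-disk backend might look roughly like the sketch below. This is not the code pushed to develop: `ArchiveStorage`, `LocalDiskStorage` and `archive_step` are hypothetical names, and an S3 backend would be another subclass implementing the same three methods.

```python
import os
from abc import ABC, abstractmethod


class ArchiveStorage(ABC):
    """Hypothetical storage interface that the archive step is written against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Persist data under key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the data previously stored under key."""

    @abstractmethod
    def keys(self) -> list:
        """List all stored keys."""


class LocalDiskStorage(ArchiveStorage):
    """Stores each item as an individual file under a root directory."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # Assumption: keys are simple file names without path separators.
        return os.path.join(self.root, key)

    def put(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()

    def keys(self):
        return sorted(os.listdir(self.root))


def archive_step(storage, key, payload):
    """First pipeline step: persist the raw payload, then pass it on unchanged."""
    storage.put(key, payload)
    return payload
```

Since `archive_step` only depends on `ArchiveStorage`, swapping the local disk for an object store is a configuration decision rather than a code change in the pipeline.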

I have pushed this to develop, but I have not yet wired it into the live processing pipeline, as we need to consider the testing storage implementation a bit more carefully.

I propose to add a more advanced local storage layer which persists the files into a ZIP archive rather than as individual files, which should make the local store mode scale better. For real deployments the service provider will need to use S3 or supply a storage implementation which persists to their preferred object store. All of that can be selected through configuration.
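
To make the proposal concrete, a ZIP-backed local store could be sketched with Python's standard `zipfile` module as below. `ZipStorage` and its method names are hypothetical and only illustrate the idea of appending every archived item to a single archive file instead of creating one file per item.

```python
import os
import zipfile


class ZipStorage:
    """Hypothetical local store that keeps all items inside one ZIP file."""

    def __init__(self, zip_path):
        self.zip_path = zip_path

    def put(self, key, data):
        # Append mode creates the archive if it does not exist yet.
        # Note: writing the same key twice adds a duplicate entry, so this
        # sketch assumes keys are unique (e.g. generated IDs).
        with zipfile.ZipFile(self.zip_path, "a", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(key, data)

    def get(self, key):
        with zipfile.ZipFile(self.zip_path, "r") as zf:
            return zf.read(key)

    def keys(self):
        # An archive that has never been written to has no keys.
        if not os.path.exists(self.zip_path):
            return []
        with zipfile.ZipFile(self.zip_path, "r") as zf:
            return zf.namelist()
```

A single archive avoids filling a directory with many small files, at the cost of reopening the ZIP on every write. A real implementation would probably also roll over to a new archive at some size or time boundary, which is left out here.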

richard-jones commented 2 years ago

Tagging @Steven-Eardley into this one, as I need to discuss how it fits into the orchestration and deployment.

NGLPteam / NGLP-Analytics

Design and Implement the Origins Archive #26