NGLPteam / NGLP-Analytics

MIT License

Design and Implement the Origins Archive #26

Open · richard-jones opened this issue 3 years ago

richard-jones commented 3 years ago
richard-jones commented 2 years ago

We should add this as a Kafka topic.

richard-jones commented 2 years ago

I have implemented this as the first step of the pipeline processor, written against an abstract storage interface that can be backed by the local disk or by Amazon S3.
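A minimal sketch of what such a storage abstraction might look like, assuming a simple key/bytes interface. The names `OriginsStore`, `LocalDiskStore`, `archive_origin` and the date/hash key layout are hypothetical illustrations, not the code pushed to develop. An S3-backed class would implement the same two methods; it is omitted here to keep the example dependency-free.

```python
import abc
import hashlib
import os
import tempfile
from datetime import datetime, timezone


class OriginsStore(abc.ABC):
    """Hypothetical abstract storage interface for the origins archive."""

    @abc.abstractmethod
    def store(self, key: str, data: bytes) -> None:
        """Persist the raw bytes under the given key."""

    @abc.abstractmethod
    def retrieve(self, key: str) -> bytes:
        """Return the raw bytes previously stored under the key."""


class LocalDiskStore(OriginsStore):
    """Writes each archived payload as an individual file under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        return os.path.join(self.root, key)

    def store(self, key: str, data: bytes) -> None:
        path = self._path(key)
        # Keys may contain "/" separators, so create intermediate directories.
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def retrieve(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()


def archive_origin(store: OriginsStore, payload: bytes) -> str:
    """Hypothetical first pipeline step: archive the raw payload unchanged.

    The key groups payloads by UTC date and names them by content hash.
    """
    now = datetime.now(timezone.utc)
    digest = hashlib.sha256(payload).hexdigest()
    key = f"{now:%Y/%m/%d}/{digest}.json"
    store.store(key, payload)
    return key


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        disk = LocalDiskStore(tmp)
        k = archive_origin(disk, b'{"event": "request"}')
        print(k, disk.retrieve(k))
```

Because later pipeline stages only see the `OriginsStore` interface, swapping the local disk for S3 (or another object store) is a matter of constructing a different subclass.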

I have pushed this to develop, but I have not yet wired it into the live processing pipeline, as we need to consider the test storage implementation more carefully.

I propose adding a more advanced local storage layer that persists the files into a ZIP archive rather than as individual files, which should help this scale better in local-store mode. For production deployments, the service provider will need to use S3 or supply a storage implementation that persists to their preferred object store. All of this can be set through configuration.
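A rough sketch of the proposed ZIP-backed local store plus a configuration-driven backend choice, using only Python's standard `zipfile` module. `OriginsStore`, `ZipStore`, `make_store` and the `{"backend": ..., "path": ...}` config shape are hypothetical names chosen for illustration; a provider-specific or S3 backend would be registered in the same table.

```python
import abc
import os
import tempfile
import zipfile
from typing import Callable, Dict


class OriginsStore(abc.ABC):
    """Hypothetical key/bytes storage interface for the origins archive."""

    @abc.abstractmethod
    def store(self, key: str, data: bytes) -> None: ...

    @abc.abstractmethod
    def retrieve(self, key: str) -> bytes: ...


class ZipStore(OriginsStore):
    """Keeps every archived payload as a member of one ZIP file on local disk."""

    def __init__(self, path: str) -> None:
        self.path = path

    def store(self, key: str, data: bytes) -> None:
        # Mode "a" creates the archive if it does not exist, otherwise appends.
        # The archive is append-only: re-storing a key adds a duplicate member
        # (zipfile emits a warning), and reads then return the newest copy.
        with zipfile.ZipFile(self.path, mode="a", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(key, data)

    def retrieve(self, key: str) -> bytes:
        with zipfile.ZipFile(self.path, mode="r") as zf:
            return zf.read(key)


# Hypothetical registry mapping a config "backend" name to a constructor.
BACKENDS: Dict[str, Callable[[dict], OriginsStore]] = {
    "zip": lambda cfg: ZipStore(cfg["path"]),
}


def make_store(config: dict) -> OriginsStore:
    """Build the configured storage backend, defaulting to the local ZIP store."""
    backend = config.get("backend", "zip")
    if backend not in BACKENDS:
        raise ValueError(f"unsupported origins store backend: {backend!r}")
    return BACKENDS[backend](config)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        s = make_store({"backend": "zip", "path": os.path.join(tmp, "origins.zip")})
        s.store("2022/01/01/a.json", b"{}")
        s.store("2022/01/01/b.json", b"[]")
        print(s.retrieve("2022/01/01/b.json"))  # b'[]'
```

Writing into a single archive avoids creating one filesystem entry per event, which is the scaling concern for local-store mode; the trade-off is that every write reopens the ZIP, so a real implementation might batch writes or roll over to a new archive periodically.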

richard-jones commented 2 years ago

Tagging @Steven-Eardley into this one, as I need to discuss how it fits into the orchestration and deployment.