terrywbrady closed this issue 4 years ago.
The storage problems are caused by presign object requests. 90% of the temp space used on the storage servers was taken up by in-progress or failed archive directories. In one case, a single object used more than 358G of the 1T of temp space. One problem with this process is that building an archive for any directory uses about 2x the size of the resulting zip file, because all components need to be staged before they are incorporated into the resulting container file (e.g. zip).
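The 2x figure can be made concrete with a rough estimate. This is a minimal sketch, assuming peak temp usage is simply the staged components plus the finished zip, and assuming the content is already compressed so the zip is about the same size as the staged components. The function names and the sizes used are hypothetical, not taken from the storage service.

```python
GB = 1024 ** 3


def peak_temp_bytes(component_sizes: list[int], zip_size: int) -> int:
    # All components are staged in temp before the container is written,
    # so at the end of the build the staged copies and the zip coexist.
    return sum(component_sizes) + zip_size


def fits_in_temp(component_sizes: list[int], zip_size: int, free_bytes: int) -> bool:
    # Hypothetical pre-flight check before starting an archive build.
    return peak_temp_bytes(component_sizes, zip_size) <= free_bytes


# Assumption: content is already compressed, so the zip is roughly the
# same size as the staged components and peak usage is about 2x the zip.
components = [25 * GB] * 4
zip_size = sum(components)
peak = peak_temp_bytes(components, zip_size)
print(peak // GB)        # 200
print(peak / zip_size)   # 2.0
print(fits_in_temp(components, zip_size, 1024 * GB))  # True
```

For content that compresses well the ratio to the zip is higher than 2x, since the staged components are larger than the zip.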
Eventually, a queuing system for presign object requests would be useful for throttling the process and keeping the number of simultaneous zip builds down to prevent temp locking problems.
As a short-term solution, I propose running 4 storage servers in production: 2 dedicated to ingest and 2 dedicated to archive creation from the UI (either presign or direct).
I view ingest storage handling as a higher priority than archive creation. This split would guarantee that archive handling does not interfere with ingest processing.
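The proposed split amounts to routing each request type to its own server pool. This is a minimal sketch; the host names, `RequestKind`, and `pick_server` are hypothetical, and simple round-robin within each pool is an assumption rather than part of the proposal.

```python
from enum import Enum
from itertools import cycle


class RequestKind(Enum):
    INGEST = "ingest"
    PRESIGN_ARCHIVE = "presign"
    DIRECT_ARCHIVE = "direct"


# Hypothetical host names: 2 servers per pool, as proposed.
POOLS = {
    "ingest": cycle(["store-ingest-1", "store-ingest-2"]),
    "archive": cycle(["store-archive-1", "store-archive-2"]),
}


def pick_server(kind: RequestKind) -> str:
    # Ingest never shares servers with archive creation, so a runaway
    # presign or direct archive build cannot fill ingest temp space.
    pool = "ingest" if kind is RequestKind.INGEST else "archive"
    return next(POOLS[pool])


print(pick_server(RequestKind.INGEST))           # store-ingest-1
print(pick_server(RequestKind.PRESIGN_ARCHIVE))  # store-archive-1
print(pick_server(RequestKind.DIRECT_ARCHIVE))   # store-archive-2
```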
@elopatin-uc3, I think we can close this.
Yes, agreed @terrywbrady; closing