Closed by servilla 1 year ago.
Completing this issue will resolve #78, since cached versions of the zip archive file will no longer be needed.
The ensuing discussion on this issue raised two options: address it in the existing Java code base (e.g., the Data Package Manager service) or implement it with a Python web framework. This service call could easily be implemented in Python, since it does not depend on any other Java classes. We ultimately decided to stay within the Java code base for the following reasons:
When a user requests a zip archive file, the current approach first checks whether the zip file exists in a cache and, if it does, begins streaming it. If it does not, the service first creates the zip archive file and only then begins streaming it. As a result, the user who makes the first request must wait a long time while the zip file is created. This is not critical for small volumes of data, but with multiple GBs the first request may time out. In addition, the cached zip archive files consume extra disk storage.
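To make the cost of the current workflow concrete, here is a minimal Java sketch of the cache-first approach. The class `CachedZipSketch`, the method `serve`, and its parameters are hypothetical and are not the Data Package Manager's actual code; the sketch only uses the standard `java.util.zip` and `java.nio.file` APIs. It shows why the first request is slow: on a cache miss the entire archive is written to disk before any byte is streamed to the client.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of the cache-first workflow; names are illustrative only.
class CachedZipSketch {

    // Streams the archive at `cachedZip` to `client`, building it first if it is missing.
    // Returns true on a cache hit, false if the archive had to be created.
    static boolean serve(Path cachedZip, List<Path> files, OutputStream client) throws IOException {
        boolean cacheHit = Files.exists(cachedZip);
        if (!cacheHit) {
            // Cache miss: the whole archive is written to disk before streaming starts,
            // so the first requester waits for every entry to be compressed.
            try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(cachedZip))) {
                for (Path file : files) {
                    zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                    Files.copy(file, zip);
                    zip.closeEntry();
                }
            }
        }
        // Only now does streaming begin; the cached file also remains on disk afterwards.
        Files.copy(cachedZip, client);
        return cacheHit;
    }
}
```

Both drawbacks described above are visible here: the delay before the first `Files.copy(cachedZip, client)` on a cache miss, and the archive left behind on disk.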
For these reasons, we should refactor the workflow so that, instead of storing cached versions of the zip archive file, the service creates the zip archive dynamically and streams it back to the user as it is built. We expect dynamic compression to add a small overhead, but we do not believe users will notice it.
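A minimal Java sketch of the proposed approach follows, assuming the service can hand the HTTP response body to the zip writer as an `OutputStream` (for example, a servlet's response output stream). The class `StreamingZipSketch` and the method `streamZip` are hypothetical names used for illustration. The idea is to wrap the client's stream in a `ZipOutputStream` directly, so compressed bytes flow to the user while the archive is being built and nothing is written to a cache.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of on-the-fly zip streaming; names are illustrative only.
class StreamingZipSketch {

    // Writes a zip archive of `files` straight to `client` (e.g., an HTTP response body).
    // No archive is stored on disk; the client receives data as entries are compressed.
    static void streamZip(List<Path> files, OutputStream client) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(client);
        for (Path file : files) {
            zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
            Files.copy(file, zip);   // compress this entry directly into the response
            zip.closeEntry();
            zip.flush();             // push this entry's bytes to the client before the next one
        }
        // finish() writes the zip central directory without closing the caller's stream,
        // leaving the response stream's lifecycle to the web container.
        zip.finish();
        zip.flush();
    }
}
```

Calling `finish()` rather than `close()` is a deliberate choice in this sketch: closing the `ZipOutputStream` would also close the underlying response stream. If the compression overhead did turn out to be noticeable, `ZipOutputStream.setLevel` could trade archive size for CPU time.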