Open kaliserichmond opened 2 years ago
👍🏻 for adding this and zstd support.
I fully understand that a single zip file might be easier to manage in certain scenarios. However, I don't fully understand the argument that this is about storage costs. The project is supposed to contain only flow code and, optionally, custom dependencies. If users have custom files that shouldn't be uploaded, they can add those to .prefectignore starting in the next release.
@kaliserichmond Not sure which user problems other than storage costs motivated this feature request, but I believe that .prefectignore already solves it in a really flexible and simple way.
cc @cicdw looping you in for feedback
Unfortunately, the .prefectignore functionality doesn't really solve the problem for us, and I reckon there are others who may have the same problem. I think storage cost isn't a massive problem; the real issue is orchestration. If we have many flows in a project that we want to deploy, the current design requires one of two options:
We have several environments to deploy our flows to, and we want to deploy a flow in isolation, i.e., when we make changes to flow A (and maybe a shared dependency), the code for flow B is unaffected. Ideally we could also roll back to a previous version of flow A (again, without affecting the code for flow B). These requirements eliminate option (1) because we can introduce changes to the storage that may affect several flows in one deployment.
We have hundreds of files in our flows repository, with a roughly even split of flow definition files and a shared library of tasks/utilities that are used throughout our flow code. Using option (2) we must deploy all of those files to each storage block for each flow, so we're storing the same files over and over again. Storage is relatively cheap, but before a flow can execute we have to wait for all of those files to be downloaded from the storage. This introduces some problems:
The .prefectignore file could, in theory, ease this problem by allowing us to select only the files needed per flow, but in this scenario we would need a .prefectignore file for each deployment. Does Prefect support that?
A single zipped file of the flow dependencies alleviates the risks described above since we only need to download a single file from the storage and extract it into the local execution environment. It also allows us to use a single storage block for all flows in our project, while enabling the isolated deployment of each flow.
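To make the idea above concrete, here is a minimal sketch using only Python's standard library; it is not Prefect's API. The helper names `build_flow_archive` and `extract_flow_archive`, and the idea of passing an explicit per-flow file list, are assumptions made for illustration. Uploading the archive to a storage block and downloading it again are left out.

```python
import zipfile
from pathlib import Path


def build_flow_archive(project_dir, relative_paths, archive_path):
    """Bundle only the files one flow needs into a single zip (hypothetical helper)."""
    project_dir = Path(project_dir)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for rel in relative_paths:
            # Store each file under its project-relative path so imports still resolve.
            zf.write(project_dir / rel, arcname=Path(rel).as_posix())
    return Path(archive_path)


def extract_flow_archive(archive_path, target_dir):
    """Unpack the archive into the local execution environment (hypothetical helper)."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)
        return sorted(zf.namelist())
```

With something like this, each flow gets its own archive (for example `flow_a.zip`) in one shared storage location, so redeploying or rolling back flow A replaces one object and leaves flow B's archive untouched.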
This issue is stale because it has been open 30 days with no activity. To keep this issue open remove stale label or comment.
Still valid
@billpalombi should this still be deferred?
I've accepted this but it's low priority relative to the planned enhancements for projects and deployments
Prefect Version
2.x
Describe the proposed behavior
With the new deployment.yaml, users who were previously using the FilePackager with the pickle serializer because they were worried about large S3 costs need an alternative. Extending support for compression formats like .zip or tarballs would be a good replacement for this functionality.
Describe the current behavior
Currently the deployment build CLI command auto-uploads individual files using file system blocks, and submitted runs use the same block to download these raw files. For certain deployments, a large number of files, or large files, must be stored alongside the flow, so having a way to compress the full directory before uploading and decompress it after downloading would save users on storage costs.
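The compress-before-upload and decompress-after-download steps could look roughly like the sketch below. It uses the standard library's `tarfile` with gzip, not any existing Prefect feature. The function names are hypothetical, and the actual transfer of the resulting bytes through a file system block is deliberately omitted.

```python
import io
import tarfile
from pathlib import Path


def compress_directory(source_dir):
    """Pack a whole directory into gzip-compressed tar bytes (hypothetical helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        # arcname="." keeps paths relative to the directory root.
        tar.add(str(source_dir), arcname=".")
    return buffer.getvalue()


def decompress_to_directory(payload, target_dir):
    """Restore the directory from the bytes produced above (hypothetical helper)."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions offer a safer extraction filter.
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
```

How much this saves depends on the content: source code and text usually compress well, while already-compressed assets gain little.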
Example Use
No response
Additional context
No response