Open multimeric opened 4 years ago
We are definitely lacking in out-of-the-box tools to solve this problem elegantly. The Handle types you've bumped into are a bit stale, and come from us expecting this type of pattern to be common.
My recommendation would be to look at using the Resource and DagsterType systems to abstract how you want to think about these file resources in your workflows. Using these two systems together should allow you to make a single workflow that can run in various permutations of where the files are sourced (local/remote) and where the computation is happening (local/cloud). Defining your own resources and dagster types will allow you to encode the exact expected behavior you want to achieve.
https://dagster.readthedocs.io/en/stable/sections/learn/guides/dagster_types.html https://dagster.readthedocs.io/en/stable/sections/tutorial/resources.html
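A minimal sketch of the idea in the reply above, written in plain Python so it runs without Dagster installed. The names FileStore, LocalFileStore, and uppercase_file are hypothetical and are not Dagster APIs. The assumption is that an object like this would be handed to a step through a Dagster resource, and that a cloud-backed class with the same two methods could be swapped in for remote runs.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class FileStore(ABC):
    """Hypothetical interface: steps refer to files by key, not by location."""

    @abstractmethod
    def fetch(self, key: str, dest: Path) -> Path:
        """Copy the stored file `key` to the local path `dest` and return `dest`."""

    @abstractmethod
    def publish(self, src: Path, key: str) -> str:
        """Store the local file `src` under `key` and return the key."""


class LocalFileStore(FileStore):
    """Keeps files under a root directory on the local disk."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def fetch(self, key: str, dest: Path) -> Path:
        dest = Path(dest)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(self.root / key, dest)
        return dest

    def publish(self, src: Path, key: str) -> str:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, target)
        return key


def uppercase_file(store: FileStore, in_key: str, out_key: str) -> str:
    """Example step: it depends only on the interface, not on where files live."""
    with tempfile.TemporaryDirectory() as tmp:
        local_in = store.fetch(in_key, Path(tmp) / "in.txt")
        local_out = Path(tmp) / "out.txt"
        local_out.write_text(local_in.read_text().upper())
        return store.publish(local_out, out_key)
```

The step passes keys between stages instead of file contents, so only the store implementation would need to change when the files move from local disk to remote storage.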
2282
This issue was closed. Has there been any other notable progress on using files?
I'm guessing the main change relates to the addition of the IO Manager, added in 0.10.0. Is there any example code involving the use of files in relation to these managers?
What is the current status of this? Objects like LocalFileManager are still present in the codebase (https://github.com/dagster-io/dagster/blob/7a8ba5c303b31a6af197177999d49166097711fa/python_modules/dagster/dagster/_core/storage/file_manager.py#L233) but are constructed as resources rather than IOManagers. Still, they seem to do exactly what I need for my use case - some of my ops construct SQLite databases which I want to pass around, and the IOManager framework doesn't seem to match that very well.
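One possible workaround for the SQLite case, sketched here in plain Python with functions standing in for ops: pass the path to the database file between steps rather than the database itself. The names build_db and count_rows and the events table are hypothetical. This assumes both steps can see the same filesystem; it is not a substitute for LocalFileManager or a custom IOManager when steps run on different machines.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_db(workdir: Path) -> str:
    """Upstream step: create a SQLite file and return its path as the output."""
    db_path = Path(workdir) / "events.sqlite"
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany("INSERT INTO events (name) VALUES (?)", [("a",), ("b",)])
        conn.commit()
    finally:
        conn.close()
    return str(db_path)


def count_rows(db_path: str) -> int:
    """Downstream step: reopen the database from the path it was handed."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(count_rows(build_db(Path(tmp))))
```

Only a short string travels between the steps, so whatever mechanism stores step outputs never has to serialize the database file.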
I'm still looking for a way to do this. We output .geojson files and sometimes folders containing .shp file data. We want to transfer these across assets/ops.
Same needs here. We are required to produce intermediary products to be archived in a given file format (HDF5). It would be great to have this feature -- it's a blocker currently!
@j2bbayle could you use a pattern like this? https://docs.dagster.io/guides/dagster/non-argument-deps#assets-without-io
I'm writing a pipeline which does a lot of file manipulation: downloading a file, manipulating it, saving it again, etc. I also want my workflow to work in the cloud if I scale it up. Also, in my workflow the files are not Materializations, because they're the main inputs and outputs of a solid. I feel that the documentation doesn't really explain how to do this.
There are FileHandle, LocalFileHandle, and Path types built in to dagster. Do any of them solve my need for platform-agnostic file storage? Does the following solid for downloading a file make sense?
What we've heard: