Open · alexprykhodko opened this issue 1 year ago
It sounds like this is an implementation inconsistency between our different IO managers. I wonder if we could implement the ingestion in such a way that it only needs to be written once for all IO managers, similar to our path construction machinery.
@dpeng817 I ended up doing something similar in my version of IO managers, after creating a base S3 IO class; the derived classes simply implement `_read_from_path()` and `_write_to_path()`:
```python
import msgpack
import zstandard
from upath import UPath


# BaseS3IOManager is the author's custom base class (defined elsewhere).
class S3MsgPackIOManager(BaseS3IOManager):
    """Stores objects as msgpack, optionally zstd-compressed."""

    def __init__(self, s3_bucket, s3_prefix=None, scheme='s3', use_compression=False):
        super().__init__(s3_bucket, s3_prefix, scheme)
        self._use_compression = use_compression

    def _write_to_path(self, path: UPath, obj):
        with path.open('wb') as f_out:
            if self._use_compression:
                # Closing the stream writer flushes the final zstd frame,
                # so it gets its own `with` block.
                zst = zstandard.ZstdCompressor(level=5)
                with zst.stream_writer(f_out) as writer:
                    msgpack.dump(obj, writer)
            else:
                msgpack.dump(obj, f_out)

    def _read_from_path(self, path: UPath):
        with path.open('rb') as f_in:
            if self._use_compression:
                zst = zstandard.ZstdDecompressor()
                with zst.stream_reader(f_in) as reader:
                    return msgpack.load(reader)
            return msgpack.load(f_in)

    def _get_file_extension(self):
        return 'msgpack.zst' if self._use_compression else 'msgpack'
```
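A hypothetical way to wire this class up as a Dagster resource (the `@io_manager` factory and the bucket/prefix values below are illustrative assumptions, not part of the original comment):

```python
from dagster import io_manager

# Hypothetical factory; bucket and prefix are placeholder values.
@io_manager
def s3_msgpack_io_manager(_init_context):
    return S3MsgPackIOManager(
        s3_bucket="my-bucket",
        s3_prefix="dagster-storage",
        use_compression=True,
    )
```

Making compression opt-in per instance also keeps the file extension stable (`msgpack` vs. `msgpack.zst`) for data written before compression was enabled.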
Dagster version
dagster, version 1.1.5
What's the issue?
Given the following assets:
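A minimal sketch consistent with the expectation described below (the partition definition and asset bodies are assumptions, since the original snippet is not shown here):

```python
from dagster import DailyPartitionsDefinition, asset

# Hypothetical reconstruction: a partitioned upstream asset feeding a
# non-partitioned downstream asset.
@asset(partitions_def=DailyPartitionsDefinition(start_date="2022-01-01"))
def upstream_asset(context) -> str:
    return context.partition_key

@asset
def downstream_asset(upstream_asset: dict) -> None:
    # Expected to receive {partition_key: value} for all partitions.
    print(upstream_asset)
```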
when using `s3_pickle_io_manager`, the following error is reported at job start:

The error is not thrown when `fs_io_manager` is used.

What did you expect to happen?
The `downstream_asset` should receive the argument `upstream_asset` containing a dictionary whose keys are all of the partitions of the `upstream_asset`.
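This dict-per-partition loading is the behavior `fs_io_manager` exhibits for partitioned upstream assets. A simplified sketch of the idea (not dagster's actual implementation; `_path_for_partition` and `_get_path` are assumed helper names):

```python
# Simplified sketch of multi-partition input loading; helper names are
# assumptions, not dagster's real API.
def load_input(self, context):
    if context.has_asset_partitions and len(context.asset_partition_keys) > 1:
        # A non-partitioned asset reading a partitioned upstream gets
        # one dict entry per upstream partition key.
        return {
            key: self._read_from_path(self._path_for_partition(context, key))
            for key in context.asset_partition_keys
        }
    return self._read_from_path(self._get_path(context))
```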
How to reproduce?
See the section above.
Deployment type
Local
Deployment details
Python 3.8.13, macOS 13.0
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.