Open sunank200 opened 1 year ago
Yeah, the entire file shouldn't be loaded into memory. That can be one of the options, but not the only one.
Flow (from fastest path to slowest):
Describe the bug
I tried to transfer an 11 GB zip file from S3 to GCS on a worker with 500 MB of memory, and the worker was killed because it ran out of memory:
Expected behavior
The read method should load only chunks into memory. Currently, if a folder contains multiple files, each file is loaded into memory in full. For cases where a single file is very large, we need logic that loads only one chunk at a time.
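
For illustration, here is a minimal sketch of the chunked behavior described above. It assumes the source and destination can both be opened as binary file-like objects (how that is done for S3 and GCS is not specified in this issue), and the names `iter_chunks`, `copy_in_chunks`, and `CHUNK_SIZE` are hypothetical, not part of the existing read method. Peak memory use stays close to one chunk, regardless of file size.

```python
import io
from typing import BinaryIO, Iterator

# Hypothetical default; any size that fits comfortably in worker memory works.
CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB


def iter_chunks(src: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes from src."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes object signals end of stream
            break
        yield chunk


def copy_in_chunks(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst one chunk at a time; return the number of bytes written."""
    total = 0
    for chunk in iter_chunks(src, chunk_size):
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # In-memory streams stand in for the remote source and destination.
    source = io.BytesIO(b"x" * 1000)
    target = io.BytesIO()
    written = copy_in_chunks(source, target, chunk_size=256)
    print(f"copied {written} bytes")
```

Loading the whole file could remain available as one option (for example, for small files), with chunked copying used when a file exceeds some size threshold.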