Thimm opened this issue 10 months ago
@Thimm Thank you for reporting this issue. It will be resolved in the next release. There was a bug that caused retiling of a large file to happen at a deferred stage rather than immediately on read. Spark buffers do not support binaries larger than 2 GB, so on read we have to retile the file into tiles smaller than 2 GB and then perform transformations on those. I will be opening a PR today, and the fix will be part of the next release. I ran the provided file on my local machine with the new fix without any issues, using Docker and Rosetta translation since I am on an M1 Mac; even with those constraints it runs now. The next release should be out within a couple of weeks.
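Until that release lands, one possible interim workaround (not the official fix) is to subdivide the raster into smaller tiles immediately after the read, so that no single binary approaches the 2 GB buffer cap. A minimal sketch, assuming Mosaic's rst_subdivide expression, a raster column named tile, and a hypothetical path and size threshold:

```python
# Workaround sketch (assumptions: column name "tile", 256 MB threshold,
# hypothetical dbfs path). Subdivides each raster into child tiles
# small enough to stay well under Spark's 2 GB binary buffer limit.
import mosaic as mos
from pyspark.sql import functions as F

mos.enable_mosaic(spark, dbutils)  # spark/dbutils are Databricks globals
mos.enable_gdal(spark)             # raster support requires GDAL

df = spark.read.format("gdal").load("dbfs:/tmp/large_cog.tif")

# rst_subdivide splits each tile into children no larger than the given
# size in MB; downstream transformations then operate on small tiles.
tiled = df.select(mos.rst_subdivide("tile", F.lit(256)).alias("tile"))
tiled.show()
```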
Describe the bug
I am encountering an OutOfMemoryError when attempting to read a large Cloud Optimized GeoTIFF (COG) file (2.4 GB) using the mosaic.read() method in an Azure Databricks environment. The error occurs during the execution of df.show() after reading the file.
To Reproduce
dbfs
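A minimal sketch of the failing read; the reader format, options, and path below are assumptions, as the original report only references a dbfs location:

```python
# Reproduction sketch (hypothetical path and options): read a 2.4 GB COG
# with Mosaic's raster reader; the OOM surfaces on the show() action.
import mosaic as mos

mos.enable_mosaic(spark, dbutils)  # spark/dbutils are Databricks globals
mos.enable_gdal(spark)             # GDAL must be enabled for raster reads

df = (
    mos.read()
    .format("raster_to_grid")
    .option("fileExtension", "*.tif")
    .load("dbfs:/tmp/large_cog.tif")  # hypothetical 2.4 GB COG
)
df.show()  # OutOfMemoryError is raised here
```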
Expected behavior
The COG file should be read into a DataFrame and displayed with df.show() without encountering memory issues.
Additional Context
Environment
Databricks Runtime Version: 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)
Cluster Configuration: Standard_D32ads_v5, 128 GB Memory, 32 Cores
Language: Python
Traceback.txt