Closed — atifiu closed this issue 1 week ago
@atifiu Do you have a program that can reproduce the issue?
@huaxingao No. Because this is happening only with large table 4TB+. For smaller tables it's working fine. Even on large table if I place filter it works.
There should be a wrapped exception that explains what exactly happened in the parallel task.
@paulpaul1076 I have now pasted the complete error stack.
It seems to be caused by an OOM.
@huaxingao I am unable to understand why we get an OOM when not filtering the data, while filtering with a predicate that selects all the partitions (skipping nothing) works fine. In both cases we are reading the same amount of data.
@atifiu Have you tried giving your application more memory? How much memory are you specifying in spark-submit?
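As background for the question above, executor and driver memory are usually raised via spark-submit flags or SparkSession config. A minimal sketch, with placeholder values (not recommendations for this workload):

```python
from pyspark.sql import SparkSession

# Hypothetical memory settings; actual values depend on cluster capacity.
spark = (
    SparkSession.builder
    .appName("iceberg-oom-debug")
    .config("spark.executor.memory", "16g")          # per-executor JVM heap
    .config("spark.executor.memoryOverhead", "4g")   # off-heap / overhead allowance
    .getOrCreate()
)
```

Note that in client mode `spark.driver.memory` cannot be set this way (the driver JVM is already running); pass it on the command line instead, e.g. `spark-submit --driver-memory 8g ...`.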
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Apache Iceberg version
1.3.0
Query engine
Spark
Please describe the bug 🐞
The following query on an Iceberg table fails:

```python
spark.sql("select max(visitor_id) from schema.table1").show()
```
But if we add a filter, whether it selects all partitions or a single partition, the query executes successfully.
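For comparison, a sketch of the failing and the working forms, assuming a hypothetical partition column `event_date` (the table's real partition spec is not shown in this report):

```python
# Fails on the 4 TB+ table: full scan with no partition filter.
spark.sql("select max(visitor_id) from schema.table1").show()

# Reportedly works: a filter that still selects every partition,
# so the same amount of data is read (column name is hypothetical).
spark.sql("""
    select max(visitor_id)
    from schema.table1
    where event_date >= '1970-01-01'
""").show()
```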
@huaxingao @RussellSpitzer Could you give some feedback on this?
Below is the error log: