Closed (devinrsmith closed 1 month ago)
The issue is with the parquet table itself.
print(my_table.j_table.getRowSet())
{0--1}
But it should be empty and displayed as {}.
At a minimum, we should be able to assert that all parquet tables are "flat" - I believe that would have caught this bad row set. In addition, we might ask if the rowset code itself should catch this sort of bad rowset.
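To make the "should the rowset code catch this" question concrete, here is a minimal sketch of the kind of sanity check being suggested. It is not Deephaven's RowSet API: the `(first, last)` tuple representation and the names `find_range_problems` and `is_flat` are hypothetical, and it assumes `{0--1}` is read as an inclusive range from 0 to -1.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: a row set as sorted, inclusive (first, last) key ranges.
Range = Tuple[int, int]


def find_range_problems(ranges: Iterable[Range]) -> List[str]:
    """Return a description of every malformed range; an empty list means none found."""
    problems: List[str] = []
    prev_last = -1
    for first, last in ranges:
        if first < 0 or last < 0:
            problems.append(f"negative key in {first}-{last}")
        if last < first:
            problems.append(f"inverted range {first}-{last}")
        if first <= prev_last:
            problems.append(f"range {first}-{last} overlaps or is out of order")
        prev_last = max(prev_last, last)
    return problems


def is_flat(ranges: Iterable[Range]) -> bool:
    """A row set is flat when it is empty or covers exactly keys 0..size-1."""
    ranges = list(ranges)
    if find_range_problems(ranges):
        return False
    if not ranges:
        return True
    size = sum(last - first + 1 for first, last in ranges)
    return ranges[0][0] == 0 and ranges[-1][1] == size - 1


if __name__ == "__main__":
    print(find_range_problems([(0, -1)]))  # the shape reported in this issue
    print(is_flat([]))                     # a truly empty row set
```

Under that reading, the reported row set fails the check on two counts (a negative key and an inverted range), while a truly empty row set passes and counts as flat.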
We actually can't assert that. The RowSet is an artifact of how our code responds to the arrangement of row groups and their sizes. We will not produce a flat RowSet if a Table is backed by more than one Parquet file or more than one row group.
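As a rough illustration of why more than one row group breaks flatness, here is a toy model. It assumes each row group gets its own fixed-size slice of the row key space; the `REGION_BITS` value and the function names are made up for readability and are not the engine's actual constants or code.

```python
from typing import List, Tuple

# Hypothetical: each row group owns a slice of 2**REGION_BITS keys.
REGION_BITS = 4
REGION_CAPACITY = 1 << REGION_BITS


def row_key_ranges(row_group_sizes: List[int]) -> List[Tuple[int, int]]:
    """Map row group sizes to inclusive (first, last) row key ranges."""
    ranges: List[Tuple[int, int]] = []
    for region_index, size in enumerate(row_group_sizes):
        if size < 0 or size > REGION_CAPACITY:
            raise ValueError(f"row group {region_index} has unsupported size {size}")
        if size == 0:
            continue  # an empty row group contributes no keys
        first = region_index << REGION_BITS
        ranges.append((first, first + size - 1))
    return ranges


def is_flat(ranges: List[Tuple[int, int]]) -> bool:
    """Flat means empty, or exactly the keys 0..size-1."""
    if not ranges:
        return True
    size = sum(last - first + 1 for first, last in ranges)
    return ranges[0][0] == 0 and ranges[-1][1] == size - 1


if __name__ == "__main__":
    for sizes in ([3], [3, 2], [0]):
        ranges = row_key_ranges(sizes)
        print(sizes, ranges, is_flat(ranges))
```

In this model a single row group yields a flat row set, while two row groups leave a gap between slices, so a blanket "all parquet tables are flat" assertion would fail for valid tables.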
The linked Parquet file has an empty row group, and we have a known issue that our engine does not support Parquet files with empty row groups (#5530).
This does manifest differently though, without an explicit error.
Yup, the error is different and kind of gets hidden, but the root cause is the same. I have added a fix for it in #6183.
An "empty" parquet file, created via pyarrow, seems to be leading to Barrage writing issues.
This manifests as "waiting for viewport" (seemingly with 2 rows) in the web UI.
A Flight DoGet looks correct:
A Barrage DoExchange looks incorrect:
It's possible that the Parquet Table implementation is incorrect in some way (and thus leads to the Barrage issue). There is special handling on the DH side around empty row groups, and that may be leading to issues?
Here is the file, Empty1.parquet.txt; note that the .txt extension was added to make it uploadable to GitHub. It was generated with the following snippet:
Here is a quick snippet of the data as viewed through DuckDB: