Open asfimport opened 3 years ago
Gabor Szadovszky / @gszadovszky:
@ggershinsky, although the original topic of this Jira is invalid, we still need to add proper comments to RowGroup.file_offset
describing the situation of PARQUET-2078 and helping implementations handle the potentially wrong value. Would you like to handle this?
Gidon Gershinsky / @ggershinsky: @gszadovszky yes, I'll take it. There might be a different solution (also format-related) that bypasses the need to calculate this parameter in any implementation, so that it can be fully deprecated. I'll get back with the details and we'll discuss the trade-offs.
Gidon Gershinsky / @ggershinsky: Hi @gszadovszky , I've prepared a short writeup on this alternative solution, with a discussion of the tradeoffs. After writing it, my feeling is that the trade-off is not in favor of this alternative option; but here it goes, just to cover all bases. Will appreciate your opinion on this.
Gabor Szadovszky / @gszadovszky: @ggershinsky, could you make the doc available for comments?
Gidon Gershinsky / @ggershinsky: Oh, sorry, done.
Due to PARQUET-2078, RowGroup.file_offset is not reliable.
This field is also calculated incorrectly in the open-source C++ Parquet implementation (see PARQUET-2089).
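
For illustration only, here is a minimal sketch of how a reader might avoid depending on RowGroup.file_offset by deriving the row group start from the first column chunk's page offsets instead. The classes `ColumnChunkMeta` and `RowGroupMeta` and the function `row_group_start` are hypothetical, simplified stand-ins and not the API of any Parquet library. Only the field names `data_page_offset`, `dictionary_page_offset` and `file_offset` mirror the format's metadata. This is an assumed workaround, not the resolution agreed in this issue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnChunkMeta:
    """Hypothetical, simplified view of a column chunk's metadata."""
    data_page_offset: int
    dictionary_page_offset: Optional[int] = None


@dataclass
class RowGroupMeta:
    """Hypothetical, simplified view of a row group's metadata."""
    columns: List[ColumnChunkMeta]
    file_offset: Optional[int] = None  # may be wrong, so it is not used below


def row_group_start(rg: RowGroupMeta) -> int:
    """Return the start offset of a row group without reading file_offset.

    Assumption: the row group begins at the first page of its first column
    chunk, which is the dictionary page when a valid one precedes the data.
    """
    first = rg.columns[0]
    dict_off = first.dictionary_page_offset
    # Treat a missing, zero, or out-of-order dictionary offset as "no dictionary".
    if dict_off is not None and 0 < dict_off < first.data_page_offset:
        return dict_off
    return first.data_page_offset
```

A sketch like this only works when the first column chunk's metadata is readable, so it does not cover every case in which the field might be needed.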
Reporter: Gabor Szadovszky / @gszadovszky Assignee: Gidon Gershinsky / @ggershinsky
Note: This issue was originally created as PARQUET-2080. Please see the migration documentation for further details.