Nemon Lou / @loudongfeng: Root cause analysis: the file offset added in parquet 1.12.0 can go wrong under certain conditions. https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.0/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L580 When using ParquetRecordReader with blocks passed in, the wrongly set file offset causes some blocks to be filtered out due to an offset mismatch. https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.0/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L1264
When can the file offset go wrong? Here are writer debug logs to clarify.
testing : currentChunkFirstDataPage offset 4
testing : currentChunkFirstDataPage offset 12243647
testing : currentChunkFirstDataPage offset 42848491
testing : currentChunkDictionaryPageOffset offset 54810911
testing : currentChunkFirstDataPage offset 54868535
testing : currentChunkFirstDataPage offset 57421932
testing : currentChunkDictionaryPageOffset offset 69665577
testing : currentChunkFirstDataPage offset 69694809
testing : currentChunkDictionaryPageOffset offset 72063808
testing : currentChunkFirstDataPage offset 72093040
testing : currentChunkDictionaryPageOffset offset 74461441
testing : currentChunkFirstDataPage offset 74461508
testing : currentChunkDictionaryPageOffset offset 75041119
testing : currentChunkFirstDataPage offset 75092758
testing : currentChunkDictionaryPageOffset offset 77575161
testing : currentChunkFirstDataPage offset 77626525
testing : currentChunkDictionaryPageOffset offset 80116424
testing : currentChunkFirstDataPage offset 80116456
testing : currentChunkDictionaryPageOffset offset 80505206
testing : currentChunkFirstDataPage offset 80505351
testing : currentChunkDictionaryPageOffset offset 81581705
testing : currentChunkFirstDataPage offset 81581772
testing : currentChunkDictionaryPageOffset offset 82473442
testing : currentChunkFirstDataPage offset 82473740
testing : currentChunkDictionaryPageOffset offset 83918856
testing : currentChunkFirstDataPage offset 83921564
testing : currentChunkDictionaryPageOffset offset 85457651
testing : currentChunkFirstDataPage offset 85457674
testing : currentChunkFirstDataPage offset 85460523
testing : currentChunkDictionaryPageOffset offset 132143159
testing : currentChunkFirstDataPage offset 132146109
testing :block offset: 4
testing : currentChunkFirstDataPage offset 133961161
testing : currentChunkFirstDataPage offset 144992321
testing : currentChunkFirstDataPage offset 172566390
testing : currentChunkDictionaryPageOffset offset 183343431
testing : currentChunkFirstDataPage offset 183401055
testing : currentChunkFirstDataPage offset 185701717
testing : currentChunkDictionaryPageOffset offset 196732869
testing : currentChunkFirstDataPage offset 196762101
testing : currentChunkDictionaryPageOffset offset 198896490
testing : currentChunkFirstDataPage offset 198925722
testing : currentChunkDictionaryPageOffset offset 201059822
testing : currentChunkFirstDataPage offset 201059889
testing : currentChunkDictionaryPageOffset offset 201582088
testing : currentChunkFirstDataPage offset 201633695
testing : currentChunkDictionaryPageOffset offset 203869258
testing : currentChunkFirstDataPage offset 203920622
testing : currentChunkDictionaryPageOffset offset 206163685
testing : currentChunkFirstDataPage offset 206163718
testing : currentChunkDictionaryPageOffset offset 206513919
testing : currentChunkFirstDataPage offset 206514064
testing : currentChunkDictionaryPageOffset offset 207484483
testing : currentChunkFirstDataPage offset 207484550
testing : currentChunkDictionaryPageOffset offset 208288402
testing : currentChunkFirstDataPage offset 208288700
testing : currentChunkDictionaryPageOffset offset 209591541
testing : currentChunkFirstDataPage offset 209594249
testing : currentChunkDictionaryPageOffset offset 210978198
testing : currentChunkFirstDataPage offset 210978221
testing : currentChunkFirstDataPage offset 210980774
testing : currentChunkDictionaryPageOffset offset 253052539
testing : currentChunkFirstDataPage offset 253055489
testing :block offset: 133961161
testing : set File_offset for rowgroup. with position: 4
testing : set File_offset for rowgroup. with position: 132143159
Notice that the second file offset, 132143159, is wrong (133961161 is expected); it is the last column's currentChunkDictionaryPageOffset from the first row group.
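To see why this breaks reading: when a reader is handed the row-group offsets of its split, it keeps only the row groups whose recorded file_offset matches one of those offsets. Below is a simplified, hypothetical illustration of that mismatch (not the actual ParquetMetadataConverter filter code); the numbers are the ones from the log above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class OffsetFilterSketch {
  public static void main(String[] args) {
    // Offsets the reader asks for, derived from the real row group starting
    // positions (4 and 133961161).
    Set<Long> requestedOffsets = new HashSet<>(Arrays.asList(4L, 133961161L));

    // file_offset values actually written into the footer by 1.12.0; the
    // second one is the stale dictionary page offset.
    long[] fileOffsets = {4L, 132143159L};

    for (int i = 0; i < fileOffsets.length; i++) {
      boolean kept = requestedOffsets.contains(fileOffsets[i]);
      System.out.println("row group " + (i + 1) + " file_offset=" + fileOffsets[i]
          + (kept ? " -> read" : " -> filtered out (offset mismatch)"));
    }
  }
}
```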
Gabor Szadovszky / @gszadovszky:
@loudongfeng, thanks a lot for the investigation. What is not clear to me is how we could set the wrong value for RowGroup.file_offset. Based on the code in ParquetMetadataConverter we use the starting position of the first column chunk of the actual row group. The starting position of a column chunk is the dictionary page offset or the first data page offset, whichever is smaller (because the dictionary page is always at the starting position of the column chunk). If the dictionary page offset or the first data page offset were wrong we should have other issues as well. Can you read the file content without using InputSplits (e.g. parquet-tools, parquet-cli, or java code that reads the whole file)? There is a new parquet-cli command called footer that can list the raw footer of the file. It would be interesting to see its output for the related parquet file. Unfortunately, this feature is not released yet, so it has to be built from master. If you are interested in doing so, please check the readme for details.
If you are right and we have been writing invalid offsets to files since 1.12.0, then this is a serious issue. We not only have to fix the write path but the read path as well, since files already written by 1.12.0 will exist.
Nemon Lou / @loudongfeng: @gszadovszky, Thanks for your attention.
RowGroup.file_offset can go wrong when all of the following conditions are met:
1. The first column chunk of the affected row group is not dictionary encoded.
2. Some column chunk in a previous row group is dictionary encoded.
3. The writer writes more than one row group.
The first column of the second row group then reuses another column's currentChunkDictionaryPageOffset from the first row group, so block.getStartingPos() goes wrong, because it takes currentChunkDictionaryPageOffset into account.
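A minimal sketch of the mechanism, using the offsets from the debug log above (this roughly mirrors the min-of-offsets selection described in the thread, not the exact parquet-mr source):

```java
public class FileOffsetBugSketch {

  // Starting position of a column chunk: prefer the dictionary page offset
  // when one is set and it precedes the first data page, because the
  // dictionary page is written at the beginning of the chunk.
  static long startingPos(long dictionaryPageOffset, long firstDataPageOffset) {
    if (dictionaryPageOffset > 0 && dictionaryPageOffset < firstDataPageOffset) {
      return dictionaryPageOffset;
    }
    return firstDataPageOffset;
  }

  public static void main(String[] args) {
    // The first column of row group 2 (c_customer_sk) is PLAIN encoded, so it
    // has no dictionary page of its own, but the writer still carries the
    // currentChunkDictionaryPageOffset left over from the last column of row
    // group 1 (c_last_review_date_sk, DO:132143159).
    long staleDictionaryPageOffset = 132143159L; // leaked from row group 1
    long firstDataPageOffset = 133961161L;       // real start of row group 2

    // The stale value is smaller, so it wins: RowGroup.file_offset becomes
    // 132143159 instead of the expected 133961161.
    System.out.println("file_offset = "
        + startingPos(staleDictionaryPageOffset, firstDataPageOffset));
  }
}
```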
Gabor Szadovszky / @gszadovszky: @loudongfeng, thanks a lot for the detailed explanation and the patch! So what I have written before stands. Before 1.12.0 we did not write the dictionary offset to the column chunk metadata (see PARQUET-1850), even though the calculation had been wrong from the beginning. Since 1.12.0 is already released, we have to prepare for the invalid dictionary offset values.
What we need to handle in a fix:
Investigate all code parts where the dictionary offset and file offset are used and prepare for invalid values
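For the write path, one possible direction (a rough sketch with hypothetical writer state, not the actual ParquetFileWriter code or the merged PR) is to reset the per-chunk dictionary page offset whenever a new column chunk starts, so a value left over from a previous chunk can never leak into the starting position of a later row group:

```java
public class WriteSideResetSketch {
  // 0 means "the current chunk has no dictionary page".
  private long dictionaryPageOffset;
  // Simplification: treat the chunk start as the first data page position.
  private long firstDataPageOffset;

  void startColumnChunk(long filePos) {
    dictionaryPageOffset = 0; // forget the previous chunk's dictionary offset
    firstDataPageOffset = filePos;
  }

  void dictionaryPageWritten(long filePos) {
    dictionaryPageOffset = filePos;
  }

  long startingPos() {
    return dictionaryPageOffset > 0 ? dictionaryPageOffset : firstDataPageOffset;
  }

  public static void main(String[] args) {
    WriteSideResetSketch state = new WriteSideResetSketch();
    // Last column of row group 1: has a dictionary page at 132143159.
    state.startColumnChunk(132143159L);
    state.dictionaryPageWritten(132143159L);
    // First column of row group 2: no dictionary page. Thanks to the reset,
    // startingPos() falls back to the first data page offset, 133961161.
    state.startColumnChunk(133961161L);
    System.out.println("row group 2 starting position = " + state.startingPos());
  }
}
```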
@loudongfeng, would you like to work on this by opening a PR on github?
Gabor Szadovszky / @gszadovszky: Added the dev list thread link here to keep both sides in the loop.
Nemon Lou / @loudongfeng: Metadata dumped from the footer shows that the dictionary page offset is right. Only RowGroup.file_offset is wrong in the file, I think.
java -jar parquet-tools-deprecated-1.12.0.jar meta ~/data/customer1/000000_0
file: file:/home/nemon/data/customer1/000000_0
creator: parquet-mr version 1.12.0 (build b7c9d0beddc1052004370eebe944e22f55a7d508)
extra: writer.date.proleptic = false
extra: writer.time.zone = Asia/Shanghai
extra: writer.model.name = 4.0.0-SNAPSHOT
extra: writer.zone.conversion.legacy = false
file schema: hive_schema
--------------------------------------------------------------------------------
c_customer_sk: OPTIONAL INT64 R:0 D:1
c_customer_id: OPTIONAL BINARY L:STRING R:0 D:1
c_current_cdemo_sk: OPTIONAL INT64 R:0 D:1
c_current_hdemo_sk: OPTIONAL INT64 R:0 D:1
c_current_addr_sk: OPTIONAL INT64 R:0 D:1
c_first_shipto_date_sk: OPTIONAL INT64 R:0 D:1
c_first_sales_date_sk: OPTIONAL INT64 R:0 D:1
c_salutation: OPTIONAL BINARY L:STRING R:0 D:1
c_first_name: OPTIONAL BINARY L:STRING R:0 D:1
c_last_name: OPTIONAL BINARY L:STRING R:0 D:1
c_preferred_cust_flag: OPTIONAL BINARY L:STRING R:0 D:1
c_birth_day: OPTIONAL INT32 R:0 D:1
c_birth_month: OPTIONAL INT32 R:0 D:1
c_birth_year: OPTIONAL INT32 R:0 D:1
c_birth_country: OPTIONAL BINARY L:STRING R:0 D:1
c_login: OPTIONAL BINARY L:STRING R:0 D:1
c_email_address: OPTIONAL BINARY L:STRING R:0 D:1
c_last_review_date_sk: OPTIONAL INT64 R:0 D:1
row group 1: RC:1530100 TS:133961157 OFFSET:4
--------------------------------------------------------------------------------
c_customer_sk: INT64 UNCOMPRESSED DO:0 FPO:4 SZ:12243643/12243643/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 6, max: 64999980, num_nulls: 0]
c_customer_id: BINARY UNCOMPRESSED DO:0 FPO:12243647 SZ:30604844/30604844/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN ST:[min: AAAAAAAAAAAABDAA, max: AAAAAAAAPPPPKCBA, num_nulls: 0]
c_current_cdemo_sk: INT64 UNCOMPRESSED DO:0 FPO:42848491 SZ:11962420/11962420/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 1, max: 1920799, num_nulls: 53592]
c_current_hdemo_sk: INT64 UNCOMPRESSED DO:54810911 FPO:54868535 SZ:2611021/2611021/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 1, max: 7200, num_nulls: 53561]
c_current_addr_sk: INT64 UNCOMPRESSED DO:0 FPO:57421932 SZ:12243645/12243645/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 9, max: 32499985, num_nulls: 0]
c_first_shipto_date_sk: INT64 UNCOMPRESSED DO:69665577 FPO:69694809 SZ:2398231/2398231/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 2449028, max: 2452678, num_nulls: 53457]
c_first_sales_date_sk: INT64 UNCOMPRESSED DO:72063808 FPO:72093040 SZ:2397633/2397633/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 2448998, max: 2452648, num_nulls: 53502]
c_salutation: BINARY UNCOMPRESSED DO:74461441 FPO:74461508 SZ:579678/579678/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: , max: Sir, num_nulls: 0]
c_first_name: BINARY UNCOMPRESSED DO:75041119 FPO:75092758 SZ:2534042/2534042/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: , max: Zulma, num_nulls: 0]
c_last_name: BINARY UNCOMPRESSED DO:77575161 FPO:77626525 SZ:2541263/2541263/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: , max: Zuniga, num_nulls: 0]
c_preferred_cust_flag: BINARY UNCOMPRESSED DO:80116424 FPO:80116456 SZ:388782/388782/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: , max: Y, num_nulls: 0]
c_birth_day: INT32 UNCOMPRESSED DO:80505206 FPO:80505351 SZ:1076499/1076499/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 1, max: 31, num_nulls: 53591]
c_birth_month: INT32 UNCOMPRESSED DO:81581705 FPO:81581772 SZ:891737/891737/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 1, max: 12, num_nulls: 53288]
c_birth_year: INT32 UNCOMPRESSED DO:82473442 FPO:82473740 SZ:1445414/1445414/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 1924, max: 1992, num_nulls: 53375]
c_birth_country: BINARY UNCOMPRESSED DO:83918856 FPO:83921564 SZ:1538795/1538795/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: , max: ZIMBABWE, num_nulls: 0]
c_login: BINARY UNCOMPRESSED DO:85457651 FPO:85457674 SZ:2872/2872/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: , max: , num_nulls: 0]
c_email_address: BINARY UNCOMPRESSED DO:0 FPO:85460523 SZ:46682636/46682636/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN ST:[min: , max: Zulma.Weaver@SjkMJMi7XYbPsIMxT.org, num_nulls: 0]
c_last_review_date_sk: INT64 UNCOMPRESSED DO:132143159 FPO:132146109 SZ:1818002/1818002/1.00 VC:1530100 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 2452283, max: 2452648, num_nulls: 53812]
row group 2: RC:1378576 TS:120729344 OFFSET:133961161
--------------------------------------------------------------------------------
c_customer_sk: INT64 UNCOMPRESSED DO:0 FPO:133961161 SZ:11031160/11031160/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 353, max: 64999947, num_nulls: 0]
c_customer_id: BINARY UNCOMPRESSED DO:0 FPO:144992321 SZ:27574069/27574069/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN ST:[min: AAAAAAAAAAAABAAA, max: AAAAAAAAPPPPPJCA, num_nulls: 0]
c_current_cdemo_sk: INT64 UNCOMPRESSED DO:0 FPO:172566390 SZ:10777041/10777041/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 3, max: 1920799, num_nulls: 48426]
c_current_hdemo_sk: INT64 UNCOMPRESSED DO:183343431 FPO:183401055 SZ:2358286/2358286/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 1, max: 7200, num_nulls: 48252]
c_current_addr_sk: INT64 UNCOMPRESSED DO:0 FPO:185701717 SZ:11031152/11031152/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 13, max: 32499975, num_nulls: 0]
c_first_shipto_date_sk: INT64 UNCOMPRESSED DO:196732869 FPO:196762101 SZ:2163621/2163621/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 2449028, max: 2452678, num_nulls: 48435]
c_first_sales_date_sk: INT64 UNCOMPRESSED DO:198896490 FPO:198925722 SZ:2163332/2163332/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 2448998, max: 2452648, num_nulls: 48508]
c_salutation: BINARY UNCOMPRESSED DO:201059822 FPO:201059889 SZ:522266/522266/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: , max: Sir, num_nulls: 0]
c_first_name: BINARY UNCOMPRESSED DO:201582088 FPO:201633695 SZ:2287170/2287170/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: , max: Zulma, num_nulls: 0]
c_last_name: BINARY UNCOMPRESSED DO:203869258 FPO:203920622 SZ:2294427/2294427/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: , max: Zuniga, num_nulls: 0]
c_preferred_cust_flag: BINARY UNCOMPRESSED DO:206163685 FPO:206163718 SZ:350234/350234/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: , max: Y, num_nulls: 0]
c_birth_day: INT32 UNCOMPRESSED DO:206513919 FPO:206514064 SZ:970564/970564/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 1, max: 31, num_nulls: 48659]
c_birth_month: INT32 UNCOMPRESSED DO:207484483 FPO:207484550 SZ:803919/803919/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 1, max: 12, num_nulls: 48294]
c_birth_year: INT32 UNCOMPRESSED DO:208288402 FPO:208288700 SZ:1303139/1303139/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 1924, max: 1992, num_nulls: 48458]
c_birth_country: BINARY UNCOMPRESSED DO:209591541 FPO:209594249 SZ:1386657/1386657/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: , max: ZIMBABWE, num_nulls: 0]
c_login: BINARY UNCOMPRESSED DO:210978198 FPO:210978221 SZ:2576/2576/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: , max: , num_nulls: 0]
c_email_address: BINARY UNCOMPRESSED DO:0 FPO:210980774 SZ:42071765/42071765/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN ST:[min: , max: Zulma.Wylie@pJvO8iGRCFK.com, num_nulls: 0]
c_last_review_date_sk: INT64 UNCOMPRESSED DO:253052539 FPO:253055489 SZ:1637966/1637966/1.00 VC:1378576 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[min: 2452283, max: 2452648, num_nulls: 48083]
Gabor Szadovszky / @gszadovszky: @loudongfeng, I am not sure how it would be possible. RowGroup.file_offset is set by using the dictionary page offset of the first column chunk (if there is any): ColumnChunkMetaData.getStartingPos()
As per my understanding the issue needs the following to produce the wrong offset for rowGroup(n) (where we have k columns):
- columnChunk(n-1, 1) (the first column chunk of rowGroup(n-1)) is dictionary encoded, as well as columnChunk(n-1, k)
- columnChunk(n, 1) is not dictionary encoded
In this case fileOffset(n) = dictionaryOffset(n, 1) = dictionaryOffset(n-1, k).
To discover this issue we should check whether a column chunk is dictionary encoded before using its dictionary offset. Unfortunately, we have to do the same before using the file offset of a row group, or simply ignore that value and use the offsets of the first column chunk together with the check.
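A rough sketch of the kind of check described above, assuming the parquet-hadoop metadata APIs (BlockMetaData, ColumnChunkMetaData, EncodingStats); this is not the actual merged fix. It computes a row group's starting position from its first column chunk, trusting the dictionary page offset only when that chunk really has dictionary pages:

```java
import org.apache.parquet.column.EncodingStats;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;

final class SafeRowGroupOffset {
  private SafeRowGroupOffset() {}

  static long startingPos(BlockMetaData block) {
    ColumnChunkMetaData first = block.getColumns().get(0);
    EncodingStats stats = first.getEncodingStats();
    if (stats == null) {
      // No encoding stats (older writers): keep the original behaviour.
      return first.getStartingPos();
    }
    long dictOffset = first.getDictionaryPageOffset();
    long dataOffset = first.getFirstDataPageOffset();
    // Only trust the dictionary page offset if the chunk really has
    // dictionary pages; otherwise it may be a stale value (PARQUET-2078).
    if (stats.hasDictionaryPages() && dictOffset > 0 && dictOffset < dataOffset) {
      return dictOffset;
    }
    return dataOffset;
  }
}
```

Using something like this instead of RowGroup.file_offset would also sidestep files that 1.12.0 has already written with an invalid value.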
Nemon Lou / @loudongfeng: The wrongly set dictionary page offset does not propagate from org.apache.parquet.hadoop.metadata.ColumnChunkMetaData to org.apache.parquet.format.ColumnMetaData, thanks to the hasDictionaryPages check here: https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.0/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L520
RowGroup.file_offset, however, still uses ColumnChunkMetaData to get the starting position.
The added test case passes after the patch and fails before it, even when only the write-path code is modified.
Gabor Szadovszky / @gszadovszky:
@loudongfeng, you are right, so dictionaryPageOffset is not impacted. Great news.
After a second look, it is not required that the first column be dictionary encoded before the invalid row group. It is enough that there are dictionary encoded column chunks in the previous row groups and that the first column chunk is not dictionary encoded in the invalid row group. So, @loudongfeng, you are also right with your PR.
Gabor Szadovszky / @gszadovszky: Since the PR is merged I am resolving this.
liujingmao: I also encountered the same issue, but with a different error.
Writing a parquet file with version 1.12.0 in Apache Hive and then reading that file returns the following error:
Reproduce scenario:
TPC-DS table customer; any parquet file written by 1.12.0 that is larger than 128 MB (two row groups).
Reporter: Nemon Lou / @loudongfeng
Assignee: Nemon Lou / @loudongfeng
Note: This issue was originally created as PARQUET-2078. Please see the migration documentation for further details.