Describe the bug, including details regarding any error messages, version, and platform.
Sometimes a file is written that is missing its last byte, so it ends in .PAR when it should end in .PAR1. This causes an EOFException when attempting to read the file.
This might be related: we are seeing this issue only on GCP, not on AWS. On GCP we do disk seeks randomly, and on AWS we do disk seeks sequentially.
We can rerun a job that wrote a corrupt Parquet file, and it will succeed the second time, so the failure appears to be nondeterministic.
This is on version 1.14.3.
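
A quick way to spot an affected file before handing it to a reader is to check its last four bytes. The following is a minimal sketch and not part of parquet-java: the class name `ParquetMagicCheck` and the method `endsWithMagic` are hypothetical, and it assumes the file has been copied to a local filesystem and uses the standard `PAR1` footer magic (no encrypted footer).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of parquet-java.
final class ParquetMagicCheck {
    // A plaintext Parquet file ends with these four ASCII bytes.
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the last four bytes of the file are "PAR1".
    public static boolean endsWithMagic(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < MAGIC.length) {
                return false; // too short to contain the trailing magic
            }
            byte[] tail = new byte[MAGIC.length];
            raf.seek(length - MAGIC.length);
            raf.readFully(tail);
            return Arrays.equals(tail, MAGIC);
        }
    }

    // Prints one line per path given on the command line.
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path path = Paths.get(arg);
            System.out.println(path + ": " + (endsWithMagic(path) ? "ok" : "missing trailing PAR1"));
        }
    }
}
```

A file truncated by one byte, as described above, would be reported as missing the trailing magic. The check only inspects the tail and says nothing about the rest of the file.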
Component(s)
No response