Open Arnaud-Nauwynck opened 1 day ago
See also the related issue https://github.com/apache/parquet-java/issues/3077: AzureBlobFileSystem.open() should return an FSDataInputStream subclass that overrides readVectored() much more efficiently for small reads.
@mukund-thakur
I think we should revisit that min seek range, at least add a good default for ABFS, and make it configurable. File a HADOOP- JIRA, ideally with a patch.
We do always have hopes for a full implementation for abfs, though it has yet to surface.
On Hadoop 3.4.1 the abfs connector also supports the httpclient connection pool for better performance, though that is still stabilising.
Anyway, I agree with you that read-and-discard is better for cloud stores; what I don't know is what a good value is here.
What do you think, at least in your deployments?
The Velox paper actually sets the value to 500K:
"IO reads for nearby columns are typically coalesced (merged) if the gap between them is small enough (currently about 20K for SSD and 500K for disaggregated storage)"
Maybe in Hadoop we should do something similar.
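As a rough illustration of what "make it configurable" could look like: S3A already has vectored-read tuning keys (names below are from memory, so double-check them), and the ABFS keys shown here are purely hypothetical, standing in for whatever a HADOOP- JIRA would actually add.

```java
import org.apache.hadoop.conf.Configuration;

public class VectoredReadTuning {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Existing S3A knobs (key names from memory; defaults are small, around 4K / 1M):
    conf.set("fs.s3a.vectored.read.min.seek.size", "128K");
    conf.set("fs.s3a.vectored.read.max.merged.size", "2M");

    // Hypothetical ABFS equivalents this discussion argues for,
    // with a much larger default (e.g. the 500K figure from the Velox paper):
    conf.set("fs.azure.vectored.read.min.seek.size", "500K");
    conf.set("fs.azure.vectored.read.max.merged.size", "4M");
  }
}
```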
Describe the enhancement requested
When reading some column chunks but not all of them, Parquet builds a list of "ConsecutivePartList" entries, then calls the Hadoop vectored-read API FSDataInputStream#readVectored(List ...).
Unfortunately, many implementations behind "FSDataInputStream" do not override the readVectored() method, which triggers many distinct read calls.
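For context, here is a minimal sketch of how the Hadoop vectored-read API is invoked (the file path, offsets and lengths are made up, standing in for two column chunks separated by a small unread chunk); streams that override readVectored() can coalesce or parallelize the ranges, while the default implementation reads each range on its own.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileRange;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VectoredReadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path file = new Path(args[0]);
    FileSystem fs = file.getFileSystem(conf);

    // Hypothetical ranges: two column chunks with a small gap between them.
    List<FileRange> ranges = Arrays.asList(
        FileRange.createFileRange(4L, 16_384),
        FileRange.createFileRange(20_000L, 65_536));

    try (FSDataInputStream in = fs.open(file)) {
      // Streams that override readVectored() may merge nearby ranges into
      // one request; the default falls back to one read per range.
      in.readVectored(ranges, ByteBuffer::allocate);
      for (FileRange r : ranges) {
        ByteBuffer buf = r.getData().get();
        // ... hand the buffer to the Parquet chunk decoder ...
        System.out.println("read " + buf.remaining() + " bytes @" + r.getOffset());
      }
    }
  }
}
```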
For example with hadoop-azure, Azure Data Lake Storage is much slower at establishing a new HTTPS connection (through the infamous JDK 1.0-era HttpURLConnection, followed by a TLS handshake) than at reading a few more megabytes of data on an existing socket!
The case of small holes to skip is very frequent when a Parquet file has columns that are not read and that compress extremely well thanks to RLE encoding. Typically, a very sparse column with only a few values, or even all nulls within a page, can be encoded by Parquet in only a few hundred bytes, so reading 100 extra bytes is NOT a problem.
Parquet should at least honor the following method from the Hadoop class FileSystem, which says that a seek of less than 4096 bytes is NOT reasonable.
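The snippet referenced here is not reproduced in the issue text; my assumption is that it refers to the vectored-read hint exposed on the stream (PositionedReadable/FSDataInputStream's minSeekForVectorReads(), which defaults to 4096 bytes in Hadoop 3.3.5+). A sketch of how a reader could consult it when deciding whether to merge two parts (the SeekPolicy class and shouldMergeParts helper are hypothetical names):

```java
import org.apache.hadoop.fs.FSDataInputStream;

/**
 * Hypothetical helper: merge two consecutive parts when the gap between
 * them is smaller than what the stream itself considers a worthwhile seek.
 */
final class SeekPolicy {
  private SeekPolicy() {}

  static boolean shouldMergeParts(FSDataInputStream in, long endOfFirst, long startOfSecond) {
    long gap = startOfSecond - endOfFirst;
    // minSeekForVectorReads() defaults to 4096 bytes; an object-store
    // stream could return a much larger value (hundreds of KiB).
    return gap >= 0 && gap < in.minSeekForVectorReads();
  }
}
```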
The logic for building this list for a set of column chunks is here:
org.apache.parquet.hadoop.ParquetFileReader#internalReadRowGroup
Maybe a possible implementation could be to add fictitious "ConsecutivePartList" entries whose data is ignored when received, which would avoid leaving small holes in the ranges to read.
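A rough sketch of that idea, under my own hypothetical names (Part, GapFiller, gapThreshold), not the actual ParquetFileReader types: small gaps between consecutive parts are bridged by filler parts whose bytes are requested but discarded, so the resulting ranges are contiguous.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a part to read; filler parts are read then dropped. */
class Part {
  final long offset;
  final long length;
  final boolean filler;   // true => data is read only to bridge a gap, then discarded

  Part(long offset, long length, boolean filler) {
    this.offset = offset;
    this.length = length;
    this.filler = filler;
  }
}

class GapFiller {
  /**
   * Insert filler parts between consecutive parts whose gap is small,
   * so the whole run can be fetched in one contiguous request.
   * Input parts must be sorted by offset and non-overlapping.
   */
  static List<Part> fillSmallGaps(List<Part> sortedParts, long gapThreshold) {
    List<Part> out = new ArrayList<>();
    Part previous = null;
    for (Part p : sortedParts) {
      if (previous != null) {
        long gap = p.offset - (previous.offset + previous.length);
        if (gap > 0 && gap <= gapThreshold) {
          out.add(new Part(previous.offset + previous.length, gap, true));
        }
      }
      out.add(p);
      previous = p;
    }
    return out;
  }
}
```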
Component(s)
No response