Open zhicwu opened 6 months ago
It would be great if Databend could still query a parquet file without knowing its size in advance. Currently, `select` from a URI depends on the Content-Length of the response.
Two choices:
pub fn content_length(&self) -> u64 {
    debug_assert!(
        self.metakey.contains(Metakey::ContentLength)
            || self.metakey.contains(Metakey::Complete),
        "visiting not set metadata: content_length, maybe a bug"
    );
    self.content_length.unwrap_or_default()
}
We can't support reading parquet without knowing its length, since we have to read from the end of the file to get its metadata.
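To illustrate why the total length is needed: a minimal sketch of locating the parquet footer, assuming the standard parquet file layout (the last 8 bytes are a 4-byte little-endian footer length followed by the magic `PAR1`). This is an illustration, not Databend's actual reader code; `footer_metadata_range` is a hypothetical helper name.

```rust
// Sketch (assumption: plain parquet trailer layout, not Databend's code).
// A parquet file ends with: [Thrift FileMetaData][4-byte LE footer length]["PAR1"].
// Without knowing file_len, a reader cannot seek to this trailing region.
fn footer_metadata_range(file_len: u64, tail: &[u8; 8]) -> Option<(u64, u64)> {
    // Verify the trailing magic bytes.
    if &tail[4..] != b"PAR1" {
        return None;
    }
    let footer_len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]) as u64;
    // The Thrift-encoded FileMetaData sits just before the 8-byte trailer.
    let start = file_len.checked_sub(8 + footer_len)?;
    Some((start, footer_len))
}

fn main() {
    // Hypothetical 1 KiB file whose footer is 100 bytes long.
    let mut tail = [0u8; 8];
    tail[..4].copy_from_slice(&100u32.to_le_bytes());
    tail[4..].copy_from_slice(b"PAR1");
    // Metadata starts at 1024 - 8 - 100 = 916 and spans 100 bytes.
    assert_eq!(footer_metadata_range(1024, &tail), Some((916, 100)));
}
```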
Yes, although we could read the whole file into memory first.
But we may not want that in COPY, and it would require some changes.
Also, for querying a stage, we currently need to read the schema for binding.
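The fallback mentioned above could be sketched like this (an assumption about one possible approach, not Databend's implementation): when the server sends no Content-Length, buffer the whole body in memory, after which the buffer's length stands in for the missing content length, at the cost of a full extra copy.

```rust
use std::io::{Cursor, Read};

// Sketch: read a body of unknown size fully into memory so the length
// becomes known and "seeking from the end" works on the in-memory buffer.
// `buffer_unknown_length` is a hypothetical helper name.
fn buffer_unknown_length(mut body: impl Read) -> std::io::Result<(Vec<u8>, u64)> {
    let mut buf = Vec::new();
    body.read_to_end(&mut buf)?;
    let len = buf.len() as u64; // length is only known after draining the stream
    Ok((buf, len))
}

fn main() -> std::io::Result<()> {
    // Stand-in for an HTTP response body with no Content-Length header.
    let body = Cursor::new(vec![1u8; 4096]);
    let (buf, len) = buffer_unknown_length(body)?;
    assert_eq!(len, 4096);
    assert_eq!(buf.len(), 4096);
    Ok(())
}
```

The obvious trade-off is memory: the whole object must fit in RAM before any footer parsing can start, which is why it may be undesirable for COPY of large files.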
Search before asking
Version
nightly
What's Wrong?
select * from 'https://domain.name/test.parquet'
ended up with the error below. The same query works well on both DuckDB and ClickHouse.
How to Reproduce?
Issue query
select * from 'https://domain.name/test.parquet'
using the latest JDBC driver against a nightly build. Make sure the web server only responds 200 (without headers like Content-Length) to HEAD requests.
FYI, here
https://domain.name/test.parquet
is NOT a static file. The content is generated for each GET request, backed by a short-lived cache. It would be great if Databend could still query the parquet file without knowing its size in advance.
Are you willing to submit PR?