Open andygrove opened 3 years ago
@andygrove, since the client only handles the logical plan, I think it does not need to know about the list of files or the statistics; it only needs the schema:
As flight already has an endpoint to query the schema, this would avoid creating and maintaining a new one 😃
Hi @andygrove, we have integrated Ballista with HDFS support. Our workaround is to make the file path self-describing. For example, a local file path should be file://tmp/..., and an HDFS file path should be hdfs://localhost:xxx:/tmp/...
To make it work, we also changed the object store api a bit. Later I'll create a PR for this.
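As a rough illustration of the self-describing path idea, here is a minimal sketch that picks a storage backend from the URI scheme. It is not the actual object store API change; `StoreKind` and `resolve_store` are hypothetical names, and it assumes only the `file` and `hdfs` schemes are registered.

```rust
/// Storage backends a path can refer to (hypothetical, for illustration only).
#[derive(Debug, PartialEq)]
enum StoreKind {
    LocalFile,
    Hdfs,
}

/// Splits a self-describing path into its store kind and the remainder
/// after the "://" separator. Returns an error when the scheme is missing
/// or not one of the schemes this sketch knows about.
fn resolve_store(path: &str) -> Result<(StoreKind, &str), String> {
    let (scheme, rest) = path
        .split_once("://")
        .ok_or_else(|| format!("path has no scheme: {path}"))?;
    match scheme {
        "file" => Ok((StoreKind::LocalFile, rest)),
        "hdfs" => Ok((StoreKind::Hdfs, rest)),
        other => Err(format!("unsupported scheme: {other}")),
    }
}
```

With this shape, a bare path without a scheme is rejected instead of silently falling back to the local file system, which is presumably the point of making the path self-describing.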
@yahoNanJing this overlaps with work I'm currently doing, so anything you could share would be helpful!
Is your feature request related to a problem or challenge? Please describe what you are trying to do. I have a Ballista cluster running, and each scheduler and executor has access to TPC-H data locally. I am running the benchmark client on my desktop, and I do not have access to the data locally. Query planning fails with "file not found" because BallistaContext::read_parquet is looking for the file on the local file system when it should be getting the file metadata from a scheduler in the cluster.
Describe the solution you'd like The context should send a gRPC request to the scheduler to get the necessary metadata.
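To make the proposed flow concrete, here is a minimal sketch of a client that asks a metadata source for a table's schema instead of opening the file itself. None of these names come from Ballista: `MetadataSource`, `SchedulerStub`, `TableMetadata`, and `plan_scan` are hypothetical, and the stub's map lookup stands in for the gRPC request a real client would send to the scheduler.

```rust
use std::collections::HashMap;

/// Minimal stand-in for table metadata; only the schema is carried here.
#[derive(Debug, Clone, PartialEq)]
struct TableMetadata {
    /// (column name, type name) pairs.
    schema: Vec<(String, String)>,
}

/// Hypothetical abstraction over where planning metadata comes from.
trait MetadataSource {
    fn table_metadata(&self, path: &str) -> Result<TableMetadata, String>;
}

/// Stand-in for a scheduler that can see the data files. A real client
/// would issue a gRPC request here instead of a map lookup.
struct SchedulerStub {
    tables: HashMap<String, TableMetadata>,
}

impl MetadataSource for SchedulerStub {
    fn table_metadata(&self, path: &str) -> Result<TableMetadata, String> {
        self.tables
            .get(path)
            .cloned()
            .ok_or_else(|| format!("file not found: {path}"))
    }
}

/// Client-side planning step: asks the source for metadata and returns the
/// column names. It never touches the client's local file system.
fn plan_scan(source: &dyn MetadataSource, path: &str) -> Result<Vec<String>, String> {
    let meta = source.table_metadata(path)?;
    Ok(meta.schema.into_iter().map(|(name, _)| name).collect())
}
```

Because the planning step depends only on the trait, the client can plan a query for a path that exists only on the cluster, and a "file not found" error then reflects what the scheduler sees, not what the desktop sees.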
Describe alternatives you've considered None
Additional context None