Open chitralverma opened 1 year ago
Hey @chitralverma, I'm in favor of improving the Java dataset APIs to provide functionality similar to pyarrow. Both are bindings to the C++ implementation, so they should be able to provide the same functionality.
Thank you, I look forward to your contributions!
Describe the enhancement requested
Some important changes are suggested in the list below to improve the developer experience with the Dataset API of java/arrow. Most of these suggestions, if implemented, will lead to consistency with the pyarrow dataset API.
- .inspect() [...] and this is not documented anywhere. This behaviour is the same in pyarrow. Maybe it's a good idea to allow users to provide a strategy like Error, Merge, LastFile, etc.
- FileSystemDatasetFactory.inspect() and FileSystemDatasetFactory.finish().newScan(...).schema(): which one to use in which case?

Please let me know if the above makes sense; I can help with PRs for the same.
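
To make the strategy suggestion (Error, Merge, LastFile) more concrete, here is a minimal sketch of how such an option might behave. It is plain Java and does not use Arrow at all: `SchemaResolutionSketch`, `Strategy`, and `resolve` are hypothetical names invented for illustration, and a schema is modeled as a simple map of field name to type name rather than Arrow's `Schema` class. The actual semantics would need to be agreed with the maintainers.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these names exist in Arrow Java.
// A "schema" is modeled as an ordered map of field name -> type name.
final class SchemaResolutionSketch {

    /** Hypothetical strategies, named after the ones suggested in the issue. */
    enum Strategy { ERROR, MERGE, LAST_FILE }

    static Map<String, String> resolve(List<Map<String, String>> fileSchemas, Strategy strategy) {
        if (fileSchemas.isEmpty()) {
            throw new IllegalArgumentException("at least one file schema is required");
        }
        switch (strategy) {
            case LAST_FILE:
                // Use the schema of the last file and ignore all others.
                return new LinkedHashMap<>(fileSchemas.get(fileSchemas.size() - 1));
            case ERROR: {
                // Fail as soon as any file disagrees with the first one.
                // Map.equals ignores field order, which is a simplification.
                Map<String, String> first = fileSchemas.get(0);
                for (Map<String, String> schema : fileSchemas) {
                    if (!first.equals(schema)) {
                        throw new IllegalStateException("schemas differ: " + first + " vs " + schema);
                    }
                }
                return new LinkedHashMap<>(first);
            }
            case MERGE: {
                // Union of all fields; a field seen with two different types is an error.
                Map<String, String> merged = new LinkedHashMap<>();
                for (Map<String, String> schema : fileSchemas) {
                    for (Map.Entry<String, String> field : schema.entrySet()) {
                        String previous = merged.putIfAbsent(field.getKey(), field.getValue());
                        if (previous != null && !previous.equals(field.getValue())) {
                            throw new IllegalStateException("conflicting types for field " + field.getKey());
                        }
                    }
                }
                return merged;
            }
            default:
                throw new AssertionError(strategy);
        }
    }
}
```

For example, given one file with only an `id` field and another with `id` and `name`, `MERGE` and `LAST_FILE` would both yield the two-field schema in this sketch, while `ERROR` would throw.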
Component(s)
Java