Jefffrey closed 3 months ago
Almost done, just want to optimize this code:
Because it is a major slowdown for the zlib test.
https://github.com/datafusion-contrib/datafusion-orc/commit/0405e23a291ead841353a182aab1338bd7b0c8cf
This commit concatenates the Vec of RecordBatches into a single RecordBatch for easier comparison.
Had to disable two other tests due to some schema issues, but I will work on those separately. Closing this issue.
Integration tests added by https://github.com/datafusion-contrib/datafusion-orc/pull/65
However, we have to compare actual vs. expected data in JSON format, since that is how the expected data is encoded in the Apache ORC repo.
An alternative would be to use the pyarrow/Arrow ORC implementation to generate the expected data as Parquet or Arrow IPC files, which would allow a more rigorous comparison than JSON.
We would lose some visibility into the expected data, but since these are integration tests using data from the Apache ORC repo, they wouldn't change often (if at all) anyway.
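To illustrate the JSON-based comparison discussed above, here is a minimal sketch in Python. It is not the Rust code from the linked PR. It assumes the expected data is newline-delimited JSON with one object per row, and the names `parse_json_lines` and `first_mismatch` are hypothetical helpers made up for this example.

```python
import json


def parse_json_lines(text):
    """Parse newline-delimited JSON into a list of Python objects, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def first_mismatch(actual_text, expected_text):
    """Return (row_index, actual_row, expected_row) for the first differing row, or None.

    Rows are compared as parsed objects, so key order and whitespace do not matter.
    A missing row on either side is reported as None.
    """
    actual = parse_json_lines(actual_text)
    expected = parse_json_lines(expected_text)
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return (i, a, e)
    if len(actual) != len(expected):
        # One side ran out of rows first; report the first unmatched row.
        i = min(len(actual), len(expected))
        a = actual[i] if i < len(actual) else None
        e = expected[i] if i < len(expected) else None
        return (i, a, e)
    return None
```

This also shows why JSON is the less rigorous option: in this sketch `first_mismatch('{"a": 1}', '{"a": 1.0}')` returns `None`, because an integer and a float with the same value compare equal once parsed. A typed format such as Parquet would keep that distinction.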