Closed matthewmturner closed 2 years ago
@seddonm1 I was thinking of adding tests for the above. Do you think this makes sense? I don't have enough experience implementing object store readers to know whether there could be file-type- or partition-specific issues that we need to look out for.
I think we should test for bad data either way, though.
@houqp I'm also curious whether you have any thoughts on ways we could improve testing.
Yes, testing for bad data makes sense. The call to the API is pretty simple and mainly in the AWS SDK so I wouldn't expect too many interesting things.
I would just advise against adding too much testing for specific file types, as their behaviour may change upstream (DataFusion/Arrow), which would mean a lot of maintenance work while those APIs stabilise.
@seddonm1 I've trimmed down the list. Let me know if you think anything is missing.
Add testing for the below cases.
Bad Data
DataFusion Integration
ctx.register_object_store
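
To make the "Bad Data" case concrete, here is a rough sketch of the shape such a test could take. It uses only the Rust standard library and is not this crate's or DataFusion's API: `InMemoryStore` and `read_text` are hypothetical stand-ins for the real object store and reader. The point is just that malformed or missing objects should surface as errors rather than panics.

```rust
use std::collections::HashMap;
use std::io::{self, Cursor, Read};

/// Hypothetical in-memory stand-in for an object store.
/// This is NOT the DataFusion `ObjectStore` trait.
struct InMemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl InMemoryStore {
    fn new() -> Self {
        Self { objects: HashMap::new() }
    }

    /// Stores raw bytes under a key.
    fn put(&mut self, key: &str, bytes: &[u8]) {
        self.objects.insert(key.to_string(), bytes.to_vec());
    }

    /// Returns a reader over the object, or a `NotFound` error if the key is absent.
    fn reader(&self, key: &str) -> io::Result<Cursor<Vec<u8>>> {
        self.objects
            .get(key)
            .cloned()
            .map(Cursor::new)
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, format!("no such object: {}", key)))
    }
}

/// Hypothetical helper: reads an object as UTF-8 text.
/// Invalid UTF-8 is reported as an `InvalidData` error by `read_to_string`.
fn read_text(store: &InMemoryStore, key: &str) -> io::Result<String> {
    let mut buf = String::new();
    store.reader(key)?.read_to_string(&mut buf)?;
    Ok(buf)
}

#[test]
fn bad_data_returns_error_instead_of_panicking() {
    let mut store = InMemoryStore::new();
    store.put("bad.csv", &[0xff, 0xfe, 0x00]); // not valid UTF-8

    let err = read_text(&store, "bad.csv").unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::InvalidData);

    let err = read_text(&store, "missing.csv").unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::NotFound);
}
```

A real test would swap the in-memory store for the S3-backed one and check the error the reader actually returns, which may differ from the `io::ErrorKind` values assumed here.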