domoritz opened this issue 1 year ago
The issue is that I need to use `open_stream`. The error message should be better.
I am not sure if this is related, but I had a similar experience when I had mistakenly written files without closing the file writer. In your case, since the file loads properly in JS, this could be an entirely different scenario, but I thought it was worth mentioning here.
Here is a small reproducer that doesn't require downloading a file:
```python
import pyarrow as pa

batch = pa.record_batch([pa.array([1, 2, 3])], ['a'])

# Create an Arrow Stream file
with pa.ipc.new_stream("test.arrows", batch.schema) as writer:
    writer.write(batch)

# Read as Arrow File
pa.ipc.open_file("test.arrows")
# -> ... ArrowInvalid: Not an Arrow file
```
I agree it would be nice if we could give a more informative error message that hints to the user that they are reading an Arrow Streaming format file, not an Arrow File format file.
Similarly, reading a File-format file with the streaming reader also produces an uninformative error message:
```python
with pa.ipc.new_file("test.arrow", batch.schema) as writer:
    writer.write(batch)

pa.ipc.open_stream("test.arrow")
# ... ArrowInvalid: Expected to read 1330795073 metadata bytes, but only read 486
```
Files written in the file format have a magic number at both ends of the data. The error message "Not an Arrow file" is raised when that magic number is wrong. So we already detect this situation; we just need to be proactive about suggesting solutions or alternatives (e.g. "Not an Arrow file, perhaps this is in the streaming format?"), so this should be very doable.
I am an Arrow committer and was totally thrown off by the error message; I thought my file was corrupt. So yes, your suggested error message sounds great.
@domoritz I would like to contribute to this project. Can you assign this issue to me?
I assigned it to you. Please send a pull request soon.
Hello @domoritz, has this issue been fixed? If not, I can contribute!
Describe the bug, including details regarding any error messages, version, and platform.
I am trying to load an Arrow file
but get this error:
The Arrow file is https://github.com/uwdata/flights-arrow/blob/master/flights-200k.arrow and loads fine in the Arrow JS library.
Component(s)
Python