asfimport opened this issue 3 years ago (status: Open)
Antoine Pitrou / @pitrou:
Hi [~tsoernes], can you explain how you encountered this error? Can you post the file you're trying to read somewhere, and/or explain how it was generated?
Joris Van den Bossche / @jorisvandenbossche:
Hi [~tsoernes], can you provide some more information, and ideally a reproducible example?
(e.g. was the Feather file written with the same version of pyarrow or with an earlier one? If an earlier one, which version? Did you use compression? What kind of data does the file contain? Could you provide a small script that generates a Feather file that reproduces the issue?)
EDIT: whoops, sorry for the duplicate comment asking for more information. JIRA isn't very good at refreshing / indicating there are newer comments if the tab was already open ...
Torstein Sørnes: @jorisvandenbossche @pitrou
The file is compressed with LZ4 and contains a pandas DataFrame. The code for writing it is:
df.to_feather(path, compression='lz4')
where df is a pandas dataframe.
The file was written with the same versions of pyarrow and pandas as the ones used to read it.
I have successfully written and read hundreds of pandas DataFrame Feather files using exactly the same code and library versions. I have no idea why this one in particular fails.
The file is too big to upload here. Does this link work?
https://ml-pull.s3.eu-central-1.amazonaws.com/Jobs/glassdoor/jobs1.feather.011
Cheers, and thanks for your work.
Antoine Pitrou / @pitrou: Thank you, I'll take a look.
Antoine Pitrou / @pitrou:
[~tsoernes] It seems the file is indeed invalid. Have you tried recreating it from the exact same data?
pyarrow 4.0.0
Reporter: Torstein Sørnes
Note: This issue was originally created as ARROW-12733. Please see the migration documentation for further details.