Open martindurant opened 9 months ago
You're welcome!
Returning read-only numpy arrays is one of the parts of the large CoW change (https://pandas.pydata.org/pdeps/0007-copy-on-write.html) we are least certain about, so feedback from downstream developers is very welcome.
I assume the issue arises because you first allocate an empty dataframe and then get "view" arrays to write into. For the index, in one of the code paths that happens here:
The return value of .values is now a read-only numpy array (https://pandas.pydata.org/docs/user_guide/copy_on_write.html#read-only-numpy-arrays). You know you just created this data yourself, so you can safely change its writeable flag to True as a workaround.
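A minimal sketch of that workaround, in case it helps. The helper name writable_index_values is made up for illustration and is not part of pandas or fastparquet, and whether .values actually comes back read-only depends on the pandas version and on whether CoW is enabled, so the flag is only flipped when needed:

```python
import numpy as np
import pandas as pd


def writable_index_values(index: pd.Index) -> np.ndarray:
    """Hypothetical helper: return the array behind `index`, writable.

    Only safe when the caller has just allocated the index itself and
    nothing else shares the underlying buffer.
    """
    arr = index.values
    if not arr.flags.writeable:
        # pandas handed out a read-only view; the buffer underneath is
        # still writable, so numpy allows re-enabling writes on the view.
        arr.flags.writeable = True
    return arr


# Pre-allocate an index, then fill it in place, as a reader would.
index = pd.Index(np.zeros(3, dtype="int64"))
buf = writable_index_values(index)
buf[:] = [10, 20, 30]
```

This assumes a plain numpy-backed index; extension-array-backed indexes (for example tz-aware datetimes) may need different handling.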
And I suppose this only happens for the Index, because for columns you rely on the Block.values, where we didn't add this protection as this is regarded as internal anyway.
It's probably already covered by the failing tests you have in fastparquet's own test suite, but here are some tests that are failing on the pandas side (they were being skipped with CoW enabled for some time; we should have reported that earlier):
import pandas as pd

# dataframe with a non-default (i.e. non-RangeIndex) index
df = pd.DataFrame({"A": [1, 2, 3]}, index=list("abc"))
df.to_parquet("test.parquet", engine="fastparquet")
pd.read_parquet("test.parquet", engine="fastparquet")
# probably same underlying issue; tz-aware datetime index
import datetime
idx = [datetime.datetime.now(datetime.timezone.utc)] * 5
df = pd.DataFrame(index=idx, data={"index_as_col": idx})
df.to_parquet("test.parquet", engine="fastparquet")
pd.read_parquet("test.parquet", engine="fastparquet")
Thanks for the info, @jorisvandenbossche. Any idea of the release timeline?
The current goal is April.
No longer allows setting series values in-place. Thanks pandas.