Open asfimport opened 3 years ago
Joris Van den Bossche / @jorisvandenbossche:
copy=False would probably have to throw an exception in some cases where we can't guarantee zero copy, like when building from a Python list
Or copy=False could also not guarantee that no copy is made, and only try to avoid a copy where possible. That's basically the behaviour of the copy keyword in numpy.array(..)
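To make the "only copy when needed" semantics concrete, here is a minimal NumPy-only sketch. It uses np.asarray rather than np.array(..., copy=False), because NumPy 2.0 changed copy=False to raise when a copy cannot be avoided; np.asarray keeps the "try not to copy" meaning across versions. The variable names are illustrative only.

```python
import numpy as np

source = np.arange(5, dtype=np.int64)

# Same dtype: NumPy can hand back the existing memory without copying.
same_dtype = np.asarray(source)

# Different dtype: a conversion is required, so a new buffer is allocated.
cast = np.asarray(source, dtype=np.float64)

# Explicit copy: always a new buffer, regardless of dtype.
forced = np.array(source, copy=True)

print("same dtype shares memory:", np.shares_memory(source, same_dtype))
print("cast shares memory:", np.shares_memory(source, cast))
print("forced copy shares memory:", np.shares_memory(source, forced))
```

Only the first result shares memory with the source; the other two are independent buffers.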
On the general issue, I agree that the current behaviour is not ideal and can be confusing or have surprising effects. But I also think it's not that easy to change. I think a lot of people rely on the zero-copy behaviour to avoid unnecessary copies (e.g. if you convert to Arrow just to write the result directly to a Parquet file, you don't want to make an additional copy).
Apache Arrow JIRA Bot: This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per project policy. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.
As a first step, would it be relevant to have copy=False behave like "try not to copy" as the default? The benefit is that current users relying on the non-copy behavior would not be affected, while users struggling with the confusing behavior could find an explanation in the copy parameter doc and use copy=True if needed. The downside is that the current default behavior remains confusing.
When building an Arrow array from a numpy array it's very confusing from the user point of view that the result is not always a new array.
Under the hood, Arrow sometimes reuses the memory if no casting is needed
and sometimes doesn't if a cast is involved.
For non-primitive types, on the other hand, it always copies.
This behaviour needs a lot of attention from the user and understanding of what's going on, which makes pyarrow hard to use.
A copy=True/False parameter should be added to pa.array, and the default value should probably be copy=True so that by default you can always create an Arrow array out of a numpy one (as copy=False would probably have to throw an exception in some cases where we can't guarantee zero copy, like when building from a Python list).
Reporter: Alessandro Molina / @amol-
Related issues:
Note: This issue was originally created as ARROW-12666. Please see the migration documentation for further details.